TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
block_q6_K Struct Reference

6-bit K-quantized block structure

#include <quantization.h>


Public Attributes

uint8_t ql [GGML_QK_K/2]
 
uint8_t qh [GGML_QK_K/4]
 
int8_t scales [GGML_QK_K/16]
 
uint16_t d
 

Detailed Description

6-bit K-quantized block structure

Stores weights quantized to 6 bits with block-wise scaling. Provides better precision than Q4_K at the cost of more storage.

Definition at line 71 of file quantization.h.

Member Data Documentation

◆ d

uint16_t block_q6_K::d

Block scale

Definition at line 75 of file quantization.h.

Referenced by dequantize_q6_k(), quantize_q6_k(), and vec_dot_q6_k_q8_k_cpu().

◆ qh

uint8_t block_q6_K::qh[GGML_QK_K/4]

Upper 2 bits of quantized values

Definition at line 73 of file quantization.h.

Referenced by dequantize_q6_k(), quantize_q6_k(), and vec_dot_q6_k_q8_k_cpu().

◆ ql

uint8_t block_q6_K::ql[GGML_QK_K/2]

Lower 4 bits of quantized values

Definition at line 72 of file quantization.h.

Referenced by dequantize_q6_k(), quantize_q6_k(), and vec_dot_q6_k_q8_k_cpu().

◆ scales

int8_t block_q6_K::scales[GGML_QK_K/16]

Sub-block scales

Definition at line 74 of file quantization.h.

Referenced by dequantize_q6_k(), quantize_q6_k(), and vec_dot_q6_k_q8_k_cpu().


The documentation for this struct was generated from the following file: quantization.h