TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
block_q6_K Struct Reference

6-bit K-quantized block structure.
#include <quantization.h>

Public Attributes

uint8_t   ql[GGML_QK_K/2]
uint8_t   qh[GGML_QK_K/4]
int8_t    scales[GGML_QK_K/16]
uint16_t  d
Detailed Description

6-bit K-quantized block structure.

Stores weights quantized to 6 bits with block-wise scaling: each block of GGML_QK_K weights shares a single 16-bit scale d, refined by per-sub-block scales. This provides better precision than Q4_K at the cost of more storage.
Definition at line 71 of file quantization.h.
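The attribute list above pins down the block's memory layout. A minimal sketch of the struct as documented here, assuming GGML_QK_K = 256 (a typical K-quantization block size; the actual value lives in quantization.h and is not shown on this page) and treating d as raw 16-bit storage:

```cpp
#include <cstdint>

// Assumed block size; the real GGML_QK_K is defined in quantization.h.
#define GGML_QK_K 256

// Field order follows the definition lines (72-75) cited below.
struct block_q6_K {
    uint8_t  ql[GGML_QK_K / 2];      // lower 4 bits of the quantized values
    uint8_t  qh[GGML_QK_K / 4];      // upper 2 bits of the quantized values
    int8_t   scales[GGML_QK_K / 16]; // sub-block scales
    uint16_t d;                      // block scale (raw 16-bit storage)
};

// 128 + 64 + 16 + 2 = 210 bytes per 256 weights, i.e. 6.5625 bits/weight.
static_assert(sizeof(block_q6_K) == 210, "unexpected struct padding");
```

Under that assumption the format costs about 6.56 bits per weight, compared with 6 bits of raw payload, the overhead being the block and sub-block scales.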
uint16_t block_q6_K::d

Block scale.
Definition at line 75 of file quantization.h.
Referenced by dequantize_q6_k(), quantize_q6_k(), and vec_dot_q6_k_q8_k_cpu().
uint8_t block_q6_K::qh[GGML_QK_K/4]

Upper 2 bits of quantized values.
Definition at line 73 of file quantization.h.
Referenced by dequantize_q6_k(), quantize_q6_k(), and vec_dot_q6_k_q8_k_cpu().
uint8_t block_q6_K::ql[GGML_QK_K/2]

Lower 4 bits of quantized values.
Definition at line 72 of file quantization.h.
Referenced by dequantize_q6_k(), quantize_q6_k(), and vec_dot_q6_k_q8_k_cpu().
int8_t block_q6_K::scales[GGML_QK_K/16]

Sub-block scales.
Definition at line 74 of file quantization.h.
Referenced by dequantize_q6_k(), quantize_q6_k(), and vec_dot_q6_k_q8_k_cpu().
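The referenced functions reconstruct each weight by merging ql's 4 low bits with qh's 2 high bits and applying the block and sub-block scales. A hedged sketch of what dequantize_q6_k() might do, assuming GGML_QK_K = 256, the simplest sequential bit packing (the real layout in quantization.h may interleave bytes for SIMD), that d holds raw IEEE fp16 bits, and a hypothetical fp16_to_fp32 helper:

```cpp
#include <cstdint>
#include <cstring>

#define GGML_QK_K 256  // assumed block size (defined in quantization.h)

struct block_q6_K {
    uint8_t  ql[GGML_QK_K / 2];      // lower 4 bits
    uint8_t  qh[GGML_QK_K / 4];      // upper 2 bits
    int8_t   scales[GGML_QK_K / 16]; // sub-block scales
    uint16_t d;                      // block scale
};

// Hypothetical half->float decoder; assumes d stores raw IEEE fp16 bits.
// (Zero, denormals, and infinities are not handled, for brevity.)
static float fp16_to_fp32(uint16_t h) {
    uint32_t bits = ((uint32_t)(h >> 15) << 31)         // sign
                  | ((((h >> 10) & 0x1Fu) + 112) << 23) // rebiased exponent
                  | ((uint32_t)(h & 0x3FFu) << 13);     // mantissa
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Sketch of dequantizing one block with sequential packing: weight i takes
// its low nibble from ql[i/2] and its top 2 bits from qh[i/4].
void dequantize_q6_k_sketch(const block_q6_K* b, float* out) {
    const float d = fp16_to_fp32(b->d);
    for (int i = 0; i < GGML_QK_K; ++i) {
        int lo = (b->ql[i / 2] >> (4 * (i % 2))) & 0x0F; // lower 4 bits
        int hi = (b->qh[i / 4] >> (2 * (i % 4))) & 0x03; // upper 2 bits
        int q  = (lo | (hi << 4)) - 32;   // signed 6-bit value in [-32, 31]
        out[i] = d * b->scales[i / 16] * q; // block * sub-block scale
    }
}
```

The -32 offset recenters the unsigned 6-bit range [0, 63] around zero; quantize_q6_k() would perform the inverse mapping, and vec_dot_q6_k_q8_k_cpu() would fold the same reconstruction into a fused dot product against Q8_K activations.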