TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
3-bit K-quantized block structure.
#include <quantization.h>

Public Attributes
| uint8_t | hmask [GGML_QK_K/8] |
| uint8_t | qs [GGML_QK_K/4] |
| uint8_t | scales [12] |
| uint16_t | d |
| uint16_t | dmin |
Stores weights quantized to 3 bits each, with block-wise scaling. In the trade-off between compression and precision, this format sits between Q2_K and Q4_K.
Definition at line 99 of file quantization.h.
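The storage cost of this layout can be worked out from the attribute list alone. The following is a minimal sketch, not the definition from quantization.h: it assumes `GGML_QK_K` is 256 (the value used by upstream ggml; this page does not state it) and that the fields are declared in the order listed under Public Attributes. The reading of `hmask` as one bit per weight and `qs` as two bits per weight follows from the array sizes; how those bits are interleaved is not covered here. `kQuantBitsPerWeight` and `storage_bits_per_weight` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: weights per block. This page does not give the value;
// 256 is what upstream ggml uses.
constexpr std::size_t GGML_QK_K = 256;

// Field names, types, and array sizes are copied from the attribute list.
// Assumption: declaration order matches the order shown on this page.
struct block_q3_K {
    uint8_t  hmask[GGML_QK_K / 8];  // GGML_QK_K bits total: 1 bit per weight
    uint8_t  qs[GGML_QK_K / 4];     // 2 * GGML_QK_K bits total: 2 bits per weight
    uint8_t  scales[12];            // packed block-wise scale data
    uint16_t d;
    uint16_t dmin;
};

// Bits of quantized payload per weight (hmask + qs only). Hypothetical helper.
constexpr std::size_t kQuantBitsPerWeight =
    (sizeof(block_q3_K::hmask) + sizeof(block_q3_K::qs)) * 8 / GGML_QK_K;

// Total storage per weight, including scales, d, and dmin. Hypothetical helper.
constexpr double storage_bits_per_weight() {
    return sizeof(block_q3_K) * 8.0 / GGML_QK_K;
}

static_assert(kQuantBitsPerWeight == 3,
              "hmask and qs together hold 3 bits per weight");
static_assert(sizeof(block_q3_K) ==
                  GGML_QK_K / 8 + GGML_QK_K / 4 + 12 + 2 * sizeof(uint16_t),
              "no padding expected in this layout");
```

Under these assumptions one block occupies 32 + 64 + 12 + 2 + 2 = 112 bytes for 256 weights, which is 3.5 bits per weight once the scale fields are included.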
| uint16_t block_q3_K::d |
| uint16_t block_q3_K::dmin |
| uint8_t block_q3_K::hmask[GGML_QK_K/8] |
| uint8_t block_q3_K::qs[GGML_QK_K/4] |
| uint8_t block_q3_K::scales[12] |