TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
4-bit K-quantized block structure
#include <quantization.h>

Public Attributes
| uint16_t | d |
| uint16_t | dmin |
| uint8_t | scales [12] |
| uint8_t | qs [GGML_QK_K/2] |
4-bit K-quantized block structure
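Stores weights quantized to 4 bits with block-wise scaling. Each block holds 256 quantized values, packed two per byte in qs, together with a block scale (d), a block minimum (dmin), and 12 bytes of sub-block scales (scales).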
Definition at line 57 of file quantization.h.
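The layout described above can be sketched as a standalone struct to check its storage cost. This is only an illustration: the name block_q4_K_sketch and the helper bits_per_weight() are hypothetical, and the value 256 for GGML_QK_K is an assumption inferred from the "256 quantized values" statement, not taken from quantization.h.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GGML_QK_K is 256, matching the 256 quantized values per block.
constexpr std::size_t GGML_QK_K = 256;

// Hypothetical mirror of the documented members, in definition order
// (lines 58 to 61 of quantization.h).
struct block_q4_K_sketch {
    std::uint16_t d;                  // block scale
    std::uint16_t dmin;               // block minimum value
    std::uint8_t  scales[12];         // sub-block scales
    std::uint8_t  qs[GGML_QK_K / 2];  // quantized values, two 4-bit values per byte
};

// 2 + 2 + 12 + 128 = 144 bytes. The widest member is 2 bytes, so no padding
// is expected on mainstream ABIs.
static_assert(sizeof(block_q4_K_sketch) == 144, "unexpected padding in block");

// Average storage cost per weight, in bits, including the scale metadata.
constexpr double bits_per_weight() {
    return 8.0 * sizeof(block_q4_K_sketch) / GGML_QK_K;
}
```

Under these assumptions the 16 bytes of metadata add half a bit per weight on top of the 4-bit payload.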
| uint16_t block_q4_K::d |
Block scale
Definition at line 58 of file quantization.h.
Referenced by dequantize_q4_k_m() and quantize_q4_k_m().
| uint16_t block_q4_K::dmin |
Block minimum value
Definition at line 59 of file quantization.h.
Referenced by dequantize_q4_k_m() and quantize_q4_k_m().
| uint8_t block_q4_K::qs[GGML_QK_K/2] |
Quantized values
Definition at line 61 of file quantization.h.
Referenced by dequantize_q4_k_m(), quantize_q4_k_m(), and vec_dot_q4_k_q8_k_cpu().
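Because qs has GGML_QK_K/2 bytes for GGML_QK_K values, each byte carries two 4-bit values. The sketch below shows only the byte-level split and merge. The helper names nibble_pair, unpack_nibbles, and pack_nibbles are hypothetical, and which weight index each nibble corresponds to is decided by quantize_q4_k_m() and dequantize_q4_k_m(), which this page does not describe.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two 4-bit values stored in one byte of qs.
struct nibble_pair {
    std::uint8_t lo;  // bits 0..3
    std::uint8_t hi;  // bits 4..7
};

// Split one packed byte into its low and high 4-bit values.
inline nibble_pair unpack_nibbles(std::uint8_t byte) {
    return { static_cast<std::uint8_t>(byte & 0x0F),
             static_cast<std::uint8_t>(byte >> 4) };
}

// Merge two 4-bit values (each in 0..15) into one byte.
inline std::uint8_t pack_nibbles(std::uint8_t lo, std::uint8_t hi) {
    assert(lo < 16 && hi < 16);
    return static_cast<std::uint8_t>(lo | (hi << 4));
}
```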
| uint8_t block_q4_K::scales[12] |
Sub-block scales
Definition at line 60 of file quantization.h.
Referenced by dequantize_q4_k_m() and quantize_q4_k_m().
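Twelve bytes are exactly 96 bits, which is enough for sixteen 6-bit fields, for example eight sub-block scales and eight sub-block minimums. The sketch below decodes such a packing following the convention used by ggml's Q4_K format. That this project packs scales the same way is an assumption, not something this page confirms, and the names scale_min and get_scale_min are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: one 6-bit scale and one 6-bit minimum.
struct scale_min {
    std::uint8_t scale;
    std::uint8_t min;
};

// Decode the 6-bit scale and minimum of sub-block j (0..7) from 12 packed bytes.
// Assumed layout (ggml Q4_K convention):
//   j < 4 : scale = low 6 bits of q[j], min = low 6 bits of q[j + 4]
//   j >= 4: low 4 bits come from q[j + 4] (scale in the low nibble, min in the
//           high nibble); the top 2 bits come from bits 6..7 of q[j - 4]
//           (scale) and q[j] (min).
inline scale_min get_scale_min(int j, const std::uint8_t* q) {
    assert(j >= 0 && j < 8);
    if (j < 4) {
        return { static_cast<std::uint8_t>(q[j] & 63),
                 static_cast<std::uint8_t>(q[j + 4] & 63) };
    }
    return { static_cast<std::uint8_t>((q[j + 4] & 0x0F) | ((q[j - 4] >> 6) << 4)),
             static_cast<std::uint8_t>((q[j + 4] >> 4) | ((q[j] >> 6) << 4)) };
}
```

Sharing the top two bits of the first eight bytes is what lets sixteen 6-bit fields fit in 12 bytes instead of 16.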