TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
block_q8_K Struct Reference

8-bit K-quantized block structure with block sums.
#include <quantization.h>

Public Attributes

uint16_t  d
int8_t    qs [GGML_QK_K]
int16_t   bsums [GGML_QK_K/16]
Detailed Description

8-bit K-quantized block structure with block sums.

Definition at line 111 of file quantization.h.
Member Data Documentation

int16_t block_q8_K::bsums[GGML_QK_K/16]

Block sums for fast dot products: one sum per group of 16 quantized values.

Definition at line 114 of file quantization.h.

Referenced by vec_dot_q6_k_q8_k_cpu().
uint16_t block_q8_K::d

Block scale.

Definition at line 112 of file quantization.h.

Referenced by dequantize_q8_k(), and quantize_fp32_to_q8_K().
int8_t block_q8_K::qs[GGML_QK_K]

Quantized values.

Definition at line 113 of file quantization.h.

Referenced by dequantize_q8_k(), quantize_fp32_to_q8_K(), vec_dot_q4_k_q8_k_cpu(), and vec_dot_q6_k_q8_k_cpu().