TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
block_q3_K Struct Reference

3-bit K-quantized block structure.

#include <quantization.h>


Public Attributes

uint8_t hmask [GGML_QK_K/8]
uint8_t qs [GGML_QK_K/4]
uint8_t scales [12]
uint16_t d
uint16_t dmin

Detailed Description

3-bit K-quantized block structure

Stores a block of GGML_QK_K weights quantized to 3 bits each, with per-sub-block scales plus block-level scale and minimum values. Its compression ratio and precision fall between those of Q2_K and Q4_K.

Definition at line 99 of file quantization.h.
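The storage cost of one block follows directly from the member sizes listed above. The sketch below mirrors the documented members in their declaration order and checks the resulting size. It assumes GGML_QK_K is 256 (the super-block size used by upstream ggml; this page does not state the value), and block_q3_K_sketch is a hypothetical stand-in name, not the project's type.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GGML_QK_K is 256; this page does not state the value.
#ifndef GGML_QK_K
#define GGML_QK_K 256
#endif

// Hypothetical mirror of the documented members, in declaration order.
struct block_q3_K_sketch {
    uint8_t  hmask[GGML_QK_K / 8];  // 32 bytes: 1 bit per weight
    uint8_t  qs[GGML_QK_K / 4];     // 64 bytes: 2 bits per weight
    uint8_t  scales[12];            // 12 bytes of packed sub-block scales
    uint16_t d;                     // 2 bytes: block scale
    uint16_t dmin;                  // 2 bytes: block minimum
};

// 32 + 64 + 12 + 2 + 2 = 112 bytes; every member is 1- or 2-byte aligned
// and the 16-bit members start at an even offset, so no padding is expected.
static_assert(sizeof(block_q3_K_sketch) == 112, "unexpected padding");

// Average storage cost per weight under the assumptions above.
constexpr double kBitsPerWeightSketch =
    sizeof(block_q3_K_sketch) * 8.0 / GGML_QK_K;  // 112 * 8 / 256 = 3.5
```

Under these assumptions the format costs 3.5 bits per weight: 3 bits for the quantized value itself and the remainder for the scales and the two 16-bit block fields.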

Member Data Documentation

◆ d

uint16_t block_q3_K::d

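Block-level scale, stored as a raw 16-bit value. The uint16_t type suggests a half-precision (FP16) bit pattern rather than an integer, although this page does not state the encoding.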

Definition at line 103 of file quantization.h.

Referenced by dequantize_q3_k().

◆ dmin

uint16_t block_q3_K::dmin

Block-level minimum value, stored as a raw 16-bit value in the same form as d.

Definition at line 104 of file quantization.h.

Referenced by dequantize_q3_k().
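Both d and dmin are declared as uint16_t. If they hold IEEE 754 half-precision bit patterns (an assumption; the encoding is not stated on this page), they need to be widened to float before use. A minimal, portable sketch of that conversion follows; half_bits_to_float is a hypothetical helper name, not a function from this project.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen an IEEE 754 binary16 bit pattern to float.
// Assumes d and dmin store FP16 bits, which this page does not confirm.
inline float half_bits_to_float(uint16_t h) {
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;  // 5-bit exponent, bias 15
    uint32_t mant = h & 0x3FFu;         // 10-bit mantissa
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                // signed zero
        } else {
            // Subnormal half: shift until the implicit leading 1 appears.
            exp = 127 - 15 + 1;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                --exp;
            }
            mant &= 0x3FFu;
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);  // infinity or NaN
    } else {
        bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);  // normal
    }

    float out;
    std::memcpy(&out, &bits, sizeof out);  // well-defined type pun
    return out;
}
```

Given a block b, a caller would write something like half_bits_to_float(b.d) once per block and reuse the result for every weight in it.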

◆ hmask

uint8_t block_q3_K::hmask[GGML_QK_K/8]

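High-bit masks. With GGML_QK_K/8 bytes, the array provides one bit per weight: the third (high) bit of each 3-bit quantized value.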

Definition at line 100 of file quantization.h.

Referenced by dequantize_q3_k().

◆ qs

uint8_t block_q3_K::qs[GGML_QK_K/4]

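Low bits of the quantized values. With GGML_QK_K/4 bytes, the array provides two bits per weight, so each byte carries the low bits of four weights.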

Definition at line 101 of file quantization.h.

Referenced by dequantize_q3_k().
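The array sizes imply that each 3-bit value is split across two arrays: two low bits in qs and one high bit in hmask. The sketch below shows how the two parts can be recombined for a single weight. The helper name q3_value_sketch is hypothetical, the bias of 4 (mapping the value to the signed range [-4, 3]) follows the convention used by upstream ggml and is an assumption here, and the choice of which byte, shift, and mask bit belong to a given weight is left to the caller because this page does not document that indexing.

```cpp
#include <cstdint>

// Hypothetical helper: rebuild one signed 3-bit quantized value.
//   qs_byte    : byte from qs holding four 2-bit fields
//   shift      : 0, 2, 4, or 6, selecting one 2-bit field
//   hmask_byte : byte from hmask holding eight high bits
//   bit        : single-bit mask selecting this weight's high bit
// Assumption: values are stored with a bias of 4 (range [-4, 3]).
inline int q3_value_sketch(uint8_t qs_byte, int shift,
                           uint8_t hmask_byte, uint8_t bit) {
    const int low2 = (qs_byte >> shift) & 0x03;      // two low bits
    const int high = (hmask_byte & bit) ? 1 : 0;     // one high bit
    return (low2 | (high << 2)) - 4;                 // remove the bias
}
```

The dequantized weight would then be this integer multiplied by the effective scale for its sub-block; how dmin enters that formula is not described on this page, so it is left out of the sketch.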

◆ scales

uint8_t block_q3_K::scales[12]

Sub-block scales, packed into 12 bytes (96 bits in total).

Definition at line 102 of file quantization.h.

Referenced by dequantize_q3_k().
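Twelve bytes hold 96 bits, which is enough for sixteen 6-bit scales (one per group of 16 weights if GGML_QK_K is 256). The sketch below unpacks such scales using a packing modeled on upstream ggml's Q3_K format; whether this project uses the same layout is an assumption, and unpack_q3_scales_sketch is a hypothetical helper name.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: unpack sixteen 6-bit scales from 12 bytes.
// Assumed layout (modeled on upstream ggml's Q3_K, not confirmed here):
//   bytes 0..7  : low 4 bits; scales 0..7 use the low nibble,
//                 scales 8..15 use the high nibble of byte j - 8
//   bytes 8..11 : high 2 bits; scale j uses byte 8 + j % 4
//                 at bit offset 2 * (j / 4)
// Each 6-bit value is assumed to carry a bias of 32 (range [-32, 31]).
inline std::array<int8_t, 16> unpack_q3_scales_sketch(const uint8_t (&s)[12]) {
    std::array<int8_t, 16> out{};
    for (int j = 0; j < 16; ++j) {
        const int lo = (j < 8) ? (s[j] & 0x0F) : (s[j - 8] >> 4);
        const int hi = (s[8 + j % 4] >> (2 * (j / 4))) & 0x03;
        out[j] = static_cast<int8_t>((lo | (hi << 4)) - 32);
    }
    return out;
}
```

Each unpacked value would then be multiplied by the block scale d to obtain the effective scale for its group of weights.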


The documentation for this struct was generated from the following file: