TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
block_q4_K Struct Reference

4-bit K-quantized block structure

#include <quantization.h>

Public Attributes

uint16_t d
 
uint16_t dmin
 
uint8_t scales [12]
 
uint8_t qs [GGML_QK_K/2]
 

Detailed Description

4-bit K-quantized block structure

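Stores weights quantized to 4 bits with block-wise scaling. Each block holds 256 quantized values, packed two per byte into the GGML_QK_K/2 bytes of qs, together with a block scale (d), a block minimum (dmin), and 12 bytes of sub-block scales.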

Definition at line 57 of file quantization.h.
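
The member list above implies a fixed storage cost per block. The following is a minimal sketch of that arithmetic, not the definition from quantization.h: the type name block_q4_K_sketch and the helper bits_per_weight() are hypothetical, and GGML_QK_K is assumed to be 256 to match the "256 quantized values" stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: GGML_QK_K is 256, matching the documented block size.
constexpr std::size_t GGML_QK_K = 256;

// Hypothetical mirror of the documented members, in declaration order.
struct block_q4_K_sketch {
    uint16_t d;                  // block scale
    uint16_t dmin;               // block minimum value
    uint8_t  scales[12];         // sub-block scales
    uint8_t  qs[GGML_QK_K / 2];  // 4-bit quantized values, two per byte
};

// 2 + 2 + 12 + 128 = 144 bytes; the members need no padding between them.
static_assert(sizeof(block_q4_K_sketch) == 2 + 2 + 12 + GGML_QK_K / 2,
              "unexpected padding in block layout");

// Average storage cost per weight, in bits, including the scale metadata.
constexpr double bits_per_weight() {
    return 8.0 * sizeof(block_q4_K_sketch) / GGML_QK_K;
}
```

Under these assumptions a block occupies 144 bytes, so each weight costs 4.5 bits on average: 4 bits for the quantized value plus 0.5 bits of amortized scale and minimum data.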

Member Data Documentation

◆ d

uint16_t block_q4_K::d

Block scale

Definition at line 58 of file quantization.h.

Referenced by dequantize_q4_k_m(), and quantize_q4_k_m().

◆ dmin

uint16_t block_q4_K::dmin

Block minimum value

Definition at line 59 of file quantization.h.

Referenced by dequantize_q4_k_m(), and quantize_q4_k_m().

◆ qs

uint8_t block_q4_K::qs[GGML_QK_K/2]

Quantized values, stored as two 4-bit values per byte

Definition at line 61 of file quantization.h.

Referenced by dequantize_q4_k_m(), quantize_q4_k_m(), and vec_dot_q4_k_q8_k_cpu().
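
Because qs holds GGML_QK_K/2 bytes for GGML_QK_K values, each byte carries two 4-bit codes. The helpers below are a hypothetical sketch of splitting and joining those codes; they are not functions from this project. Which weight index each nibble maps to depends on the layout chosen by quantize_q4_k_m() and is not specified here.

```cpp
#include <cassert>
#include <cstdint>

// Pack two 4-bit codes (each in 0..15) into one byte:
// lo goes into bits 0-3, hi into bits 4-7.
inline uint8_t pack_nibbles(uint8_t lo, uint8_t hi) {
    assert(lo < 16 && hi < 16);
    return static_cast<uint8_t>(lo | (hi << 4));
}

// Extract the 4-bit code stored in bits 0-3.
inline uint8_t low_nibble(uint8_t b) {
    return static_cast<uint8_t>(b & 0x0F);
}

// Extract the 4-bit code stored in bits 4-7.
inline uint8_t high_nibble(uint8_t b) {
    return static_cast<uint8_t>(b >> 4);
}
```

For example, pack_nibbles(3, 12) yields 0xC3, from which low_nibble() recovers 3 and high_nibble() recovers 12.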

◆ scales

uint8_t block_q4_K::scales[12]

Sub-block scales

Definition at line 60 of file quantization.h.

Referenced by dequantize_q4_k_m(), and quantize_q4_k_m().
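
Twelve bytes are enough for sixteen 6-bit fields (16 x 6 = 96 bits). In ggml's Q4_K format, which this struct resembles, those fields are eight sub-block scales and eight sub-block minimums. The sketch below follows the ggml packing convention; that this project packs scales the same way is an assumption, and the helper name unpack_scale_min is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Unpack the 6-bit scale and 6-bit minimum of sub-block j (0..7) from the
// 12-byte array q, assuming ggml-style packing:
//   q[0..3]  : low 6 bits = scale 0..3, top 2 bits = high bits of scale 4..7
//   q[4..7]  : low 6 bits = min 0..3,   top 2 bits = high bits of min 4..7
//   q[8..11] : low nibble = low bits of scale 4..7,
//              high nibble = low bits of min 4..7
inline void unpack_scale_min(int j, const uint8_t* q,
                             uint8_t& scale, uint8_t& min) {
    assert(j >= 0 && j < 8);
    if (j < 4) {
        scale = static_cast<uint8_t>(q[j] & 63);
        min   = static_cast<uint8_t>(q[j + 4] & 63);
    } else {
        scale = static_cast<uint8_t>((q[j + 4] & 0x0F) | ((q[j - 4] >> 6) << 4));
        min   = static_cast<uint8_t>((q[j + 4] >> 4)   | ((q[j] >> 6) << 4));
    }
}
```

Under the same assumption, a dequantizer would typically combine these with the block-level fields, reconstructing each weight as roughly d * scale * q - dmin * min, where q is the 4-bit code from qs.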


The documentation for this struct was generated from the following file: