TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
block_q2_K Struct Reference

2-bit K-quantized block structure.

#include <quantization.h>

Public Attributes

uint16_t d
uint16_t dmin
uint8_t scales[GGML_QK_K/16]
uint8_t qs[GGML_QK_K/4]

Detailed Description

2-bit K-quantized block structure

Stores GGML_QK_K weights quantized to 2 bits each, together with a block-wide scale, a block-wide minimum, and per-sub-block scales. This trades precision for a very small memory footprint.

Definition at line 85 of file quantization.h.
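
As a rough illustration of the footprint, the sketch below mirrors the members listed on this page and computes the storage cost per weight. It assumes GGML_QK_K is 256 (the value used in upstream ggml, not confirmed by this page); block_q2_K_sketch and bits_per_weight() are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GGML_QK_K is 256; check quantization.h for the real value.
constexpr std::size_t GGML_QK_K = 256;

// Hypothetical mirror of block_q2_K, members in the order listed on this page.
struct block_q2_K_sketch {
    uint16_t d;                       // block scale
    uint16_t dmin;                    // block minimum
    uint8_t  scales[GGML_QK_K / 16];  // one byte per 16 weights
    uint8_t  qs[GGML_QK_K / 4];       // four 2-bit values per byte
};

// 2 + 2 + 16 + 64 bytes, with no padding on common platforms.
static_assert(sizeof(block_q2_K_sketch) == 84, "unexpected padding in block layout");

// Storage cost per weight, in bits, including the scale metadata.
constexpr double bits_per_weight() {
    return 8.0 * sizeof(block_q2_K_sketch) / GGML_QK_K;
}
```

Under that assumption a block occupies 84 bytes for 256 weights, so the effective cost is somewhat above the nominal 2 bits per weight once the scales are counted.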

Member Data Documentation

◆ d

uint16_t block_q2_K::d

Block-wide scale factor, stored in 16 bits.

Definition at line 86 of file quantization.h.

Referenced by dequantize_q2_k().

◆ dmin

uint16_t block_q2_K::dmin

Block-wide minimum value, stored in 16 bits.

Definition at line 87 of file quantization.h.

Referenced by dequantize_q2_k().
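
Both d and dmin are declared as uint16_t, which suggests they hold raw half-precision bit patterns rather than integers. This page does not confirm that, so the following is only a sketch under the assumption that the fields are IEEE 754 binary16 values; fp16_bits_to_float is a hypothetical helper, not a function from quantization.h.

```cpp
#include <cstdint>
#include <cstring>

// Convert an IEEE 754 binary16 bit pattern to float.
// Assumption: block_q2_K::d and block_q2_K::dmin use this encoding.
inline float fp16_bits_to_float(uint16_t h) {
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;  // signed zero
        } else {
            // Subnormal half: shift until the implicit leading 1 appears.
            uint32_t shift = 0;
            while ((mant & 0x400u) == 0) { mant <<= 1; ++shift; }
            mant &= 0x3FFu;
            exp = 113 - shift;  // 127 - 14 - shift
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);  // infinity or NaN
    } else {
        bits = sign | ((exp + 112) << 23) | (mant << 13);  // rebias 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);  // reinterpret the bits as a float
    return f;
}
```

If the fields turn out to use a different encoding, only this helper would need to change; the rest of a dequantization routine could keep working with float values.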

◆ qs

uint8_t block_q2_K::qs[GGML_QK_K/4]

Quantized weight values, packed four 2-bit values per byte.

Definition at line 89 of file quantization.h.

Referenced by dequantize_q2_k().
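
Since qs holds GGML_QK_K/4 bytes for 2-bit weights, each byte carries four values. A minimal sketch of splitting one byte follows. The lowest-bits-first order is an assumption, as is how the resulting codes map to weight positions within the block; unpack_2bit is a hypothetical helper, not part of quantization.h.

```cpp
#include <array>
#include <cstdint>

// Split one packed byte into four 2-bit codes in the range 0..3.
// Assumption: the lowest bit pair comes first.
inline std::array<uint8_t, 4> unpack_2bit(uint8_t packed) {
    return {
        static_cast<uint8_t>( packed       & 0x3u),
        static_cast<uint8_t>((packed >> 2) & 0x3u),
        static_cast<uint8_t>((packed >> 4) & 0x3u),
        static_cast<uint8_t>((packed >> 6) & 0x3u),
    };
}
```

For example, the byte 0xE4 (binary 11 10 01 00) unpacks to the codes 0, 1, 2, 3.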

◆ scales

uint8_t block_q2_K::scales[GGML_QK_K/16]

Sub-block scales, one byte per sub-block of 16 weights.

Definition at line 88 of file quantization.h.

Referenced by dequantize_q2_k().
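
This page does not say how a scales byte is combined with d, dmin, and a 2-bit code. The sketch below follows the upstream ggml convention as an assumption: the low nibble is a sub-block scale, the high nibble is a sub-block minimum, and a weight is reconstructed as d * scale * q - dmin * min. SubBlockParams, split_scale_byte, and dequantize_one are hypothetical names, and the actual dequantize_q2_k() may differ.

```cpp
#include <cstdint>

// Hypothetical pair of 4-bit parameters carried by one scales byte.
struct SubBlockParams {
    uint8_t scale;  // low nibble (assumed)
    uint8_t min;    // high nibble (assumed)
};

inline SubBlockParams split_scale_byte(uint8_t sc) {
    return { static_cast<uint8_t>(sc & 0x0Fu), static_cast<uint8_t>(sc >> 4) };
}

// Reconstruct one weight from a 2-bit code q (0..3).
// d and dmin are the block-wide factors, already converted to float.
inline float dequantize_one(float d, float dmin, uint8_t sc, uint8_t q) {
    const SubBlockParams p = split_scale_byte(sc);
    return d * static_cast<float>(p.scale) * static_cast<float>(q)
         - dmin * static_cast<float>(p.min);
}
```

With d = 0.5, dmin = 0.25, a scales byte of 0xA3 (scale 3, min 10), and code 2, this gives 0.5 * 3 * 2 - 0.25 * 10 = 0.5.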


The documentation for this struct was generated from the following file: