TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
ModelConfig Struct Reference

Model configuration structure holding architecture and hyperparameters. More...

#include <model.h>


Public Types

enum class  TokenizerFamily { UNKNOWN , LLAMA_SENTENCEPIECE , LLAMA3_TIKTOKEN }
 

Public Attributes

int hidden_size
 
int intermediate_size
 
int num_attention_heads
 
int num_key_value_heads
 
int num_hidden_layers
 
int vocab_size
 
int max_position_embeddings
 
float rms_norm_eps
 
float rope_theta
 
std::string hidden_act
 
std::string torch_dtype
 
int bos_token_id
 
int eos_token_id
 
int unk_token_id = -1
 
int pad_token_id = -1
 
std::string architecture
 
std::string model_name
 
std::string chat_template_type
 
std::string pre_tokenizer_type
 
std::string chat_template_string
 
bool is_gguf_file_loaded
 
bool use_mmap_for_gguf = true
 
bool use_kvcache_quantization = false
 
int num_cpu_offload_layers = 0
 
bool enable_memory_efficient_layers = true
 
bool enable_prefill_chunking = true
 
bool use_optimized_cuda_kernels = true
 
TokenizerFamily tokenizer_family = TokenizerFamily::UNKNOWN
 

Detailed Description

Model configuration structure holding architecture and hyperparameters.

Contains all key parameters needed to construct and run a transformer model, including hidden size, number of layers, attention heads, vocabulary size, special token IDs, etc.

Definition at line 80 of file model.h.

Member Enumeration Documentation

◆ TokenizerFamily

enum class ModelConfig::TokenizerFamily
Enumerator
UNKNOWN 	Tokenizer family has not been determined.
LLAMA_SENTENCEPIECE 	SentencePiece BPE, used by Llama 2 and similar models.
LLAMA3_TIKTOKEN 	Tiktoken-based BPE, used by Llama 3.

Definition at line 112 of file model.h.

enum class TokenizerFamily {
  UNKNOWN,
  LLAMA_SENTENCEPIECE,  // For Llama 2 and similar SentencePiece BPE
  LLAMA3_TIKTOKEN       // For Llama 3's Tiktoken-based BPE
};

Member Data Documentation

◆ architecture

std::string ModelConfig::architecture

Model architecture name.

◆ bos_token_id

int ModelConfig::bos_token_id

Beginning-of-sequence token ID.

◆ chat_template_string

std::string ModelConfig::chat_template_string

Template string for chat formatting.

Definition at line 100 of file model.h.

Referenced by parse_model_config_from_gguf(), and PYBIND11_MODULE().

◆ chat_template_type

std::string ModelConfig::chat_template_type

Type of chat template used.

Definition at line 98 of file model.h.

Referenced by parse_model_config_from_gguf(), and PYBIND11_MODULE().

◆ enable_memory_efficient_layers

bool ModelConfig::enable_memory_efficient_layers = true

Enables automatic eviction of layer weights during the forward pass.

Definition at line 107 of file model.h.

Referenced by TinyLlamaModel::forward().

◆ enable_prefill_chunking

bool ModelConfig::enable_prefill_chunking = true

Whether to split long prompts into chunks during prefill.

Definition at line 109 of file model.h.

◆ eos_token_id

int ModelConfig::eos_token_id

End-of-sequence token ID.

◆ hidden_act

std::string ModelConfig::hidden_act

Activation function used in the hidden layers.

Definition at line 90 of file model.h.

Referenced by parse_model_config(), parse_model_config_from_gguf(), and PYBIND11_MODULE().

◆ hidden_size

int ModelConfig::hidden_size

Dimensionality of the hidden states.

◆ intermediate_size

int ModelConfig::intermediate_size

Dimensionality of the feed-forward (MLP) intermediate layer.

◆ is_gguf_file_loaded

bool ModelConfig::is_gguf_file_loaded

Whether the configuration was loaded from a GGUF file.

◆ max_position_embeddings

int ModelConfig::max_position_embeddings

Maximum supported sequence length (number of position embeddings).

◆ model_name

std::string ModelConfig::model_name

Human-readable model name.

◆ num_attention_heads

int ModelConfig::num_attention_heads

Number of attention heads.

◆ num_cpu_offload_layers

int ModelConfig::num_cpu_offload_layers = 0

Number of transformer layers to offload to the CPU.

◆ num_hidden_layers

int ModelConfig::num_hidden_layers

Number of transformer layers.

◆ num_key_value_heads

int ModelConfig::num_key_value_heads

Number of key/value heads (fewer than num_attention_heads under grouped-query attention).

◆ pad_token_id

int ModelConfig::pad_token_id = -1

Padding token ID; defaults to -1 if not specified.

Definition at line 95 of file model.h.

Referenced by SafeTensorsLoader::load_model_config_from_json(), parse_model_config(), parse_model_config_from_gguf(), and Tokenizer::Tokenizer().

◆ pre_tokenizer_type

std::string ModelConfig::pre_tokenizer_type

Type of pre-tokenizer.

Definition at line 99 of file model.h.

Referenced by parse_model_config_from_gguf(), and PYBIND11_MODULE().

◆ rms_norm_eps

float ModelConfig::rms_norm_eps

Epsilon used in RMSNorm for numerical stability.

◆ rope_theta

float ModelConfig::rope_theta

Base frequency (theta) for rotary position embeddings (RoPE).

◆ tokenizer_family

TokenizerFamily ModelConfig::tokenizer_family = TokenizerFamily::UNKNOWN

Detected tokenizer family; defaults to TokenizerFamily::UNKNOWN.

◆ torch_dtype

std::string ModelConfig::torch_dtype

Data type used in the original PyTorch model.

Definition at line 91 of file model.h.

Referenced by parse_model_config(), and PYBIND11_MODULE().

◆ unk_token_id

int ModelConfig::unk_token_id = -1

Unknown token ID; defaults to -1 if not specified.

Definition at line 94 of file model.h.

Referenced by SafeTensorsLoader::load_model_config_from_json(), parse_model_config(), parse_model_config_from_gguf(), and Tokenizer::Tokenizer().

◆ use_kvcache_quantization

bool ModelConfig::use_kvcache_quantization = false

Whether to use INT8 quantization for the KV cache on the GPU.

Definition at line 103 of file model.h.

Referenced by KVCache::initialize(), and tinyllama::TinyLlamaSession::TinyLlamaSession().

◆ use_mmap_for_gguf

bool ModelConfig::use_mmap_for_gguf = true

Whether to memory-map GGUF files instead of reading them fully into memory.

◆ use_optimized_cuda_kernels

bool ModelConfig::use_optimized_cuda_kernels = true

Whether to use optimized CUDA kernels when available.

Definition at line 110 of file model.h.

◆ vocab_size

int ModelConfig::vocab_size

Size of the vocabulary.

The documentation for this struct was generated from the following file:
model.h