TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
SafeTensorsLoader Class Reference

Main class for loading tensors from SafeTensors format files (single or sharded).

#include <safetensors_loader.h>


Classes

struct  TensorInfo
 Information about a tensor stored in the SafeTensors file(s).
 

Public Member Functions

 SafeTensorsLoader (const std::string &model_load_path)
 Constructs a SafeTensorsLoader.
 
 ~SafeTensorsLoader ()
 Destructor. Cleans up all memory-mapped shards.
 
 SafeTensorsLoader (const SafeTensorsLoader &)=delete
 
SafeTensorsLoader & operator= (const SafeTensorsLoader &)=delete
 
std::vector< std::string > tensor_names () const
 Get a list of all tensor names available in the loaded model.
 
std::vector< uint8_t > get_tensor_bytes (const std::string &name) const
 Get the raw bytes for a tensor, converting to FP32 if needed.
 
const TensorInfo & get_tensor_info (const std::string &name) const
 Get information about a specific tensor.
 
std::map< std::string, std::vector< uint8_t > > load_all_tensors_parallel () const
 Load all tensors in parallel.
 

Static Public Member Functions

static bool load_model_config_from_json (const std::string &model_path_or_dir, ModelConfig &config_to_populate)
 Loads model configuration from a JSON file corresponding to a .safetensors model path.
 

Private Member Functions

void load_from_directory (const std::string &directory_path)
 Load tensors from a directory, handling index files and multiple shards.
 
void load_single_file (const std::string &file_path, const std::string &shard_key_override="")
 Load a single .safetensors file as a shard.
 
void parse_shard_metadata (Shard &shard, const std::string &shard_key)
 Parse the metadata of a shard and populate tensor information.
 
std::vector< uint8_t > convert_tensor_data (const uint8_t *data, size_t size, const std::string &dtype) const
 Convert raw tensor data to FP32 if needed.
 
const Shard * get_shard_for_tensor (const std::string &tensor_name) const
 Get the Shard object for a given tensor name.
 

Private Attributes

std::string model_load_path_
 
bool is_sharded_ = false
 
std::map< std::string, TensorInfo > tensors_
 
std::map< std::string, std::unique_ptr< Shard > > loaded_shards_
 
std::map< std::string, std::string > tensor_name_to_shard_key_map_
 

Detailed Description

Main class for loading tensors from SafeTensors format files (single or sharded)

Supports both single-file and multi-shard (sharded) SafeTensors models. Handles memory mapping, tensor metadata parsing, and provides efficient access to tensor data. Can load models from a single .safetensors file, a directory containing multiple shards, or a directory with an index file.

Definition at line 120 of file safetensors_loader.h.

Constructor & Destructor Documentation

◆ SafeTensorsLoader() [1/2]

SafeTensorsLoader::SafeTensorsLoader ( const std::string &  model_load_path)
explicit

Constructs a SafeTensorsLoader.

The path can be to a single .safetensors file, or a directory containing .safetensors file(s) and potentially an index.json.

Parameters
model_load_path  Path to the model file or directory.
Exceptions
std::runtime_error  if files cannot be opened, are invalid, or sharding info is inconsistent.

Definition at line 283 of file safetensors_loader.cpp.

284 : model_load_path_(model_load_path), is_sharded_(false) {
285 Logger::info("SafeTensorsLoader: Initializing for path: " + model_load_path_);
286 std::filesystem::path path_obj(model_load_path_);
287
288 if (!std::filesystem::exists(path_obj)){
289 throw std::runtime_error("SafeTensorsLoader: Provided model_load_path does not exist: " + model_load_path_);
290 }
291
292 if (std::filesystem::is_directory(path_obj)) {
293 Logger::info("SafeTensorsLoader: Path is a directory. Attempting to load from directory.");
294 load_from_directory(model_load_path_);
295 } else if (std::filesystem::is_regular_file(path_obj)) {
296 Logger::info("SafeTensorsLoader: Path is a single file. Loading single file.");
297 std::string file_key = path_obj.filename().string();
298 load_single_file(model_load_path_, file_key);
299 is_sharded_ = false;
300 } else {
301 throw std::runtime_error("SafeTensorsLoader: model_load_path is not a valid file or directory: " + model_load_path_);
302 }
303
304 if (tensors_.empty() && loaded_shards_.empty()) {
305 Logger::warning("SafeTensorsLoader: Initialization complete, but no tensors were loaded and no shards mapped. Check model path and format: " + model_load_path_);
306 } else {
307 Logger::info("SafeTensorsLoader: Initialization complete. Total unique tensors mapped: " + std::to_string(tensors_.size()) +
308 " from " + std::to_string(loaded_shards_.size()) + " shard(s).");
309 }
310}

References Logger::info(), is_sharded_, load_from_directory(), load_single_file(), loaded_shards_, model_load_path_, tensors_, and Logger::warning().

◆ ~SafeTensorsLoader()

SafeTensorsLoader::~SafeTensorsLoader ( )

Destructor. Cleans up all memory-mapped shards.

Definition at line 312 of file safetensors_loader.cpp.

312 {
313 Logger::info("SafeTensorsLoader: Destructing. Clearing " + std::to_string(loaded_shards_.size()) + " loaded shards.");
314 loaded_shards_.clear();
315 Logger::info("SafeTensorsLoader: All shards cleared.");
316}

References Logger::info(), and loaded_shards_.

◆ SafeTensorsLoader() [2/2]

SafeTensorsLoader::SafeTensorsLoader ( const SafeTensorsLoader & )
delete

Member Function Documentation

◆ convert_tensor_data()

std::vector< uint8_t > SafeTensorsLoader::convert_tensor_data ( const uint8_t *  data,
size_t  size,
const std::string &  dtype 
) const
private

Convert raw tensor data to FP32 if needed.

Handles conversion from F16/BF16 to FP32 as required by the tensor's dtype.

Parameters
data  Pointer to the raw tensor data.
size  Size of the data in bytes.
dtype  Data type string (e.g., "F32", "F16", "BF16").
Returns
Converted tensor data as a vector of bytes (FP32 format).

Definition at line 580 of file safetensors_loader.cpp.

580 {
581 if (dtype_str_upper == "F32") {
582 return std::vector<uint8_t>(data_ptr, data_ptr + n_bytes);
583 } else if (dtype_str_upper == "F16") {
584 size_t num_elements = n_bytes / 2;
585 std::vector<float> f32_vec(num_elements);
586 const uint16_t* f16_ptr = reinterpret_cast<const uint16_t*>(data_ptr);
587 for (size_t i = 0; i < num_elements; ++i) {
588 f32_vec[i] = cpu_f16_to_float32(f16_ptr[i]);
589 }
590 std::vector<uint8_t> bytes_out(num_elements * sizeof(float));
591 memcpy(bytes_out.data(), f32_vec.data(), bytes_out.size());
592 return bytes_out;
593 } else if (dtype_str_upper == "BF16") {
594 size_t num_elements = n_bytes / 2;
595 std::vector<float> f32_vec(num_elements);
596 const uint16_t* bf16_ptr = reinterpret_cast<const uint16_t*>(data_ptr);
597 for (size_t i = 0; i < num_elements; ++i) {
598 f32_vec[i] = cpu_bf16_to_float32(bf16_ptr[i]);
599 }
600 std::vector<uint8_t> bytes_out(num_elements * sizeof(float));
601 memcpy(bytes_out.data(), f32_vec.data(), bytes_out.size());
602 return bytes_out;
603 }
604 throw std::runtime_error("SafeTensorsLoader: Unsupported tensor dtype for conversion: " + dtype_str_upper);
605}

References cpu_bf16_to_float32(), and cpu_f16_to_float32().

Referenced by get_tensor_bytes().
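
The two narrow formats differ only in how their 16 bits map onto IEEE-754 float32: BF16 is simply the upper half of a float32, while F16 needs its 5-bit exponent rebiased from 15 to 127. The helpers below are a stdlib-only sketch of that widening, illustrative stand-ins for the project's `cpu_bf16_to_float32()` and `cpu_f16_to_float32()`, whose actual implementations are not shown on this page:

```cpp
#include <cstdint>
#include <cstring>

// BF16 is the top 16 bits of an IEEE-754 float32, so widening is a shift.
static float bf16_to_f32(uint16_t raw) {
    uint32_t bits = static_cast<uint32_t>(raw) << 16;
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// F16 (IEEE-754 binary16): 1 sign, 5 exponent, 10 mantissa bits.
static float f16_to_f32(uint16_t raw) {
    uint32_t sign = (raw >> 15) & 0x1;
    uint32_t exp  = (raw >> 10) & 0x1F;
    uint32_t mant =  raw        & 0x3FF;
    uint32_t bits;
    if (exp == 0) {              // zero or subnormal
        if (mant == 0) {
            bits = sign << 31;
        } else {                 // renormalize the subnormal for float32
            exp = 127 - 15 + 1;
            while (!(mant & 0x400)) { mant <<= 1; --exp; }
            mant &= 0x3FF;       // drop the now-implicit leading bit
            bits = (sign << 31) | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1F) {    // infinity / NaN
        bits = (sign << 31) | (0xFFu << 23) | (mant << 13);
    } else {                     // normal: rebias exponent 15 -> 127
        bits = (sign << 31) | ((exp + 112) << 23) | (mant << 13);
    }
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}
```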

◆ get_shard_for_tensor()

const Shard * SafeTensorsLoader::get_shard_for_tensor ( const std::string &  tensor_name) const
private

Get the Shard object for a given tensor name.

Looks up the shard key for the tensor and returns a pointer to the corresponding Shard.

Parameters
tensor_name  Name of the tensor.
Returns
Pointer to the Shard containing the tensor.
Exceptions
std::logic_error  if the shard is not found.

Definition at line 514 of file safetensors_loader.cpp.

514 {
515 auto map_it = tensor_name_to_shard_key_map_.find(tensor_name);
516 std::string determined_shard_key;
517
518 if (map_it != tensor_name_to_shard_key_map_.end()){
519 determined_shard_key = map_it->second;
520 } else {
521 const auto& tensor_info_direct = get_tensor_info(tensor_name);
522 determined_shard_key = tensor_info_direct.shard_key;
523 }
524
525 if (determined_shard_key.empty()){
526 throw std::logic_error("Internal inconsistency: Could not determine shard key for tensor '" + tensor_name + "'.");
527 }
528
529 auto shard_it = loaded_shards_.find(determined_shard_key);
530 if (shard_it == loaded_shards_.end()) {
531 throw std::logic_error("Internal inconsistency: Shard key '" + determined_shard_key + "' for tensor '" + tensor_name + "' not found in loaded_shards_ map. Tensors map has it, but shard object itself is missing.");
532 }
533 return shard_it->second.get();
534}

References get_tensor_info(), loaded_shards_, and tensor_name_to_shard_key_map_.

Referenced by get_tensor_bytes().

◆ get_tensor_bytes()

std::vector< uint8_t > SafeTensorsLoader::get_tensor_bytes ( const std::string &  name) const

Get the raw bytes for a tensor, converting to FP32 if needed.

Parameters
name  Name of the tensor to load.
Returns
Vector of bytes containing the tensor data (FP32 format).
Exceptions
std::runtime_error  if tensor not found or conversion fails.

Definition at line 536 of file safetensors_loader.cpp.

536 {
537 const TensorInfo& info = get_tensor_info(name);
538 const Shard* shard = get_shard_for_tensor(name);
539
540 const uint8_t* raw_data_ptr = shard->get_tensor_raw_data(info.data_offset, info.nbytes);
541 return convert_tensor_data(raw_data_ptr, info.nbytes, info.dtype);
542}

References convert_tensor_data(), SafeTensorsLoader::TensorInfo::data_offset, SafeTensorsLoader::TensorInfo::dtype, get_shard_for_tensor(), get_tensor_info(), Shard::get_tensor_raw_data(), and SafeTensorsLoader::TensorInfo::nbytes.

◆ get_tensor_info()

const SafeTensorsLoader::TensorInfo & SafeTensorsLoader::get_tensor_info ( const std::string &  name) const

Get information about a specific tensor.

Parameters
name  Name of the tensor.
Returns
Reference to the tensor's information.
Exceptions
std::runtime_error  if tensor not found.

Definition at line 506 of file safetensors_loader.cpp.

506 {
507 auto it = tensors_.find(name);
508 if (it == tensors_.end()) {
509 throw std::runtime_error("Tensor not found in SafeTensorsLoader metadata: " + name);
510 }
511 return it->second;
512}

References tensors_.

Referenced by get_shard_for_tensor(), and get_tensor_bytes().

◆ load_all_tensors_parallel()

std::map< std::string, std::vector< uint8_t > > SafeTensorsLoader::load_all_tensors_parallel ( ) const

Load all tensors in parallel.

Returns
Map of tensor names to their data (FP32 format).

Definition at line 544 of file safetensors_loader.cpp.

544 {
545 std::map<std::string, std::vector<uint8_t>> result_map;
546 if (tensors_.empty()) {
547 Logger::debug("SafeTensorsLoader::load_all_tensors_parallel: No tensors to load.");
548 return result_map;
549 }
550
551 std::vector<std::future<std::pair<std::string, std::vector<uint8_t>>>> futures;
552 unsigned int n_threads = std::max(1u, std::thread::hardware_concurrency());
553 n_threads = std::min(n_threads, static_cast<unsigned int>(tensors_.size()));
554 if (n_threads > 16) n_threads = 16;
555
556 ThreadPool pool(n_threads);
557 Logger::info("SafeTensorsLoader: Loading all " + std::to_string(tensors_.size()) + " tensors in parallel using " + std::to_string(n_threads) + " threads.");
558
559 for (const auto& pair : tensors_) {
560 const std::string& tensor_name = pair.first;
561 futures.push_back(pool.submit([this, tensor_name]() {
562 std::vector<uint8_t> data = this->get_tensor_bytes(tensor_name);
563 return std::make_pair(tensor_name, std::move(data));
564 }));
565 }
566
567 for (auto& fut : futures) {
568 try {
569 std::pair<std::string, std::vector<uint8_t>> tensor_pair = fut.get();
570 result_map[tensor_pair.first] = std::move(tensor_pair.second);
571 } catch (const std::exception& e) {
572 Logger::error("SafeTensorsLoader: Error loading a tensor in parallel task: " + std::string(e.what()));
573 throw;
574 }
575 }
576 Logger::info("SafeTensorsLoader: Finished loading all tensors in parallel.");
577 return result_map;
578}

References Logger::debug(), Logger::error(), Logger::info(), ThreadPool::submit(), and tensors_.
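
The fan-out/fan-in pattern above (submit one task per tensor, then collect futures, rethrowing any task exception at `get()`) can be sketched with `std::async` in place of the project's `ThreadPool`. This is a minimal stand-in: `fake_load` substitutes for `get_tensor_bytes()`, and no thread-count cap is applied:

```cpp
#include <future>
#include <map>
#include <string>
#include <utility>
#include <vector>
#include <cstdint>

// Stand-in for get_tensor_bytes(): each "tensor" is just a filled buffer
// whose length equals the name length.
static std::vector<uint8_t> fake_load(const std::string& name) {
    return std::vector<uint8_t>(name.size(), 0xAB);
}

// Fan out one task per tensor, then fan in to an ordered map.
static std::map<std::string, std::vector<uint8_t>>
load_all_parallel(const std::vector<std::string>& names) {
    std::vector<std::future<std::pair<std::string, std::vector<uint8_t>>>> futures;
    futures.reserve(names.size());
    for (const auto& n : names) {
        futures.push_back(std::async(std::launch::async, [n]() {
            return std::make_pair(n, fake_load(n));
        }));
    }
    std::map<std::string, std::vector<uint8_t>> out;
    for (auto& f : futures) {
        auto p = f.get();                 // rethrows any task exception here
        out[p.first] = std::move(p.second);
    }
    return out;
}
```

The real loader bounds concurrency at min(hardware_concurrency, tensor count, 16), which matters because each task touches a memory-mapped shard and unbounded `std::async` may oversubscribe.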

◆ load_from_directory()

void SafeTensorsLoader::load_from_directory ( const std::string &  directory_path)
private

Load tensors from a directory, handling index files and multiple shards.

If an index file is found, parses it and loads the referenced shards. Otherwise, scans for .safetensors files and loads them as individual shards.

Parameters
directory_path  Path to the directory containing model files.

Definition at line 318 of file safetensors_loader.cpp.

318 {
319 Logger::debug("SafeTensorsLoader::load_from_directory for '" + directory_path_str + "'.");
320 std::filesystem::path dir_p(directory_path_str);
321 std::filesystem::path index_json_path_v1 = dir_p / "model.safetensors.index.json";
322 std::filesystem::path index_json_path_v2 = dir_p / "pytorch_model.bin.index.json";
323 std::filesystem::path actual_index_path;
324
325 bool index_found = false;
326 if (std::filesystem::exists(index_json_path_v1) && std::filesystem::is_regular_file(index_json_path_v1)) {
327 actual_index_path = index_json_path_v1;
328 index_found = true;
329 } else if (std::filesystem::exists(index_json_path_v2) && std::filesystem::is_regular_file(index_json_path_v2)) {
330 actual_index_path = index_json_path_v2;
331 index_found = true;
332 }
333
334 if (index_found) {
335 Logger::info("SafeTensorsLoader: Found index file: " + actual_index_path.string());
336 is_sharded_ = true;
337 std::ifstream f(actual_index_path.string());
338 if (!f.is_open()) {
339 throw std::runtime_error("SafeTensorsLoader: Failed to open index file: " + actual_index_path.string());
340 }
341 nlohmann::json index_json_data;
342 try {
343 index_json_data = nlohmann::json::parse(f);
344 } catch (const nlohmann::json::parse_error& e) {
345 f.close();
346 throw std::runtime_error("SafeTensorsLoader: Failed to parse index JSON from " + actual_index_path.string() + ": " + e.what());
347 }
348 f.close();
349
350 if (index_json_data.count("weight_map") && index_json_data["weight_map"].is_object()) {
351 // First pass: populate tensor_name_to_shard_key_map_ and identify unique shards to load
352 std::map<std::string, std::string> unique_shards_to_load; // shard_filename -> full_path
353 for (auto const& [tensor_name, shard_filename_json] : index_json_data["weight_map"].items()) {
354 if (!shard_filename_json.is_string()) {
355 Logger::warning("SafeTensorsLoader: Shard filename for tensor '" + tensor_name + "' in index is not a string. Skipping.");
356 continue;
357 }
358 std::string shard_filename = shard_filename_json.get<std::string>();
359 tensor_name_to_shard_key_map_[tensor_name] = shard_filename;
360 if (unique_shards_to_load.find(shard_filename) == unique_shards_to_load.end()) {
361 unique_shards_to_load[shard_filename] = (dir_p / shard_filename).string();
362 }
363 }
364
365 // Second pass: load each unique shard and parse its metadata
366 for(const auto& pair : unique_shards_to_load){
367 const std::string& shard_filename = pair.first;
368 const std::string& full_shard_path = pair.second;
369 if (loaded_shards_.find(shard_filename) == loaded_shards_.end()) {
370 Logger::info("SafeTensorsLoader: Loading and parsing shard (from index): " + full_shard_path + " (key:"+ shard_filename + ")");
371 load_single_file(full_shard_path, shard_filename);
372 } else {
373 Logger::debug("SafeTensorsLoader: Shard '" + shard_filename + "' already loaded/parsed (should not happen if unique_shards logic is correct).");
374 }
375 }
376
377 } else {
378 throw std::runtime_error("SafeTensorsLoader: Index file " + actual_index_path.string() + " does not contain a valid 'weight_map'.");
379 }
380 } else {
381 Logger::info("SafeTensorsLoader: No index file found in " + directory_path_str + ". Scanning for *.safetensors files.");
382 std::vector<std::filesystem::path> shard_files;
383 for (const auto& entry : std::filesystem::directory_iterator(dir_p)) {
384 if (entry.is_regular_file() && entry.path().extension() == ".safetensors") {
385 shard_files.push_back(entry.path());
386 }
387 }
388
389 if (shard_files.empty()) {
390 Logger::warning("SafeTensorsLoader: No .safetensors files found directly in directory: " + directory_path_str + ". Checking for model.safetensors as last resort.");
391 std::filesystem::path single_model_file = dir_p / "model.safetensors";
392 if(std::filesystem::exists(single_model_file) && std::filesystem::is_regular_file(single_model_file)){
393 Logger::info("SafeTensorsLoader: Found 'model.safetensors' in directory, loading it as a single non-sharded model.");
394 load_single_file(single_model_file.string(), single_model_file.filename().string());
395 is_sharded_ = false;
396 } else {
397 Logger::info("SafeTensorsLoader: No .safetensors files or index.json found in directory: " + directory_path_str + ". No model weights will be loaded from this path directly.");
398 }
399 } else if (shard_files.size() == 1) {
400 Logger::info("SafeTensorsLoader: Found single .safetensors file: " + shard_files[0].string() + ". Loading as non-sharded.");
401 load_single_file(shard_files[0].string(), shard_files[0].filename().string());
402 is_sharded_ = false;
403 } else {
404 Logger::info("SafeTensorsLoader: Found " + std::to_string(shard_files.size()) + " .safetensors files (no index). Loading all as individual shards.");
405 is_sharded_ = true;
406 for (const auto& p : shard_files) {
407 load_single_file(p.string(), p.filename().string());
408 }
409 }
410 }
411}

References Logger::debug(), Logger::info(), is_sharded_, load_single_file(), loaded_shards_, tensor_name_to_shard_key_map_, and Logger::warning().

Referenced by SafeTensorsLoader().
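
The resolution order implemented above (prefer `model.safetensors.index.json`, then `pytorch_model.bin.index.json`, otherwise scan for `*.safetensors` shards) can be sketched with `std::filesystem` alone. `resolve_shards` is a hypothetical helper for illustration, not part of the loader's API, and it stops at path discovery rather than parsing the index's `weight_map`:

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns the index file if one exists (sharded, index-driven load),
// otherwise every *.safetensors file found directly in the directory.
static std::vector<fs::path> resolve_shards(const fs::path& dir) {
    for (const char* idx : {"model.safetensors.index.json",
                            "pytorch_model.bin.index.json"}) {
        fs::path p = dir / idx;
        if (fs::is_regular_file(p)) {
            // The real code parses the index's "weight_map" here and loads
            // each unique shard it names exactly once.
            return {p};
        }
    }
    std::vector<fs::path> shards;
    for (const auto& e : fs::directory_iterator(dir)) {
        if (e.is_regular_file() && e.path().extension() == ".safetensors")
            shards.push_back(e.path());
    }
    return shards;
}
```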

◆ load_model_config_from_json()

bool SafeTensorsLoader::load_model_config_from_json ( const std::string &  model_path_or_dir,
ModelConfig &  config_to_populate 
)
static

Loads model configuration from a JSON file corresponding to a .safetensors model path.

Given the path to a .safetensors model or directory, this method attempts to find a "config.json" in the same directory. If found, it parses the JSON and populates the provided ModelConfig object.

Parameters
model_path_or_dir  Path to the .safetensors model file or directory.
config_to_populate  Reference to a ModelConfig object to be filled.
Returns
True if config.json was found and successfully parsed, false otherwise.

Definition at line 607 of file safetensors_loader.cpp.

607 {
608 std::filesystem::path model_fs_path(model_path_or_dir_str);
609 std::filesystem::path config_json_path;
610
611 if (std::filesystem::is_directory(model_fs_path)) {
612 config_json_path = model_fs_path / "config.json";
613 } else if (std::filesystem::is_regular_file(model_fs_path)) {
614 config_json_path = model_fs_path.parent_path() / "config.json";
615 } else {
616 Logger::error("SafeTensorsLoader::load_model_config_from_json: Provided model path is not a valid file or directory: " + model_path_or_dir_str);
617 return false;
618 }
619 std::string config_json_path_str = config_json_path.string();
620
621 std::ifstream f(config_json_path_str);
622 if (!f.is_open()) {
623 Logger::warning("SafeTensorsLoader: config.json not found at: " + config_json_path_str);
624 return false;
625 }
626
627 try {
628 nlohmann::json data = nlohmann::json::parse(f);
629 f.close();
630
631 config_to_populate.hidden_size = data.value("hidden_size", 0);
632 config_to_populate.intermediate_size = data.value("intermediate_size", 0);
633 config_to_populate.num_attention_heads = data.value("num_attention_heads", 0);
634 config_to_populate.num_key_value_heads = data.value("num_key_value_heads", config_to_populate.num_attention_heads);
635 config_to_populate.num_hidden_layers = data.value("num_hidden_layers", 0);
636 config_to_populate.vocab_size = data.value("vocab_size", 0);
637 config_to_populate.max_position_embeddings = data.value("max_position_embeddings", 2048);
638 config_to_populate.rms_norm_eps = data.value("rms_norm_eps", 1e-5f);
639 config_to_populate.rope_theta = data.value("rope_theta", 10000.0f);
640 config_to_populate.bos_token_id = data.value("bos_token_id", 1);
641 config_to_populate.eos_token_id = data.value("eos_token_id", 2);
642 config_to_populate.pad_token_id = data.value("pad_token_id", -1);
643 config_to_populate.unk_token_id = data.value("unk_token_id", 0);
644
645 if (data.contains("architectures") && data["architectures"].is_array() && !data["architectures"].empty()) {
646 config_to_populate.architecture = data["architectures"][0].get<std::string>();
647 } else {
648 config_to_populate.architecture = data.value("model_type", "unknown");
649 }
650 config_to_populate.model_name = data.value("model_type", config_to_populate.architecture);
651
652 bool is_llama3_vocab_size_json = (config_to_populate.vocab_size == 128256);
653 bool is_llama3_arch_hint_json = (config_to_populate.architecture.find("LlamaForCausalLM") != std::string::npos &&
654 config_to_populate.architecture.find("Llama2") == std::string::npos);
655
656 if (is_llama3_vocab_size_json && is_llama3_arch_hint_json) {
657 config_to_populate.tokenizer_family = ModelConfig::TokenizerFamily::LLAMA3_TIKTOKEN;
658 if (config_to_populate.rope_theta == 10000.0f) {
659 float llama3_rope_candidate = data.value("rope_theta", 500000.0f);
660 if (llama3_rope_candidate > 10000.0f) {
661 config_to_populate.rope_theta = llama3_rope_candidate;
662 } else if (config_to_populate.rope_theta == 10000.0f) {
663 config_to_populate.rope_theta = 500000.0f;
664 }
665 }
666 } else if (config_to_populate.vocab_size == 32000 || config_to_populate.architecture.find("Llama") != std::string::npos) {
667 config_to_populate.tokenizer_family = ModelConfig::TokenizerFamily::LLAMA_SENTENCEPIECE;
668 } else {
669 config_to_populate.tokenizer_family = ModelConfig::TokenizerFamily::UNKNOWN;
670 }
671 config_to_populate.is_gguf_file_loaded = false;
672
673 Logger::info("SafeTensorsLoader: Successfully loaded and parsed model config from: " + config_json_path_str);
674 return true;
675
676 } catch (const nlohmann::json::exception& e) {
677 Logger::error("SafeTensorsLoader: Failed to parse config.json: " + config_json_path_str + ". Error: " + e.what());
678 return false;
679 }
680 return false;
681}

References ModelConfig::architecture, ModelConfig::bos_token_id, ModelConfig::eos_token_id, Logger::error(), ModelConfig::hidden_size, Logger::info(), ModelConfig::intermediate_size, ModelConfig::is_gguf_file_loaded, ModelConfig::LLAMA3_TIKTOKEN, ModelConfig::LLAMA_SENTENCEPIECE, ModelConfig::max_position_embeddings, ModelConfig::model_name, ModelConfig::num_attention_heads, ModelConfig::num_hidden_layers, ModelConfig::num_key_value_heads, ModelConfig::pad_token_id, ModelConfig::rms_norm_eps, ModelConfig::rope_theta, ModelConfig::tokenizer_family, ModelConfig::unk_token_id, ModelConfig::UNKNOWN, ModelConfig::vocab_size, and Logger::warning().

Referenced by TinyLlamaModel::TinyLlamaModel(), and tinyllama::TinyLlamaSession::TinyLlamaSession().
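
The config.json location logic described above reduces to a single `std::filesystem` branch: look inside the directory itself, or next to a single `.safetensors` file. `config_path_for` below is a hypothetical stand-in for that first half of the method; the JSON parsing step (which uses nlohmann::json) is omitted:

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// config.json is expected alongside the model: in the model directory, or
// in the parent directory of a single .safetensors file.
static fs::path config_path_for(const fs::path& model_path) {
    if (fs::is_directory(model_path))
        return model_path / "config.json";
    return model_path.parent_path() / "config.json";
}
```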

◆ load_single_file()

void SafeTensorsLoader::load_single_file ( const std::string &  file_path,
const std::string &  shard_key_override = "" 
)
private

Load a single .safetensors file as a shard.

Memory-maps the file and parses its metadata to populate tensor information.

Parameters
file_path  Path to the .safetensors file.
shard_key_override  Optional key to use for this shard (e.g., filename).

Definition at line 413 of file safetensors_loader.cpp.

413 {
414 std::string key_to_use = shard_key_override.empty() ? std::filesystem::path(file_path).filename().string() : shard_key_override;
415 if (key_to_use.empty()) key_to_use = file_path;
416
417 if (loaded_shards_.count(key_to_use)) {
418 Logger::debug("SafeTensorsLoader: Shard/file '" + key_to_use + "' (path: " + file_path + ") already processed/loaded.");
419 return;
420 }
421 Logger::info("SafeTensorsLoader: Loading single file/shard: " + file_path + " with key: " + key_to_use);
422 try {
423 auto shard = std::make_unique<Shard>(file_path);
424 parse_shard_metadata(*shard, key_to_use);
425 loaded_shards_[key_to_use] = std::move(shard);
426 } catch (const std::exception& e) {
427 throw std::runtime_error("SafeTensorsLoader: Error processing file/shard '" + file_path + "' (key: " + key_to_use + "): " + e.what());
428 }
429}

References Logger::debug(), Logger::info(), loaded_shards_, and parse_shard_metadata().

Referenced by load_from_directory(), and SafeTensorsLoader().

◆ operator=()

SafeTensorsLoader & SafeTensorsLoader::operator= ( const SafeTensorsLoader & )
delete

◆ parse_shard_metadata()

void SafeTensorsLoader::parse_shard_metadata ( Shard &  shard,
const std::string &  shard_key 
)
private

Parse the metadata of a shard and populate tensor information.

Reads the metadata JSON from the shard and adds entries to the tensors_ map.

Parameters
shard  Reference to the Shard object.
shard_key  Key identifying this shard (e.g., filename).

Definition at line 431 of file safetensors_loader.cpp.

431 {
432 Logger::debug("SafeTensorsLoader: Parsing metadata for shard: " + shard_key + " (file: " + shard.file_path + ")");
433 if (!shard.metadata_ptr || shard.metadata_size == 0) {
434 throw std::runtime_error("Shard metadata is not available for parsing (nullptr or zero size): " + shard.file_path);
435 }
436 std::string metadata_json_str;
437 try {
438 metadata_json_str.assign(reinterpret_cast<const char*>(shard.metadata_ptr), shard.metadata_size);
439 } catch (const std::length_error& le) {
440 throw std::runtime_error("Error constructing metadata string for shard " + shard.file_path + ": " + le.what());
441 }
442
443 nlohmann::json metadata_root;
444 try {
445 metadata_root = nlohmann::json::parse(metadata_json_str);
446 } catch (const nlohmann::json::parse_error& e) {
447 throw std::runtime_error("Failed to parse metadata JSON for shard " + shard.file_path + " (key: " + shard_key + ") at offset 8, metadata_size: " +
448 std::to_string(shard.metadata_size) + ". Error: " + e.what() +
449 "\nJSON content snippet (first 200 chars): " + metadata_json_str.substr(0, 200));
450 }
451
452 size_t tensors_in_this_shard_count = 0;
453 for (auto const& [tensor_name_str, info_json] : metadata_root.items()) {
454 if (tensor_name_str == "__metadata__") continue;
455
456 TensorInfo tensor_info;
457 tensor_info.name = tensor_name_str;
458 try {
459 tensor_info.dtype = info_json.at("dtype").get<std::string>();
460 std::transform(tensor_info.dtype.begin(), tensor_info.dtype.end(), tensor_info.dtype.begin(),
461 [](unsigned char c){ return static_cast<char>(std::toupper(c)); });
462
463 for (const auto& dim : info_json.at("shape")) {
464 tensor_info.shape.push_back(dim.get<size_t>());
465 }
466 const auto& data_offsets_json = info_json.at("data_offsets");
467 if (!data_offsets_json.is_array() || data_offsets_json.size() != 2) {
468 throw std::runtime_error("Tensor '" + tensor_name_str + "' 'data_offsets' must be an array of two numbers.");
469 }
470 size_t start_offset_in_data_block = data_offsets_json[0].get<size_t>();
471 size_t end_offset_in_data_block = data_offsets_json[1].get<size_t>();
472
473 tensor_info.data_offset = start_offset_in_data_block;
474 tensor_info.nbytes = end_offset_in_data_block - start_offset_in_data_block;
475 tensor_info.shard_key = shard_key;
476
477 if (tensors_.count(tensor_info.name)) {
478 Logger::warning("SafeTensorsLoader: Duplicate tensor name '" + tensor_info.name + "' encountered. " +
479 "Previous shard key: '" + tensors_[tensor_info.name].shard_key + "', New shard key: '" + shard_key + "'. " +
480 "Overwriting with info from current shard being parsed. This can happen with unindexed multi-file loads or inconsistent index files.");
481 }
482 tensors_[tensor_info.name] = tensor_info;
483 if (tensor_name_to_shard_key_map_.find(tensor_info.name) == tensor_name_to_shard_key_map_.end()){
484 tensor_name_to_shard_key_map_[tensor_info.name] = shard_key;
485 }
486
487 tensors_in_this_shard_count++;
488
489 } catch (const nlohmann::json::exception& e) {
490 throw std::runtime_error("Failed to parse tensor info for '" + tensor_name_str + "' in shard " +
491 shard.file_path + " (key: " + shard_key + "): " + e.what());
492 }
493 }
494 Logger::debug("SafeTensorsLoader: Finished parsing metadata for shard: " + shard_key + ". Parsed " + std::to_string(tensors_in_this_shard_count) + " tensor entries from this shard.");
495}

References SafeTensorsLoader::TensorInfo::data_offset, Logger::debug(), SafeTensorsLoader::TensorInfo::dtype, Shard::file_path, Shard::metadata_ptr, Shard::metadata_size, SafeTensorsLoader::TensorInfo::name, SafeTensorsLoader::TensorInfo::nbytes, SafeTensorsLoader::TensorInfo::shape, SafeTensorsLoader::TensorInfo::shard_key, tensor_name_to_shard_key_map_, tensors_, and Logger::warning().

Referenced by load_single_file().
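
For context on what this method reads: a SafeTensors file starts with an 8-byte little-endian u64 giving the length of the JSON metadata block, followed by the metadata itself and then the raw tensor data; the `data_offsets` parsed above are relative to the start of that data block. A minimal stdlib-only sketch of the prologue parse, where `read_metadata` is a hypothetical stand-in for the mapping and slicing done by `Shard`:

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Parse the fixed SafeTensors prologue from an in-memory buffer:
// bytes [0, 8)     = little-endian u64 metadata length N
// bytes [8, 8 + N) = JSON metadata
// bytes [8 + N, …) = tensor data block (data_offsets are relative to it)
static std::string read_metadata(const std::vector<uint8_t>& file) {
    if (file.size() < 8)
        throw std::runtime_error("truncated header");
    uint64_t n = 0;
    for (int i = 7; i >= 0; --i)          // assemble little-endian u64
        n = (n << 8) | file[i];
    if (file.size() < 8 + n)
        throw std::runtime_error("truncated metadata");
    return std::string(reinterpret_cast<const char*>(file.data() + 8), n);
}
```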

◆ tensor_names()

std::vector< std::string > SafeTensorsLoader::tensor_names ( ) const

Get a list of all tensor names available in the loaded model.

Returns
Vector of tensor names.

Definition at line 497 of file safetensors_loader.cpp.

497 {
498 std::vector<std::string> names;
499 names.reserve(tensors_.size());
500 for (const auto& pair : tensors_) {
501 names.push_back(pair.first);
502 }
503 return names;
504}

References tensors_.

Member Data Documentation

◆ is_sharded_

bool SafeTensorsLoader::is_sharded_ = false
private

True if model is loaded from multiple shard files

Definition at line 195 of file safetensors_loader.h.

Referenced by load_from_directory(), and SafeTensorsLoader().

◆ loaded_shards_

std::map<std::string, std::unique_ptr<Shard> > SafeTensorsLoader::loaded_shards_
private

Map of shard keys (e.g., filenames) to Shard objects

Definition at line 198 of file safetensors_loader.h.

Referenced by get_shard_for_tensor(), load_from_directory(), load_single_file(), SafeTensorsLoader(), and ~SafeTensorsLoader().

◆ model_load_path_

std::string SafeTensorsLoader::model_load_path_
private

Original path provided to constructor (file or directory)

Definition at line 194 of file safetensors_loader.h.

Referenced by SafeTensorsLoader().

◆ tensor_name_to_shard_key_map_

std::map<std::string, std::string> SafeTensorsLoader::tensor_name_to_shard_key_map_
private

Maps each tensor name to the key of the shard that contains it (populated from the index file's weight_map or during shard metadata parsing)

Referenced by get_shard_for_tensor(), load_from_directory(), and parse_shard_metadata().

◆ tensors_

std::map<std::string, TensorInfo> SafeTensorsLoader::tensors_
private

Global map of tensor names to their comprehensive info

Definition at line 197 of file safetensors_loader.h.

Referenced by get_tensor_info(), load_all_tensors_parallel(), parse_shard_metadata(), SafeTensorsLoader(), and tensor_names().


The documentation for this class was generated from the following files:

safetensors_loader.h
safetensors_loader.cpp