TinyLlama.cpp 1.0
A lightweight C++ implementation of the TinyLlama language model
Key-Value cache for a single transformer layer.
#include <model.h>

Public Attributes

    std::vector<float> k
    std::vector<float> v
Key-Value cache for a single transformer layer.
Stores the key and value tensors for the attention mechanism, with optional CUDA support for GPU acceleration.
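For orientation, a minimal sketch of the struct as documented on this page; this is an assumption about the shape of the definition at model.h lines 131-132, not the verbatim source, and any CUDA-side members implied by the description above are omitted.

    #include <vector>

    // Sketch of KVCacheLayer as documented: one key buffer and one value
    // buffer per transformer layer, both flat float vectors on the CPU path.
    struct KVCacheLayer {
        std::vector<float> k; // cached keys for this layer
        std::vector<float> v; // cached values for this layer
    };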
std::vector<float> KVCacheLayer::k
Definition at line 131 of file model.h.
Referenced by attention_batch_cpu(), attention_batch_cpu_sequence_aware(), TinyLlamaModel::forward(), update_kv_cache_batch_cpu(), and update_kv_cache_batch_cpu_sequence_aware().
std::vector<float> KVCacheLayer::v
Definition at line 132 of file model.h.
Referenced by attention_batch_cpu(), attention_batch_cpu_sequence_aware(), TinyLlamaModel::forward(), update_kv_cache_batch_cpu(), and update_kv_cache_batch_cpu_sequence_aware().
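To illustrate how both members typically grow during decoding, the sketch below appends one token's key and value vectors to the cache. The helper name append_kv and the flat per-token layout are illustrative assumptions; in the actual code this work is done by update_kv_cache_batch_cpu() and update_kv_cache_batch_cpu_sequence_aware().

    #include <vector>

    struct KVCacheLayer {
        std::vector<float> k;
        std::vector<float> v;
    };

    // Hypothetical append step: extend each flat buffer by one token's worth
    // of key/value data, keeping k and v the same length at all times.
    void append_kv(KVCacheLayer& cache,
                   const std::vector<float>& k_t,  // new token's key vector
                   const std::vector<float>& v_t)  // new token's value vector
    {
        cache.k.insert(cache.k.end(), k_t.begin(), k_t.end());
        cache.v.insert(cache.v.end(), v_t.begin(), v_t.end());
    }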