Advanced Chunk Processing Library 0.2.0
A comprehensive C++ library for advanced data chunking strategies and processing operations
advanced_structures::SemanticChunker< ContentType, ModelType > Class Template Reference

Template class for semantic-based content chunking.

#include <advanced_structures.hpp>

Public Member Functions

 SemanticChunker (double threshold=0.7, ModelType custom_model=ModelType())
 Construct a new Semantic Chunker.
 
std::vector< ContentType > chunk (const ContentType &content)
 Chunk content based on semantic boundaries.
 
void setModel (ModelType new_model)
 Set a new NLP model.
 
void setSimilarityThreshold (double threshold)
 Set new similarity threshold.
 

Private Attributes

ModelType model
 NLP model instance.
 
double similarity_threshold
 Threshold for determining chunk boundaries.
 

Detailed Description

template<typename ContentType, typename ModelType = DefaultNLPModel>
class advanced_structures::SemanticChunker< ContentType, ModelType >

Template class for semantic-based content chunking.

SemanticChunker splits content based on semantic boundaries using configurable NLP models and similarity metrics.

Template Parameters
ContentType    Type of content to be chunked
ModelType      Type of NLP model to use for similarity calculations

Definition at line 39 of file advanced_structures.hpp.
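
This page does not show which operations chunk() requires of ModelType. As a rough illustration only, assuming a model merely has to score two text segments with a value between 0.0 and 1.0, a stand-in model could look like the sketch below. The name WordOverlapModel and its similarity() method are hypothetical and are not part of the library.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for a ModelType; not part of the library.
// Assumption: a model scores two text segments with a value in [0.0, 1.0].
struct WordOverlapModel {
    // Split a segment into its set of whitespace-separated words.
    static std::set<std::string> words(const std::string& text) {
        std::istringstream in(text);
        std::set<std::string> out;
        for (std::string w; in >> w;) out.insert(w);
        return out;
    }

    // Jaccard similarity: shared words divided by all distinct words.
    // Two empty segments are treated as identical.
    double similarity(const std::string& a, const std::string& b) const {
        const std::set<std::string> wa = words(a);
        const std::set<std::string> wb = words(b);
        if (wa.empty() && wb.empty()) return 1.0;
        std::size_t shared = 0;
        for (const std::string& w : wa) shared += wb.count(w);
        const std::size_t total = wa.size() + wb.size() - shared;
        return static_cast<double>(shared) / static_cast<double>(total);
    }
};
```

Word overlap is a deliberately crude proxy for semantic similarity; it is used here only because it needs nothing beyond the standard library.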

Constructor & Destructor Documentation

◆ SemanticChunker()

template<typename ContentType , typename ModelType = DefaultNLPModel>
advanced_structures::SemanticChunker< ContentType, ModelType >::SemanticChunker ( double  threshold = 0.7,
ModelType  custom_model = ModelType() 
)
inline explicit

Construct a new Semantic Chunker.

Parameters
threshold       Similarity threshold for chunk boundaries (default: 0.7)
custom_model    Custom NLP model instance (optional)

Definition at line 51 of file advanced_structures.hpp.

: model(custom_model), similarity_threshold(threshold) {}
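
The combination of default arguments and explicit has two practical consequences that the following sketch checks at compile time. ChunkerMirror and DemoModel are hypothetical stand-ins that copy only the documented constructor signature so the example compiles without the library header; the two accessors exist purely for the demonstration.

```cpp
#include <string>
#include <type_traits>

// Hypothetical placeholder for an NLP model type; not part of the library.
struct DemoModel {};

// Stand-in that mirrors only the documented constructor signature.
template <typename ContentType, typename ModelType = DemoModel>
class ChunkerMirror {
public:
    explicit ChunkerMirror(double threshold = 0.7,
                           ModelType custom_model = ModelType())
        : model(custom_model), similarity_threshold(threshold) {}

    // Demo-only accessors; the documented class lists no getters.
    double threshold() const { return similarity_threshold; }
    const ModelType& currentModel() const { return model; }

private:
    ModelType model;
    double similarity_threshold;
};

using StringChunker = ChunkerMirror<std::string>;

// Every parameter has a default, so the type is default-constructible.
static_assert(std::is_default_constructible<StringChunker>::value,
              "default arguments provide a default constructor");
// explicit blocks implicit conversion from a bare double.
static_assert(!std::is_convertible<double, StringChunker>::value,
              "explicit constructor is not a converting constructor");
// Direct initialization from a threshold is still allowed.
static_assert(std::is_constructible<StringChunker, double>::value,
              "direct initialization from a threshold works");
```

With a class of this shape, `StringChunker c(0.5);` compiles, while `StringChunker c = 0.5;` does not.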

Member Function Documentation

◆ chunk()

template<typename ContentType , typename ModelType = DefaultNLPModel>
std::vector< ContentType > advanced_structures::SemanticChunker< ContentType, ModelType >::chunk ( const ContentType &  content)

Chunk content based on semantic boundaries.

Parameters
content    Input content to be chunked
Returns
std::vector<ContentType> Vector of content chunks

Referenced by main().
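
Only the signature of chunk() appears on this page; its definition is not shown. One plausible reading of "semantic boundaries", sketched here for input that has already been split into segments, is to open a new chunk whenever two neighbouring segments score below the threshold. The names group_by_similarity and same_initial are hypothetical, and the rule "below threshold means boundary" is an assumption, not confirmed library behavior.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not the library's chunk(): groups consecutive
// segments and starts a new chunk whenever the similarity between
// neighbouring segments falls below the threshold.
template <typename Similarity>
std::vector<std::vector<std::string>> group_by_similarity(
    const std::vector<std::string>& segments,
    double threshold,
    Similarity similarity) {
    std::vector<std::vector<std::string>> chunks;
    for (std::size_t i = 0; i < segments.size(); ++i) {
        // The first segment always opens a chunk; later ones do so only
        // when they are not similar enough to their predecessor.
        const bool boundary =
            i == 0 || similarity(segments[i - 1], segments[i]) < threshold;
        if (boundary) chunks.emplace_back();
        chunks.back().push_back(segments[i]);
    }
    return chunks;
}

// Toy similarity used only for illustration: 1.0 when both segments
// start with the same character, 0.0 otherwise.
inline double same_initial(const std::string& a, const std::string& b) {
    return (!a.empty() && !b.empty() && a[0] == b[0]) ? 1.0 : 0.0;
}
```

Comparing only adjacent segments keeps the sketch linear in the number of segments; a real model-backed implementation might instead compare each segment against a summary of the current chunk.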

◆ setModel()

template<typename ContentType , typename ModelType = DefaultNLPModel>
void advanced_structures::SemanticChunker< ContentType, ModelType >::setModel ( ModelType  new_model)
inline

Set a new NLP model.

Parameters
new_model    New model instance to use

Definition at line 67 of file advanced_structures.hpp.

{
    model = new_model;
}

References advanced_structures::SemanticChunker< ContentType, ModelType >::model.

◆ setSimilarityThreshold()

template<typename ContentType , typename ModelType = DefaultNLPModel>
void advanced_structures::SemanticChunker< ContentType, ModelType >::setSimilarityThreshold ( double  threshold)
inline

Set new similarity threshold.

Parameters
threshold    New threshold value between 0.0 and 1.0 (the inline definition stores it without a range check)

Definition at line 76 of file advanced_structures.hpp.

{
    similarity_threshold = threshold;
}

References advanced_structures::SemanticChunker< ContentType, ModelType >::similarity_threshold.
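
The parameter is documented as lying between 0.0 and 1.0, but the definition above assigns it unchecked. A caller who wants that range enforced could validate the value first. The helper below is a minimal sketch of such a guard; checked_threshold is a hypothetical name and not part of the library.

```cpp
#include <stdexcept>

// Hypothetical caller-side guard, not part of the library.
// Returns the value unchanged when it lies in [0.0, 1.0]; throws otherwise.
// The negated comparison also rejects NaN, because every ordered
// comparison involving NaN is false.
inline double checked_threshold(double threshold) {
    if (!(threshold >= 0.0 && threshold <= 1.0)) {
        throw std::invalid_argument(
            "similarity threshold must be in [0.0, 1.0]");
    }
    return threshold;
}
```

Given an existing chunker instance, the guard would wrap the argument at the call site, for example `chunker.setSimilarityThreshold(checked_threshold(value));`.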

Member Data Documentation

◆ model

template<typename ContentType , typename ModelType = DefaultNLPModel>
ModelType advanced_structures::SemanticChunker< ContentType, ModelType >::model
private

NLP model instance.

Definition at line 41 of file advanced_structures.hpp.

Referenced by advanced_structures::SemanticChunker< ContentType, ModelType >::setModel().

◆ similarity_threshold

template<typename ContentType , typename ModelType = DefaultNLPModel>
double advanced_structures::SemanticChunker< ContentType, ModelType >::similarity_threshold
private

Threshold for determining chunk boundaries.

Definition at line 42 of file advanced_structures.hpp.

Referenced by advanced_structures::SemanticChunker< ContentType, ModelType >::setSimilarityThreshold().


The documentation for this class was generated from the following file: