Encoder
Encodes text chunks and metadata as QR codes in video frames for MemVid.
Adds new documents and builds video storage with QR codes.
Maintains mapping between document IDs and video frames for efficient deletion.
- class langchain_memvid.encoder.Encoder(config, index_manager)
Bases: object
- Parameters:
config (VectorStoreConfig)
index_manager (IndexManager)
- __init__(config, index_manager)
Initialize the encoder.
- Parameters:
config (VectorStoreConfig) – Configuration for the encoder
index_manager (IndexManager) – Index manager for storing embeddings
Example
>>> config = VectorStoreConfig(...)
>>> index_manager = IndexManager(...)
>>> encoder = Encoder(config, index_manager)
- add_chunks(texts, metadatas=None)
Add text chunks for encoding.
- Parameters:
- Raises:
EncodingError – If adding chunks fails
Example
>>> encoder.add_chunks(["text1", "text2"], [{"source": "doc1"}, {"source": "doc2"}])
- build_video(output_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'))
Build video from added chunks.
This method implements the hybrid storage approach and optimization strategies for efficient video building and frame mapping.
Hybrid Storage Implementation
Essential Metadata: Only the fields needed for search are stored in FAISS
  - Document text, source, category, doc_id, metadata_hash
  - Keeps the FAISS index smaller than storing the full metadata would
  - Keeps search operations fast and memory usage low
Full Metadata: The complete metadata is stored in the video QR codes
  - All metadata fields and custom attributes
  - Serves as a complete backup and archive
  - Retrieved on demand when needed
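The split between essential and full metadata can be illustrated with a small sketch. This is not the library's implementation: `split_metadata` and `ESSENTIAL_FIELDS` are hypothetical names, and the use of SHA-256 over canonical JSON for `metadata_hash` is an assumption made only for illustration.

```python
import hashlib
import json

# Field names come from the list above; the helper itself is hypothetical.
ESSENTIAL_FIELDS = ("source", "category", "doc_id")


def split_metadata(text, metadata):
    """Return (essential, full) records for one chunk."""
    # Hash the complete metadata deterministically so the compact record
    # can later be checked against the full copy. SHA-256 is an assumption.
    canonical = json.dumps(metadata, sort_keys=True, ensure_ascii=False)
    metadata_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    # Compact record: what would sit next to the vector in the index.
    essential = {"text": text, "metadata_hash": metadata_hash}
    for key in ESSENTIAL_FIELDS:
        if key in metadata:
            essential[key] = metadata[key]

    # Full record: what would be serialized into the QR code.
    full = {"text": text, "metadata": dict(metadata), "metadata_hash": metadata_hash}
    return essential, full


essential, full = split_metadata(
    "QR codes store data in a 2D matrix.",
    {"source": "doc1", "category": "notes", "doc_id": "a1", "author": "someone", "page": 3},
)
print(sorted(essential))
```

Custom attributes such as `author` and `page` appear only in the full record, while the shared `metadata_hash` lets the two copies be matched up later.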
Optimization Strategies
Frame Index Mapping
Bidirectional Mapping: Establishes a doc_id ↔ frame_number mapping
O(1) Lookup: Enables constant-time frame number retrieval
Deletion Optimization: Allows precise frame-level deletion without full video rebuilds
Consistency: Maintains synchronization between FAISS index and video frames
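A bidirectional mapping of this kind can be sketched with two dictionaries kept in step. The `FrameMapping` class below is a hypothetical stand-in, not the IndexManager's actual data structure; it only shows why lookups in either direction and single-document removal take constant time on average.

```python
class FrameMapping:
    """Hypothetical two-way map between document IDs and frame numbers."""

    def __init__(self):
        self._doc_to_frame = {}
        self._frame_to_doc = {}

    def add(self, doc_id, frame_number):
        # Reject duplicates so both directions stay one-to-one.
        if doc_id in self._doc_to_frame or frame_number in self._frame_to_doc:
            raise ValueError("doc_id and frame_number must each be unique")
        self._doc_to_frame[doc_id] = frame_number
        self._frame_to_doc[frame_number] = doc_id

    def frame_for(self, doc_id):
        # Dictionary lookup: constant time on average.
        return self._doc_to_frame[doc_id]

    def doc_for(self, frame_number):
        return self._frame_to_doc[frame_number]

    def remove(self, doc_id):
        # Drop both directions together and report which frame was freed,
        # so a caller knows exactly which frame to delete.
        frame_number = self._doc_to_frame.pop(doc_id)
        del self._frame_to_doc[frame_number]
        return frame_number


mapping = FrameMapping()
for frame_number, doc_id in enumerate(["a1", "b2", "c3"]):
    mapping.add(doc_id, frame_number)

freed = mapping.remove("b2")
print(freed, mapping.frame_for("c3"), mapping.doc_for(0))
```

Updating both dictionaries inside the same method is what keeps the two directions consistent; the same discipline would apply when an index and a set of video frames are kept in sync.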
Performance Characteristics
Encoding Time: Optimized for large document collections
Memory Usage: Efficient processing of chunks and frames
Storage Efficiency: Hybrid approach reduces overall storage requirements
Quality: Maintains video quality while optimizing storage
Process Flow
1. Text Processing: Extract texts from chunks for FAISS indexing
2. FAISS Indexing: Add essential metadata to FAISS index
3. QR Code Generation: Create QR codes with full metadata
4. Frame Mapping: Establish bidirectional document-to-frame mapping
5. Video Encoding: Encode QR codes into video frames
6. Index Saving: Save FAISS index with frame mappings
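The order of the six steps can be sketched with in-memory stand-ins. Nothing here touches FAISS, a QR library, or a video encoder: `build_sketch` is a hypothetical function, the chunk layout (`text` plus a `metadata` dict containing `doc_id`) is assumed, and JSON strings stand in for QR frames.

```python
import json


def build_sketch(chunks):
    """Walk the six steps with plain Python stand-ins."""
    # 1. Text processing: pull out the text of each chunk.
    texts = [chunk["text"] for chunk in chunks]

    # 2. Indexing: keep a compact record per chunk (stand-in for the FAISS index).
    index = [
        {"text": text, "doc_id": chunk["metadata"]["doc_id"]}
        for text, chunk in zip(texts, chunks)
    ]

    # 3. QR generation: serialize the full chunk; a real build would
    #    render each payload as a QR image.
    payloads = [json.dumps(chunk, sort_keys=True) for chunk in chunks]

    # 4. Frame mapping: one frame per payload, recorded in both directions.
    doc_to_frame = {entry["doc_id"]: n for n, entry in enumerate(index)}
    frame_to_doc = {n: doc_id for doc_id, n in doc_to_frame.items()}

    # 5. Video encoding: the ordered payload list stands in for the frames.
    frames = list(payloads)

    # 6. Index saving: bundle the index with its mappings, ready to persist.
    saved = {"index": index, "doc_to_frame": doc_to_frame, "frame_to_doc": frame_to_doc}
    return frames, saved


chunks = [
    {"text": "alpha", "metadata": {"doc_id": "a1", "source": "doc1"}},
    {"text": "beta", "metadata": {"doc_id": "b2", "source": "doc2"}},
]
frames, saved = build_sketch(chunks)
recovered = json.loads(frames[saved["doc_to_frame"]["b2"]])
print(recovered["text"])
```

Because the mapping is built before the frames are written, every frame already has an owner by the time the index is saved.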
- Parameters:
- Returns:
Statistics for the video build process including:
total_chunks: Number of chunks encoded
video_size_mb: Size of the video file in MB
encoding_time: Time taken for encoding in seconds
index_path: Path to the saved index
video_path: Path to the saved video
- Return type:
BuildStats
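For orientation, the returned fields could be gathered roughly as follows. `BuildStatsSketch` and `collect_stats` are hypothetical stand-ins that mirror only the field names listed above; the real BuildStats type may be defined differently, and the conversion of 1 MB = 1024 * 1024 bytes is an assumption.

```python
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BuildStatsSketch:
    """Stand-in with the same field names as the documented return value."""

    total_chunks: int
    video_size_mb: float
    encoding_time: float
    index_path: Path
    video_path: Path


def collect_stats(total_chunks, started_at, video_path, index_path):
    size_bytes = video_path.stat().st_size
    return BuildStatsSketch(
        total_chunks=total_chunks,
        video_size_mb=size_bytes / (1024 * 1024),  # assumed MB definition
        encoding_time=time.perf_counter() - started_at,
        index_path=index_path,
        video_path=video_path,
    )


with tempfile.TemporaryDirectory() as tmp:
    video = Path(tmp) / "kb_data.mp4"
    video.write_bytes(b"\0" * (2 * 1024 * 1024))  # 2 MiB placeholder, not a real video
    started = time.perf_counter()
    stats = collect_stats(3, started, video, Path(tmp) / "kb_index.d")

print(f"{stats.total_chunks} chunks, {stats.video_size_mb:.2f} MB")
```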
- Raises:
EncodingError – If video building fails
ValueError – If output paths are invalid
Example
from pathlib import Path

# Build video with hybrid storage approach
stats = encoder.build_video(Path("output.mp4"), Path("index.d"))
print(f"Encoded {stats.total_chunks} chunks in {stats.encoding_time:.2f}s")
print(f"Video size: {stats.video_size_mb:.2f} MB")

# Check frame mapping statistics
frame_stats = encoder.index_manager.get_frame_mapping_stats()
print(f"Frame mapping coverage: {frame_stats['mapping_coverage']:.1f}%")