Index
Manages the FAISS vector index and essential metadata for MemVid.
Stores essential metadata for fast search.
Maintains bidirectional mapping between document IDs and video frame numbers for efficient deletion.
This module provides functionality for managing vector indices used in LangChain MemVid, including FAISS index creation, updating, and searching.
- class langchain_memvid.index.IndexManager(config, embeddings)
Bases: object
Manages vector indices for MemVid.
This index manager implements a hybrid storage approach that optimizes storage efficiency while maintaining performance and data integrity.
Hybrid Storage Approach
Essential Metadata Only: Stores only essential metadata in FAISS for efficiency
  - Document text, source, category, doc_id, metadata_hash
  - Significant reduction in FAISS index size compared to full metadata storage
  - Fast search operations with minimal memory usage
Full Metadata in Video: Complete metadata stored in video QR codes
  - All metadata fields and custom attributes
  - Complete backup and archive functionality
  - On-demand retrieval when needed
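A minimal sketch of the split described above, assuming the full metadata arrives as a plain dictionary. The helper name `to_essential` and the use of SHA-256 over a sorted-key JSON dump for `metadata_hash` are assumptions made for illustration; the library's actual hashing scheme is not documented here.

```python
import hashlib
import json
from typing import Any, Dict

# Field names come from the list above; the helper itself is hypothetical.
ESSENTIAL_KEYS = ("source", "category")


def to_essential(doc_id: int, text: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Build the compact record that would sit next to the vector index."""
    # Hash the full metadata so the compact record can later be checked
    # against the complete copy stored elsewhere (assumed: SHA-256 of JSON).
    canonical = json.dumps(metadata, sort_keys=True, default=str)
    metadata_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    record: Dict[str, Any] = {
        "text": text,
        "doc_id": doc_id,
        "metadata_hash": metadata_hash,
    }
    for key in ESSENTIAL_KEYS:
        record[key] = metadata.get(key)
    return record


full = {"source": "notes.txt", "category": "demo", "author": "someone", "tags": ["a", "b"]}
essential = to_essential(0, "hello world", full)
print(sorted(essential))  # ['category', 'doc_id', 'metadata_hash', 'source', 'text']
```

Custom fields such as `author` and `tags` are left out of the compact record on purpose; in the approach described above they would be recovered from the full copy when needed.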
Optimization Strategies for Document Deletion
The index manager implements optimized deletion strategies to avoid full video rebuilds:
Frame Index Mapping
Maintains bidirectional mapping between document IDs and frame numbers
Enables O(1) lookup for frame numbers given document IDs
Allows precise frame-level deletion without full video rebuilds
Performance Characteristics
Search Performance: Sub-second search with essential metadata
Storage Efficiency: Significant reduction in FAISS index size
Deletion Performance: O(k) time complexity where k = frames to delete
Memory Usage: Optimized for large-scale operations
Best Practices
Batch Operations: Add or delete multiple documents at once for better efficiency
Frame Mapping: Monitor frame mapping integrity for optimal deletion performance
Metadata Management: Use essential metadata for search, full metadata for details
Error Handling: Implement fallback mechanisms for corrupted data
- Parameters:
config (IndexConfig)
embeddings (Any)
- __init__(config, embeddings)
Initialize the index manager.
- Parameters:
config (IndexConfig) – Configuration for the index
embeddings (Any) – LangChain embeddings interface
- _rebuild_index_without_deleted(deleted_ids)
Rebuild the index after deleting specified document IDs.
- Parameters:
deleted_ids (List[int]) – List of document IDs that were deleted (in descending order)
- Raises:
MemVidIndexError – If rebuilding fails
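Why the IDs are expected in descending order can be seen in a small sketch. It assumes the metadata is held in a position-indexed list, which is an assumption about the internal layout rather than something stated here; `remove_positions` is a hypothetical helper.

```python
from typing import List


def remove_positions(items: List[str], deleted_ids: List[int]) -> List[str]:
    """Return a copy of items with the given positions removed."""
    remaining = list(items)
    # Highest position first: each deletion only shifts the entries after it,
    # so the smaller positions still point at the intended items.
    for pos in sorted(deleted_ids, reverse=True):
        del remaining[pos]
    return remaining


docs = ["doc0", "doc1", "doc2", "doc3", "doc4"]
print(remove_positions(docs, [3, 1]))  # ['doc0', 'doc2', 'doc4']
```

Deleting in ascending order instead would remove the wrong entry on the second step, because position 3 would already have shifted to position 2.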
- get_all_documents()
Get all documents in the index.
- Return type:
- Returns:
List of all document metadata dictionaries
- Raises:
MemVidIndexError – If retrieval fails
- get_document_count()
Get the total number of documents in the index.
- Return type:
- Returns:
Number of documents in the index
- get_frame_mapping_stats()
Get statistics about frame mappings for monitoring and optimization.
- Returns:
Statistics about frame mappings.
- Return type:
FrameMappingStats
- get_frames_to_delete(doc_ids)
Get frame numbers that need to be deleted for given document IDs.
This method is a key component of the optimized deletion strategy, enabling precise frame-level deletion without full video rebuilds.
Optimization Strategy
Frame Mapping Lookup: Uses O(1) lookup to find frame numbers for document IDs
Safe Deletion Order: Returns frames in reverse order for safe deletion
Efficient Processing: Processes multiple document IDs in a single operation
Error Handling: Gracefully handles missing frame mappings
Performance Characteristics
Lookup Time: O(k) where k = number of document IDs
Memory Usage: Minimal temporary storage for frame numbers
Scalability: Efficient for large-scale deletions
Reliability: Handles missing mappings gracefully
Use Cases
Video Frame Removal: Provides frame numbers for selective video editing
Optimized Deletion: Enables efficient document removal without full rebuilds
Batch Processing: Supports deletion of multiple documents at once
Statistics: Provides data for deletion performance analysis
- Parameters:
- Return type:
- Returns:
List of frame numbers to delete, sorted in reverse order for safe deletion
Example
    # Get frames to delete for multiple documents
    doc_ids = [0, 5, 10]
    frames_to_delete = index_manager.get_frames_to_delete(doc_ids)
    print(f"Frames to delete: {frames_to_delete}")  # e.g., [10, 5, 0]

    # Use frames for video editing
    video_processor.remove_frames_from_video(video_path, frames_to_delete)
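How such a lookup could work, as a sketch only. It assumes the mapping is a plain `dict` from document ID to frame number and that IDs without a mapping are skipped silently. The standalone function `frames_to_delete` is illustrative and is not the library's code.

```python
from typing import Dict, List


def frames_to_delete(doc_to_frame: Dict[int, int], doc_ids: List[int]) -> List[int]:
    """Collect the frame numbers for doc_ids, highest frame first."""
    frames = set()
    for doc_id in doc_ids:
        frame = doc_to_frame.get(doc_id)  # O(1) dictionary lookup per ID
        if frame is None:
            continue  # no mapping recorded: skip instead of failing
        frames.add(frame)
    # Reverse order lets a caller remove frames without shifting
    # the positions of frames that are still waiting to be removed.
    return sorted(frames, reverse=True)


mapping = {0: 0, 5: 5, 10: 10}
print(frames_to_delete(mapping, [0, 5, 10, 99]))  # [10, 5, 0]
```

The loop does one dictionary lookup per document ID, which matches the O(k) lookup time stated above; the final sort adds O(k log k) in this sketch.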
- load(path=PosixPath('kb_index.d'))
Load the index and metadata from disk.
- Parameters:
path (Path) – Path to load the index from, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR
- Raises:
MemVidIndexError – If loading fails
- save(path=PosixPath('kb_index.d'))
Save the index and metadata to disk.
- Parameters:
path (Path) – Path to save the index, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR
- Raises:
MemVidIndexError – If saving fails
- search_text(query_text, k=4)
Search for similar texts using a text query.
- Return type:
- Parameters:
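Because the parameter and return details are missing above, the following is only a sketch of what a top-k text search does in general. It uses brute-force cosine similarity in place of FAISS, `embed` is a toy stand-in for a real embeddings model, and the `Hit` dataclass is hypothetical (it merely borrows field names from `SearchResult`).

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical result record, loosely modelled on SearchResult."""
    text: str
    source: Optional[str] = None
    similarity: float = 0.0


def embed(text: str) -> List[float]:
    """Toy embedding: letter frequencies (stand-in for a real model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_text(corpus: List[str], query_text: str, k: int = 4) -> List[Hit]:
    """Return the k corpus entries most similar to query_text."""
    query_vec = embed(query_text)
    hits = [Hit(text=t, similarity=cosine(query_vec, embed(t))) for t in corpus]
    hits.sort(key=lambda h: h.similarity, reverse=True)
    return hits[:k]


corpus = ["video frames", "vector index", "qr codes"]
results = search_text(corpus, "vector index", k=2)
print([(h.text, round(h.similarity, 3)) for h in results])
```

A real index avoids scoring every entry on each query; the brute-force loop is used here only to keep the sketch self-contained.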
- set_frame_mapping(doc_id, frame_number)
Set the frame mapping for a document.
This method establishes the bidirectional mapping between document IDs and frame numbers, which is essential for optimized deletion strategies.
Frame Index Mapping
Bidirectional Mapping: doc_id ↔ frame_number for efficient lookups
O(1) Lookup: Enables constant-time frame number retrieval
Deletion Optimization: Allows precise frame-level deletion without full video rebuilds
Consistency: Maintains synchronization between FAISS index and video frames
Performance Benefits
Fast Deletion: O(k) time complexity where k = frames to delete
Memory Efficient: Minimal memory overhead for mapping storage
Scalable: Efficient for large document collections
Reliable: Provides fallback mechanisms when mappings are corrupted
Use Cases
Optimized Deletion: Enables selective frame removal from videos
Frame Lookup: Fast retrieval of frame numbers for document IDs
Document Lookup: Fast retrieval of document IDs for frame numbers
Statistics: Provides mapping coverage statistics for monitoring
Example
    # Set frame mapping for a document
    index_manager.set_frame_mapping(123, 5)

    # Retrieve frame number
    frame_num = index_manager.get_frame_number(123)  # Returns 5

    # Retrieve document ID
    doc_id = index_manager.get_document_id(5)  # Returns 123
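A minimal sketch of a bidirectional mapping of this kind, assuming two plain dictionaries kept in sync and one document per frame. The class name `FrameMap` is hypothetical; the method names follow the example above.

```python
from typing import Dict, Optional


class FrameMap:
    """Two dictionaries kept in sync so both directions are O(1)."""

    def __init__(self) -> None:
        self._doc_to_frame: Dict[int, int] = {}
        self._frame_to_doc: Dict[int, int] = {}

    def set_frame_mapping(self, doc_id: int, frame_number: int) -> None:
        # Drop any stale pair first so the two directions never disagree.
        old_frame = self._doc_to_frame.pop(doc_id, None)
        if old_frame is not None:
            self._frame_to_doc.pop(old_frame, None)
        old_doc = self._frame_to_doc.pop(frame_number, None)
        if old_doc is not None:
            self._doc_to_frame.pop(old_doc, None)
        self._doc_to_frame[doc_id] = frame_number
        self._frame_to_doc[frame_number] = doc_id

    def get_frame_number(self, doc_id: int) -> Optional[int]:
        return self._doc_to_frame.get(doc_id)

    def get_document_id(self, frame_number: int) -> Optional[int]:
        return self._frame_to_doc.get(frame_number)


frame_map = FrameMap()
frame_map.set_frame_mapping(123, 5)
print(frame_map.get_frame_number(123))  # 5
print(frame_map.get_document_id(5))     # 123
```

Keeping a second dictionary doubles the mapping's memory footprint, but it avoids scanning all entries when looking up a document by frame number.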
- class langchain_memvid.index.SearchResult(text, source=None, category=None, similarity=0.0, metadata=None)
Bases: object
Represents a search result with metadata and similarity score.
- Parameters: