Index

Manages the FAISS vector index and essential metadata for MemVid.

  • Stores essential metadata for fast search.

  • Maintains bidirectional mapping between document IDs and video frame numbers for efficient deletion.

This module provides functionality for managing vector indices used in LangChain MemVid, including FAISS index creation, updating, and searching.

class langchain_memvid.index.IndexManager(config, embeddings)[source]

Bases: object

Manages vector indices for MemVid.

This index manager implements a hybrid storage approach that optimizes storage efficiency while maintaining performance and data integrity.

Hybrid Storage Approach

  • Essential Metadata Only: Stores only essential metadata in FAISS for efficiency

    • Document text, source, category, doc_id, metadata_hash

    • Significant reduction in FAISS index size compared to full metadata storage

    • Fast search operations with minimal memory usage

  • Full Metadata in Video: Complete metadata stored in video QR codes

    • All metadata fields and custom attributes

    • Complete backup and archive functionality

    • On-demand retrieval when needed
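The split between essential and full metadata can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `split_metadata` is hypothetical, and computing `metadata_hash` as a SHA-256 digest of the JSON form is an assumption. Only the five essential field names come from the list above.

```python
import hashlib
import json
from typing import Any, Dict

# Essential field names as listed above; everything else here is illustrative.
ESSENTIAL_FIELDS = ("text", "source", "category", "doc_id", "metadata_hash")


def split_metadata(doc_id: int, text: str, metadata: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return a compact record for the index and a full record for the video."""
    full = {**metadata, "text": text, "doc_id": doc_id}
    # Assumption: the hash is a SHA-256 digest of the sorted JSON form.
    digest = hashlib.sha256(
        json.dumps(full, sort_keys=True, default=str).encode("utf-8")
    ).hexdigest()
    essential = {name: full.get(name) for name in ESSENTIAL_FIELDS if name != "metadata_hash"}
    essential["metadata_hash"] = digest
    return {"essential": essential, "full": full}


records = split_metadata(
    0,
    "FAISS stores dense vectors.",
    {"source": "notes.md", "category": "search", "author": "alice", "tags": ["faiss"]},
)
print(sorted(records["essential"]))  # only the essential field names
print(sorted(records["full"]))       # every field, including custom attributes
```

Under these assumptions the hash gives a cheap way to check that the compact record and the full record stored in the video still describe the same document.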

Optimization Strategies for Document Deletion

The index manager implements optimized deletion strategies to avoid full video rebuilds:

Frame Index Mapping

  • Maintains bidirectional mapping between document IDs and frame numbers

  • Enables O(1) lookup for frame numbers given document IDs

  • Allows precise frame-level deletion without full video rebuilds
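One common way to get constant-time lookups in both directions is to keep two dictionaries in sync. The class below is a hypothetical sketch of that idea; the name `FrameMapping` and its methods are not part of the documented API, and the real storage layout may differ.

```python
from typing import Dict, Optional


class FrameMapping:
    """Hypothetical two-dictionary mapping between document IDs and frames."""

    def __init__(self) -> None:
        self._doc_to_frame: Dict[int, int] = {}
        self._frame_to_doc: Dict[int, int] = {}

    def set(self, doc_id: int, frame_number: int) -> None:
        # Drop stale entries first so both directions stay consistent.
        old_frame = self._doc_to_frame.pop(doc_id, None)
        if old_frame is not None:
            self._frame_to_doc.pop(old_frame, None)
        old_doc = self._frame_to_doc.pop(frame_number, None)
        if old_doc is not None:
            self._doc_to_frame.pop(old_doc, None)
        self._doc_to_frame[doc_id] = frame_number
        self._frame_to_doc[frame_number] = doc_id

    def frame_for(self, doc_id: int) -> Optional[int]:
        # Dictionary lookup: O(1) on average.
        return self._doc_to_frame.get(doc_id)

    def doc_for(self, frame_number: int) -> Optional[int]:
        return self._frame_to_doc.get(frame_number)


mapping = FrameMapping()
mapping.set(123, 5)
print(mapping.frame_for(123), mapping.doc_for(5))
```

Keeping the reverse dictionary costs one extra entry per document but avoids scanning all documents to answer "which document is in this frame?".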

Performance Characteristics

  • Search Performance: Sub-second search with essential metadata

  • Storage Efficiency: Significant reduction in FAISS index size

  • Deletion Performance: O(k) time complexity, where k is the number of frames to delete

  • Memory Usage: Optimized for large-scale operations

Best Practices

  • Batch Operations: Add or delete multiple documents at once for better efficiency

  • Frame Mapping: Monitor frame mapping integrity for optimal deletion performance

  • Metadata Management: Use essential metadata for search, full metadata for details

  • Error Handling: Implement fallback mechanisms for corrupted data

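Parameters:
  • config (IndexConfig) – Configuration for the index

  • embeddings (Any) – LangChain embeddings interface

__init__(config, embeddings)[source]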

Initialize the index manager.

Parameters:
  • config (IndexConfig) – Configuration for the index

  • embeddings (Any) – LangChain embeddings interface

_rebuild_index_without_deleted(deleted_ids)[source]

Rebuild the index after deleting specified document IDs.

Parameters:

deleted_ids (List[int]) – List of document IDs that were deleted (in descending order)

Raises:

MemVidIndexError – If rebuilding fails
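The descending order matters when document IDs double as positions in a list: removing the highest position first leaves the lower positions unchanged. The sketch below illustrates this on plain Python lists. It assumes that IDs are list positions, and the helper name `drop_deleted` is hypothetical; the real method would additionally rebuild the FAISS index from the surviving vectors.

```python
from typing import Any, Dict, List, Sequence, Tuple


def drop_deleted(
    vectors: Sequence[List[float]],
    metadata: Sequence[Dict[str, Any]],
    deleted_ids: Sequence[int],
) -> Tuple[List[List[float]], List[Dict[str, Any]]]:
    """Return copies of vectors and metadata without the deleted positions."""
    kept_vectors = list(vectors)
    kept_metadata = list(metadata)
    # Highest position first, so earlier removals do not shift later ones.
    for doc_id in sorted(set(deleted_ids), reverse=True):
        if 0 <= doc_id < len(kept_vectors):
            del kept_vectors[doc_id]
            del kept_metadata[doc_id]
    return kept_vectors, kept_metadata


vectors = [[0.0], [1.0], [2.0], [3.0]]
metadata = [{"text": t} for t in ("a", "b", "c", "d")]
kept_vectors, kept_metadata = drop_deleted(vectors, metadata, [3, 1])
print([entry["text"] for entry in kept_metadata])
```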

add_texts(texts, metadata=None)[source]

Add texts and essential metadata to the index.

Parameters:
  • texts

  • metadata (defaults to None)

create_index()[source]

Create a new FAISS index based on the embeddings model.

delete_by_ids(doc_ids)[source]

Delete documents by their IDs and update index and mappings.

Return type:

bool

Parameters:

doc_ids (List[int])

delete_by_texts(texts)[source]

Delete documents by their text content.

Return type:

bool

Parameters:

texts (List[str])
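Deleting by text can be reduced to deleting by ID once the matching documents are found. The sketch below assumes exact string matching and that a document's ID is its position in the metadata list; both are assumptions, and `ids_for_texts` is a hypothetical helper.

```python
from typing import Any, Dict, List, Sequence


def ids_for_texts(metadata: Sequence[Dict[str, Any]], texts: Sequence[str]) -> List[int]:
    """Return positions of documents whose text exactly matches one of `texts`."""
    wanted = set(texts)
    return [i for i, entry in enumerate(metadata) if entry.get("text") in wanted]


stored = [{"text": "alpha"}, {"text": "beta"}, {"text": "alpha"}]
print(ids_for_texts(stored, ["alpha"]))
```

The resulting list could then be passed to `delete_by_ids`.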

delete_frames_from_mapping(frame_numbers)[source]

Remove frame mappings for deleted frames.

Parameters:

frame_numbers (List[int]) – List of frame numbers that were deleted
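With a two-dictionary mapping, removing a frame means removing one entry from each direction. The function below is a sketch under that assumption; the name `remove_frames` is hypothetical, and it does not renumber the remaining frames, which the real implementation may or may not do.

```python
from typing import Dict, Iterable


def remove_frames(
    doc_to_frame: Dict[int, int],
    frame_to_doc: Dict[int, int],
    frame_numbers: Iterable[int],
) -> int:
    """Remove mappings for the given frames in place; return how many were removed."""
    removed = 0
    for frame in frame_numbers:
        doc_id = frame_to_doc.pop(frame, None)
        if doc_id is not None:
            doc_to_frame.pop(doc_id, None)
            removed += 1
    return removed


doc_to_frame = {10: 0, 11: 1, 12: 2}
frame_to_doc = {0: 10, 1: 11, 2: 12}
print(remove_frames(doc_to_frame, frame_to_doc, [2, 0, 7]))  # frame 7 is unmapped and skipped
```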

get_all_documents()[source]

Get all documents in the index.

Return type:

List[Dict[str, Any]]

Returns:

List of all document metadata dictionaries

Raises:

MemVidIndexError – If retrieval fails

get_document_count()[source]

Get the total number of documents in the index.

Return type:

int

Returns:

Number of documents in the index

get_document_id(frame_number)[source]

Get the document ID for a frame.

Parameters:

frame_number (int) – Frame number

Return type:

Optional[int]

Returns:

Document ID if found, None otherwise

get_frame_mapping_stats()[source]

Get statistics about frame mappings for monitoring and optimization.

Returns:

Statistics about frame mappings.

Return type:

FrameMappingStats
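The fields of `FrameMappingStats` are not listed here. As an illustration of what mapping coverage could mean, the sketch below uses a stand-in dataclass with assumed fields; `MappingStatsSketch` and `mapping_stats` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MappingStatsSketch:
    """Stand-in for FrameMappingStats; the field names are assumptions."""

    total_documents: int
    mapped_documents: int
    coverage_percent: float


def mapping_stats(total_documents: int, doc_to_frame: Dict[int, int]) -> MappingStatsSketch:
    mapped = len(doc_to_frame)
    # Avoid dividing by zero for an empty index.
    coverage = 100.0 * mapped / total_documents if total_documents else 0.0
    return MappingStatsSketch(total_documents, mapped, coverage)


stats = mapping_stats(4, {0: 0, 1: 1, 3: 2})
print(stats)
```

A coverage below 100% would indicate documents without a frame mapping, which could force a slower deletion path.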

get_frame_number(doc_id)[source]

Get the frame number for a document.

Parameters:

doc_id (int) – Document ID

Return type:

Optional[int]

Returns:

Frame number if found, None otherwise

get_frames_to_delete(doc_ids)[source]

Get frame numbers that need to be deleted for given document IDs.

This method is a key component of the optimized deletion strategy, enabling precise frame-level deletion without full video rebuilds.

Optimization Strategy

  • Frame Mapping Lookup: Uses O(1) lookup to find frame numbers for document IDs

  • Safe Deletion Order: Returns frames in descending order, so that removing one frame does not shift the positions of frames still to be removed

  • Efficient Processing: Processes multiple document IDs in a single operation

  • Error Handling: Gracefully handles missing frame mappings

Performance Characteristics

  • Lookup Time: O(k) where k = number of document IDs

  • Memory Usage: Minimal temporary storage for frame numbers

  • Scalability: Efficient for large-scale deletions

  • Reliability: Handles missing mappings gracefully

Use Cases

  • Video Frame Removal: Provides frame numbers for selective video editing

  • Optimized Deletion: Enables efficient document removal without full rebuilds

  • Batch Processing: Supports deletion of multiple documents at once

  • Statistics: Provides data for deletion performance analysis

Parameters:

doc_ids (List[int]) – List of document IDs to delete

Return type:

List[int]

Returns:

List of frame numbers to delete, sorted in descending order for safe deletion

Example

    # Get frames to delete for multiple documents
    doc_ids = [0, 5, 10]
    frames_to_delete = index_manager.get_frames_to_delete(doc_ids)
    print(f"Frames to delete: {frames_to_delete}")  # e.g., [10, 5, 0]

    # Use frames for video editing
    video_processor.remove_frames_from_video(video_path, frames_to_delete)
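The lookup behind this method can be approximated in a few lines. The sketch below assumes the mapping is a plain dictionary from document ID to frame number; the function name `frames_to_delete_for` is hypothetical. Missing IDs are skipped, matching the description above.

```python
from typing import Dict, Iterable, List


def frames_to_delete_for(doc_to_frame: Dict[int, int], doc_ids: Iterable[int]) -> List[int]:
    """Collect mapped frames for the given IDs, highest frame number first."""
    # One dictionary lookup per ID: O(k) for k document IDs, plus the sort.
    frames = {doc_to_frame[doc_id] for doc_id in doc_ids if doc_id in doc_to_frame}
    return sorted(frames, reverse=True)


doc_to_frame = {0: 0, 5: 5, 10: 10}
print(frames_to_delete_for(doc_to_frame, [0, 5, 10, 99]))  # → [10, 5, 0]
```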

get_metadata(indices)[source]

Get metadata for given indices.

Return type:

List[Dict[str, Any]]

Parameters:

indices (List[int])

load(path=PosixPath('kb_index.d'))[source]

Load the index and metadata from disk.

Parameters:

path (Path) – Path to load the index from, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR

Raises:

MemVidIndexError – If loading fails

save(path=PosixPath('kb_index.d'))[source]

Save the index and metadata to disk.

Parameters:

path (Path) – Path to save the index, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR

Raises:

MemVidIndexError – If saving fails
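As a rough picture of what saving and loading involve, the sketch below round-trips metadata and a frame mapping through a JSON file inside a directory. The file name `metadata.json`, the JSON layout, and the helper names are assumptions. A real implementation would also persist the FAISS index itself, for instance with `faiss.write_index` and `faiss.read_index`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Tuple


def save_state(path: Path, metadata: List[Dict[str, Any]], doc_to_frame: Dict[int, int]) -> None:
    """Write metadata and frame mapping to `path / "metadata.json"`."""
    path.mkdir(parents=True, exist_ok=True)
    payload = {
        "metadata": metadata,
        # JSON object keys must be strings, so integer IDs are converted.
        "frame_mapping": {str(doc_id): frame for doc_id, frame in doc_to_frame.items()},
    }
    (path / "metadata.json").write_text(json.dumps(payload), encoding="utf-8")


def load_state(path: Path) -> Tuple[List[Dict[str, Any]], Dict[int, int]]:
    """Read back what save_state wrote, restoring integer document IDs."""
    payload = json.loads((path / "metadata.json").read_text(encoding="utf-8"))
    doc_to_frame = {int(doc_id): frame for doc_id, frame in payload["frame_mapping"].items()}
    return payload["metadata"], doc_to_frame


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "kb_index.d"
    save_state(target, [{"text": "hello", "doc_id": 0}], {0: 0})
    restored_metadata, restored_mapping = load_state(target)
    print(restored_metadata, restored_mapping)
```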

search_text(query_text, k=4)[source]

Search for similar texts using a text query.
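Conceptually, a text search embeds the query and returns the k stored texts whose vectors are closest to it. The sketch below shows this with a brute-force cosine similarity in pure Python as a stand-in for the FAISS search. The toy embedding, the choice of cosine similarity, and all names are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_text: str,
    texts: Sequence[str],
    embed: Callable[[str], List[float]],
    k: int = 4,
) -> List[Tuple[float, str]]:
    """Return up to k (similarity, text) pairs, most similar first."""
    query_vec = embed(query_text)
    scored = [(cosine(query_vec, embed(text)), text) for text in texts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy embedding: counts of a few fixed words (illustrative only).
VOCAB = ("video", "index", "frame", "search")


def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]


corpus = ["video frame storage", "index search speed", "frame index mapping"]
hits = search("search the index", corpus, toy_embed, k=2)
print([text for _, text in hits])
```

A vector index such as FAISS serves the same purpose without comparing the query against every stored vector one by one in Python.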

Return type:

List[SearchResult]

Parameters:
  • query_text

  • k (defaults to 4)

set_frame_mapping(doc_id, frame_number)[source]

Set the frame mapping for a document.

This method establishes the bidirectional mapping between document IDs and frame numbers, which is essential for optimized deletion strategies.

Frame Index Mapping

  • Bidirectional Mapping: doc_id ↔ frame_number for efficient lookups

  • O(1) Lookup: Enables constant-time frame number retrieval

  • Deletion Optimization: Allows precise frame-level deletion without full video rebuilds

  • Consistency: Maintains synchronization between FAISS index and video frames

Performance Benefits

  • Fast Deletion: O(k) time complexity, where k is the number of frames to delete

  • Memory Efficient: Minimal memory overhead for mapping storage

  • Scalable: Efficient for large document collections

  • Reliable: Provides fallback mechanisms when mappings are corrupted

Use Cases

  • Optimized Deletion: Enables selective frame removal from videos

  • Frame Lookup: Fast retrieval of frame numbers for document IDs

  • Document Lookup: Fast retrieval of document IDs for frame numbers

  • Statistics: Provides mapping coverage statistics for monitoring

Parameters:
  • doc_id (int) – Document ID

  • frame_number (int) – Frame number in the video

Example

    # Set frame mapping for a document
    index_manager.set_frame_mapping(123, 5)

    # Retrieve frame number
    frame_num = index_manager.get_frame_number(123)  # Returns 5

    # Retrieve document ID
    doc_id = index_manager.get_document_id(5)  # Returns 123

class langchain_memvid.index.SearchResult(text, source=None, category=None, similarity=0.0, metadata=None)[source]

Bases: object

Represents a search result with metadata and similarity score.

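Parameters:
  • text (str)

  • source (Optional[str])

  • category (Optional[str])

  • similarity (float)

  • metadata (Optional[Dict[str, Any]])

category: Optional[str] = None

classmethod from_metadata(metadata, similarity)[source]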

Create a SearchResult from metadata dictionary and similarity score.
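The shape of such a result type can be sketched with a dataclass that mirrors the documented fields and defaults. How `from_metadata` treats keys other than `text`, `source`, and `category` is not stated here. The sketch below keeps them in `metadata`, which is an assumption, and uses the hypothetical name `SearchResultSketch` to avoid confusion with the real class.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SearchResultSketch:
    """Stand-in mirroring the documented SearchResult fields and defaults."""

    text: str
    source: Optional[str] = None
    category: Optional[str] = None
    similarity: float = 0.0
    metadata: Optional[Dict[str, Any]] = None

    @classmethod
    def from_metadata(cls, metadata: Dict[str, Any], similarity: float) -> "SearchResultSketch":
        # Assumption: known fields are lifted out, the rest is kept as extras.
        known = ("text", "source", "category")
        extras = {key: value for key, value in metadata.items() if key not in known}
        return cls(
            text=metadata.get("text", ""),
            source=metadata.get("source"),
            category=metadata.get("category"),
            similarity=similarity,
            metadata=extras or None,
        )


result = SearchResultSketch.from_metadata(
    {"text": "hi", "source": "a.txt", "doc_id": 3}, similarity=0.9
)
print(result)
```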

Return type:

SearchResult

Parameters:
  • metadata – Metadata dictionary

  • similarity – Similarity score

metadata: Optional[Dict[str, Any]] = None

similarity: float = 0.0

source: Optional[str] = None

text: str