Retriever

Retriever for MemVid vector store.

  • Performs semantic search using FAISS index and retrieves documents from video storage.

  • Supports both essential metadata (fast) and full metadata (from video QR codes).

  • Implements frame caching for efficient repeated access.

class langchain_memvid.retriever.Retriever(*args, **kwargs)[source]

Bases: BaseRetriever, BaseModel

_get_frame(frame_number)[source]

Get a specific frame from the video with caching.

Parameters:

frame_number (int) – Frame number to get

Return type:

Optional[Any]

Returns:

Frame if found, None otherwise

Raises:

RetrievalError – If frame retrieval fails
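
The caching behaviour described for _get_frame can be pictured as a small bounded cache keyed by frame number. The sketch below is illustrative only: FrameCache, its loader callback, and the least-recently-used eviction policy are assumptions, not the library's actual implementation. Only frame_cache_size and clear_cache correspond to names in this reference.

```python
from collections import OrderedDict
from typing import Any, Callable, Optional


class FrameCache:
    """Hypothetical bounded frame cache; LRU eviction is an assumption."""

    def __init__(self, loader: Callable[[int], Optional[Any]], max_size: int = 2) -> None:
        self._loader = loader        # stand-in for reading a frame from the video
        self._max_size = max_size    # plays the role of frame_cache_size
        self._frames: "OrderedDict[int, Any]" = OrderedDict()
        self.misses = 0              # counts calls that had to hit the loader

    def get(self, frame_number: int) -> Optional[Any]:
        if frame_number in self._frames:
            # Cache hit: mark the frame as most recently used.
            self._frames.move_to_end(frame_number)
            return self._frames[frame_number]
        self.misses += 1
        frame = self._loader(frame_number)
        if frame is None:
            return None              # missing frames are not cached
        self._frames[frame_number] = frame
        if len(self._frames) > self._max_size:
            self._frames.popitem(last=False)  # evict the least recently used frame
        return frame

    def clear(self) -> None:
        """Counterpart of clear_cache(): drop every cached frame."""
        self._frames.clear()
```

Repeated requests for the same frame are served from memory, so only the first access pays the decoding cost.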

_get_full_metadata_from_video(doc_id)[source]

Get full metadata from video storage for a specific document.

This method implements the full metadata retrieval component of the hybrid storage approach:

Hybrid Storage Implementation

  • Video Decoding: Decodes specific video frames to extract complete metadata

  • Frame Mapping: Uses document-to-frame mapping for efficient frame location

  • Complete Data: Retrieves all metadata fields and custom attributes

  • Fallback Mechanism: Provides complete data access when FAISS data is insufficient

Performance Characteristics

  • Frame Lookup: O(1) lookup using frame mapping

  • Video Decoding: Additional processing time for frame decoding and QR code processing

  • Memory Usage: Moderate (the decoded frame and its QR code payload are held in memory during processing)

Error Handling

  • Returns None if frame mapping is not available

  • Returns None if video decoding fails

  • Logs warnings for debugging purposes

  • Degrades gracefully when video data is corrupted

Use Cases

  • Complete Metadata Access: When all metadata fields are required

  • Data Integrity Verification: When FAISS data needs validation

  • Backup Recovery: When FAISS index is corrupted or incomplete

  • Custom Field Access: When accessing fields not in essential metadata

Parameters:

doc_id (int) – Document ID

Return type:

Optional[Dict[str, Any]]

Returns:

Full metadata dictionary if found, None otherwise
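
The lookup and error-handling rules above can be sketched as follows. This is a minimal illustration under stated assumptions: full_metadata_for, the frames dictionary standing in for decoded QR payloads, and the JSON payload layout with a "metadata" key are all hypothetical and are not taken from the library.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("memvid_sketch")


def full_metadata_for(
    doc_id: int,
    frame_mapping: Optional[Dict[int, int]],
    frames: Dict[int, str],
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return full metadata for doc_id, or None on failure."""
    if not frame_mapping or doc_id not in frame_mapping:
        logger.warning("No frame mapping for document %s", doc_id)
        return None
    frame_number = frame_mapping[doc_id]  # O(1) dictionary lookup
    payload = frames.get(frame_number)    # stand-in for decoding the frame's QR code
    if payload is None:
        logger.warning("Frame %s could not be decoded", frame_number)
        return None
    try:
        data = json.loads(payload)        # assumed payload format: JSON
    except json.JSONDecodeError:
        logger.warning("Frame %s holds corrupted data", frame_number)
        return None
    metadata = data.get("metadata") if isinstance(data, dict) else None
    if not isinstance(metadata, dict):
        return None
    return {**metadata, "metadata_type": "full"}
```

Every failure path logs a warning and returns None instead of raising, which matches the graceful-degradation behaviour listed under Error Handling.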

_get_relevant_documents(query)[source]

Get documents relevant to the query.

This method implements the hybrid storage approach for optimal search performance:

Hybrid Storage Implementation

  • Essential Metadata Only: Returns documents with minimal metadata from FAISS

  • Fast Search: Leverages FAISS capabilities for sub-second search

  • Metadata Structure: Includes text, source, category, doc_id, metadata_hash

  • Metadata Type Flag: Sets "metadata_type": "essential" for identification

Performance Optimizations

  • Progress Bar: Shows progress for large result sets (>10 documents)

  • Memory Efficient: Processes results in batches to avoid memory issues

  • Caching: Leverages frame caching for repeated access

Metadata Structure

  • source: Document source

  • category: Document category

  • similarity: Similarity score

  • doc_id: Document ID

  • metadata_hash: Metadata hash

  • metadata_type: Metadata type

  • … other essential fields

Parameters:

query (str) – Query string

Return type:

List[Document]

Returns:

List of relevant documents with essential metadata

Raises:

RetrievalError – If retrieval fails
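
To make the essential metadata structure concrete, the sketch below reduces a full metadata dictionary to the fields listed above. It is not the library's code: essential_metadata and ESSENTIAL_KEYS are hypothetical names, and computing metadata_hash as SHA-256 over sorted-key JSON is an assumption made for illustration.

```python
import hashlib
import json
from typing import Any, Dict

# Assumption: the fields copied verbatim into the essential metadata.
ESSENTIAL_KEYS = ("source", "category")


def essential_metadata(
    doc_id: int, full_metadata: Dict[str, Any], similarity: float
) -> Dict[str, Any]:
    """Hypothetical helper: build the essential metadata for one search hit."""
    # Assumption: the hash is SHA-256 over canonical (sorted-key) JSON.
    canonical = json.dumps(full_metadata, sort_keys=True, default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    essential = {k: full_metadata[k] for k in ESSENTIAL_KEYS if k in full_metadata}
    essential.update(
        doc_id=doc_id,
        similarity=similarity,
        metadata_hash=digest,
        metadata_type="essential",
    )
    return essential


meta = essential_metadata(
    3, {"source": "a.txt", "category": "demo", "author": "x"}, 0.87
)
```

Fields outside the essential set (here, author) are dropped from the result, while the hash still covers them, so a later full-metadata fetch could be checked against it.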

async abatch(inputs, config=None, *, return_exceptions=False)[source]

Asynchronously invoke the retriever on multiple inputs.

Parameters:
  • inputs (List[str]) – List of query strings

  • config (Optional[RunnableConfig]) – Optional configuration for the run

  • return_exceptions (bool) – Whether to return exceptions instead of raising them

Return type:

List[List[Document]]

Returns:

List of document lists, one for each input
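
The effect of return_exceptions can be illustrated with plain asyncio. This sketch only mirrors the semantics described above; fake_ainvoke and abatch_sketch are hypothetical stand-ins, and the real method may schedule its work differently.

```python
import asyncio
from typing import List, Union


async def fake_ainvoke(query: str) -> List[str]:
    """Hypothetical stand-in for a single asynchronous retrieval."""
    if not query:
        raise ValueError("empty query")
    await asyncio.sleep(0)  # yield control, as real I/O would
    return [f"doc for {query}"]


async def abatch_sketch(
    inputs: List[str], return_exceptions: bool = False
) -> List[Union[List[str], BaseException]]:
    """Run one retrieval per input, preserving input order in the results."""
    results = await asyncio.gather(
        *(fake_ainvoke(q) for q in inputs),
        return_exceptions=return_exceptions,
    )
    return list(results)


results = asyncio.run(abatch_sketch(["cats", "", "dogs"], return_exceptions=True))
```

With return_exceptions=True the failing input yields an exception object in its slot and the other results are still returned; with the default False, the first exception propagates to the caller.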

async ainvoke(input, config=None)[source]

Asynchronously invoke the retriever on a single input.

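Parameters:
  • input (str) – Query string

  • config (Optional[RunnableConfig]) – Optional configuration for the run

Return type: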

List[Document]

Returns:

List of relevant documents

batch(inputs, config=None, *, return_exceptions=False)[source]

Invoke the retriever on multiple inputs.

Parameters:
  • inputs (List[str]) – List of query strings

  • config (Optional[RunnableConfig]) – Optional configuration for the run

  • return_exceptions (bool) – Whether to return exceptions instead of raising them

Return type:

List[List[Document]]

Returns:

List of document lists, one for each input

clear_cache()[source]

Clear the frame cache.

config: VectorStoreConfig

decode_all_frames()[source]

Decode all frames from the video.

Return type:

List[Document]

Returns:

List of all documents in the video

Raises:

RetrievalError – If decoding fails

decode_frame(frame_number)[source]

Decode a specific frame from the video.

Parameters:

frame_number (int) – Frame number to decode

Return type:

Optional[Document]

Returns:

Document if frame contains valid QR code, None otherwise

Raises:

RetrievalError – If decoding fails
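
The relationship between decode_frame and decode_all_frames can be sketched as a loop that keeps only frames yielding a document. The names below are hypothetical, the FRAMES list stands in for a decoded video, and skipping frames without a valid QR code is an assumption about how the two methods combine.

```python
from typing import List, Optional

# Stand-in for a video: None marks a frame without a valid QR code.
FRAMES: List[Optional[str]] = ["doc A", None, "doc B"]


def decode_frame_sketch(frame_number: int) -> Optional[str]:
    """Return the document stored in one frame, or None if it has no valid QR code."""
    if not 0 <= frame_number < len(FRAMES):
        raise IndexError(f"frame {frame_number} out of range")
    return FRAMES[frame_number]


def decode_all_frames_sketch() -> List[str]:
    """Decode every frame in order, skipping frames that yield no document."""
    documents = []
    for frame_number in range(len(FRAMES)):
        document = decode_frame_sketch(frame_number)
        if document is not None:
            documents.append(document)
    return documents
```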

frame_cache_size: int

get_document_by_id(doc_id, include_full_metadata=False)[source]

Get a document by its ID.

This method supports the hybrid storage approach with flexible metadata retrieval:

  • Essential Metadata Only (include_full_metadata=False): Fast retrieval from FAISS index

    • Document text, source, category, doc_id, metadata_hash

    • O(1) lookup time from FAISS

    • Minimal memory usage

    • Metadata type: "essential"

  • Full Metadata (include_full_metadata=True): Complete metadata from video storage

    • All metadata fields and custom attributes

    • Requires video frame decoding

    • Complete data access with integrity checking

    • Metadata type: "full"

Parameters:
  • doc_id (int) – Document ID

  • include_full_metadata (bool) – Whether to fetch full metadata from video

Return type:

Optional[Document]

Returns:

Document if found, None otherwise

Raises:

RetrievalError – If retrieval fails
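
The two retrieval modes can be sketched with in-memory dictionaries standing in for the FAISS index and the video storage. Everything here is hypothetical (ESSENTIAL, FULL, get_document_sketch), and falling back to essential metadata when the full metadata is unavailable is an assumption, not documented behaviour.

```python
from typing import Any, Dict, Optional

# Stand-ins for the two storage tiers (illustrative data only).
ESSENTIAL = {42: {"text": "hello", "source": "a.txt", "category": "demo", "doc_id": 42}}
FULL = {42: {"source": "a.txt", "category": "demo", "author": "x", "doc_id": 42}}


def get_document_sketch(
    doc_id: int, include_full_metadata: bool = False
) -> Optional[Dict[str, Any]]:
    """Return a document-like dict for doc_id, or None if the ID is unknown."""
    record = ESSENTIAL.get(doc_id)  # fast path: index-side lookup
    if record is None:
        return None
    metadata = {k: v for k, v in record.items() if k != "text"}
    metadata["metadata_type"] = "essential"
    if include_full_metadata:
        full = FULL.get(doc_id)     # slow path: would decode a video frame
        if full is not None:
            metadata = {**full, "metadata_type": "full"}
        # Assumption: keep the essential metadata if the full lookup fails.
    return {"text": record["text"], "metadata": metadata}
```

Callers that only need text, source, and category can leave include_full_metadata at its default and avoid the frame-decoding cost entirely.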

get_documents_by_ids(doc_ids, include_full_metadata=False)[source]

Get documents by their IDs.

Parameters:
  • doc_ids (List[int]) – List of document IDs

  • include_full_metadata (bool) – Whether to fetch full metadata from video

Return type:

List[Document]

Returns:

List of documents

Raises:

RetrievalError – If retrieval fails

index_dir: Path

index_manager: Union[IndexManager, Any]

invoke(input, config=None)[source]

Invoke the retriever on a single input.

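Parameters:
  • input (str) – Query string

  • config (Optional[RunnableConfig]) – Optional configuration for the run

Return type: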

List[Document]

Returns:

List of relevant documents

k: int

load_index: bool

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'from_attributes': True, 'protected_namespaces': (), 'strict': False, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_Retriever__context)[source]

Initialize additional attributes after Pydantic model initialization.

Parameters:

_Retriever__context (Any)

retrieve(query)[source]

Retrieve documents relevant to the query.

Parameters:

query (str) – Query string

Return type:

List[Document]

Returns:

List of relevant documents

Raises:

RetrievalError – If retrieval fails

video_file: Path

video_processor: Union[VideoProcessor, Any]