Retriever

Retriever for MemVid vector store.

Performs semantic search using a FAISS index and retrieves documents from video storage.
Supports both essential metadata (fast) and full metadata (from video QR codes).
Implements frame caching for efficient repeated access.

class langchain_memvid.retriever.Retriever(*args, **kwargs)

Bases: BaseRetriever, BaseModel

Parameters:

args (Any)
name (str | None)
tags (list[str] | None)
metadata (dict[str, Any] | None)
video_file (Path)
index_dir (Path)
config (VectorStoreConfig)
index_manager (IndexManager | Any)
video_processor (VideoProcessor | Any)
load_index (bool)
k (int)
frame_cache_size (int)

_get_frame(frame_number)

Get a specific frame from the video with caching.

Parameters: frame_number (int) – Frame number to retrieve
Return type: Optional[Any]
Returns: Frame if found, None otherwise
Raises: RetrievalError – If frame retrieval fails
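The retriever caches frames and bounds the cache with frame_cache_size, but the eviction policy is not stated. As a minimal sketch, assuming a least-recently-used (LRU) policy, a bounded cache in front of a frame loader could look like this. FrameCache and its load_frame callable are hypothetical names, and a dict stands in for the video.

```python
from collections import OrderedDict
from typing import Any, Callable, Optional


class FrameCache:
    """Hypothetical LRU cache for decoded video frames (illustration only)."""

    def __init__(self, load_frame: Callable[[int], Optional[Any]], max_size: int = 100) -> None:
        self._load_frame = load_frame  # stand-in for the real video decoder
        self._max_size = max_size      # plays the role of frame_cache_size
        self._frames: "OrderedDict[int, Any]" = OrderedDict()
        self.misses = 0

    def get(self, frame_number: int) -> Optional[Any]:
        if frame_number in self._frames:
            # Cache hit: mark the frame as most recently used.
            self._frames.move_to_end(frame_number)
            return self._frames[frame_number]
        self.misses += 1
        frame = self._load_frame(frame_number)
        if frame is None:
            return None  # frame not found; the miss itself is not cached
        self._frames[frame_number] = frame
        if len(self._frames) > self._max_size:
            # Evict the least recently used frame.
            self._frames.popitem(last=False)
        return frame

    def clear(self) -> None:
        """Counterpart of clear_cache(): drop every cached frame."""
        self._frames.clear()


# A fake "video" with three frames and a cache that holds at most two.
video = {0: "frame-0", 1: "frame-1", 2: "frame-2"}
cache = FrameCache(video.get, max_size=2)
first = cache.get(0)  # miss: loaded from the "video"
again = cache.get(0)  # hit: served from the cache
```

The OrderedDict keeps entries in recency order, so both the hit path and eviction are constant-time operations.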

_get_full_metadata_from_video(doc_id)

Get full metadata from video storage for a specific document.

This method implements the full metadata retrieval component of the hybrid storage approach:

Hybrid Storage Implementation

Video Decoding: Decodes specific video frames to extract complete metadata
Frame Mapping: Uses document-to-frame mapping for efficient frame location
Complete Data: Retrieves all metadata fields and custom attributes
Fallback Mechanism: Provides complete data access when FAISS data is insufficient

Performance Characteristics

Frame Lookup: O(1) lookup using frame mapping
Video Decoding: Additional processing time for frame decoding and QR code processing
Memory Usage: Medium (requires frame decoding and QR code processing)

Error Handling

Returns None if frame mapping is not available
Returns None if video decoding fails
Logs warnings for debugging purposes
Graceful degradation when video data is corrupted

Use Cases

Complete Metadata Access: When all metadata fields are required
Data Integrity Verification: When FAISS data needs validation
Backup Recovery: When FAISS index is corrupted or incomplete
Custom Field Access: When accessing fields not in essential metadata

Parameters: doc_id (int) – Document ID
Return type: Optional[Dict[str, Any]]
Returns: Full metadata dictionary if found, None otherwise
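The lookup-then-decode flow with graceful degradation described above can be sketched with plain dictionaries. This is an illustration, not the library's code: frame_mapping, decode_qr_payload, get_full_metadata and the JSON payload format are all assumptions, and reading from a dict replaces actual frame decoding and QR code processing.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical stand-ins: doc_id -> frame number, frame number -> raw QR payload.
frame_mapping: Dict[int, int] = {7: 3, 9: 4}
qr_payloads: Dict[int, str] = {
    3: json.dumps({"text": "hello", "metadata": {"source": "a.txt", "author": "kim"}}),
    4: "not valid json",  # simulates a corrupted frame
}


def decode_qr_payload(frame_number: int) -> Optional[str]:
    """Stand-in for decoding the QR code in one video frame."""
    return qr_payloads.get(frame_number)


def get_full_metadata(doc_id: int) -> Optional[Dict[str, Any]]:
    frame_number = frame_mapping.get(doc_id)  # O(1) dict lookup
    if frame_number is None:
        logger.warning("No frame mapping for doc_id %s", doc_id)
        return None
    payload = decode_qr_payload(frame_number)
    if payload is None:
        logger.warning("Could not decode frame %s", frame_number)
        return None
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        logger.warning("Corrupted payload in frame %s", frame_number)
        return None
    if not isinstance(data, dict):
        return None
    return dict(data.get("metadata", {}))
```

Here get_full_metadata(7) returns the stored dictionary, while IDs 9 (corrupted payload) and 1 (no mapping) both return None after logging a warning.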

_get_relevant_documents(query)

Get documents relevant to the query.

This method implements the fast, essential-metadata path of the hybrid storage approach:

Hybrid Storage Implementation

Essential Metadata Only: Returns documents with minimal metadata from FAISS
Fast Search: Leverages FAISS capabilities for sub-second search
Metadata Structure: Includes text, source, category, doc_id, metadata_hash
Metadata Type Flag: Sets “metadata_type”: “essential” for identification

Performance Optimizations

Progress Bar: Shows progress for large result sets (>10 documents)
Memory Efficient: Processes results in batches to avoid memory issues
Caching: Leverages frame caching for repeated access

Metadata Structure

source: Document source
category: Document category
similarity: Similarity score
doc_id: Document ID
metadata_hash: Metadata hash
metadata_type: Metadata type
… other essential fields

Parameters: query (str) – Query string
Return type: List[Document]
Returns: List of relevant documents with essential metadata
Raises: RetrievalError – If retrieval fails
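To make the shape of an essential-metadata result concrete, here is a minimal sketch that ranks a tiny in-memory index and attaches similarity and metadata_type = "essential" to each hit. It swaps FAISS for a brute-force cosine-similarity scan (the real similarity measure is not stated here) and returns (page_content, metadata) tuples instead of LangChain Document objects; ESSENTIAL, VECTORS and search_essential are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical essential-metadata records, as they might be stored next to the index.
ESSENTIAL = [
    {"text": "cats purr", "source": "a.txt", "category": "pets", "doc_id": 0, "metadata_hash": "h0"},
    {"text": "stocks fell", "source": "b.txt", "category": "finance", "doc_id": 1, "metadata_hash": "h1"},
    {"text": "dogs bark", "source": "c.txt", "category": "pets", "doc_id": 2, "metadata_hash": "h2"},
]
VECTORS = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.2]]  # one embedding per record


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_essential(query_vec: Sequence[float], k: int = 2) -> List[Tuple[str, Dict]]:
    # Brute-force scan in place of a FAISS index search; keep the top k scores.
    scored = sorted(
        ((cosine(query_vec, vec), i) for i, vec in enumerate(VECTORS)),
        reverse=True,
    )[:k]
    results = []
    for score, i in scored:
        record = ESSENTIAL[i]
        # The text becomes page_content; everything else goes into metadata.
        metadata = {key: value for key, value in record.items() if key != "text"}
        metadata["similarity"] = score
        metadata["metadata_type"] = "essential"
        results.append((record["text"], metadata))
    return results
```

No video frames are touched on this path, which is why it stays fast; only the in-index fields are returned.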

async abatch(inputs, config=None, *, return_exceptions=False)

Asynchronously invoke the retriever on multiple inputs.

Parameters:

inputs (List[str]) – List of query strings
config (Optional[RunnableConfig]) – Optional configuration for the run
return_exceptions (bool) – Whether to return exceptions instead of raising them

Return type:

List[List[Document]]

Returns:

List of document lists, one for each input
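The effect of return_exceptions can be illustrated with asyncio.gather, which accepts a flag of the same name. This sketch is an assumption about the semantics, not the retriever's implementation: fake_ainvoke is a hypothetical stand-in for ainvoke, and results are plain lists of strings. With return_exceptions=True, an entry in the returned list can be an exception object rather than a document list.

```python
import asyncio
from typing import List, Union


async def fake_ainvoke(query: str) -> List[str]:
    """Hypothetical async retriever call; fails on empty queries."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    if not query:
        raise ValueError("empty query")
    return [f"doc for {query!r}"]


async def abatch(
    queries: List[str], return_exceptions: bool = False
) -> List[Union[List[str], BaseException]]:
    # Run all queries concurrently; gather keeps results in input order.
    return await asyncio.gather(
        *(fake_ainvoke(q) for q in queries),
        return_exceptions=return_exceptions,
    )


results = asyncio.run(abatch(["cats", "", "dogs"], return_exceptions=True))
```

With the default return_exceptions=False, the first ValueError would propagate out of asyncio.run instead.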

async ainvoke(input, config=None)

Asynchronously invoke the retriever on a single input.

Parameters:

input (str) – Query string
config (Optional[RunnableConfig]) – Optional configuration for the run

Return type:

List[Document]

Returns:

List of relevant documents

batch(inputs, config=None, *, return_exceptions=False)

Invoke the retriever on multiple inputs.

Parameters:

inputs (List[str]) – List of query strings
config (Optional[RunnableConfig]) – Optional configuration for the run
return_exceptions (bool) – Whether to return exceptions instead of raising them

Return type:

List[List[Document]]

Returns:

List of document lists, one for each input
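A synchronous batch with the same return_exceptions behavior can be sketched with a thread pool. fake_invoke is a hypothetical stand-in for invoke, and running queries on concurrent.futures.ThreadPoolExecutor is an assumed strategy; the source does not say whether batch runs queries concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Union


def fake_invoke(query: str) -> List[str]:
    """Hypothetical synchronous retriever call; fails on empty queries."""
    if not query:
        raise ValueError("empty query")
    return [f"doc for {query!r}"]


def batch(
    queries: List[str], return_exceptions: bool = False, max_workers: int = 4
) -> List[Union[List[str], Exception]]:
    def run_one(query: str) -> Union[List[str], Exception]:
        try:
            return fake_invoke(query)
        except Exception as exc:
            if return_exceptions:
                return exc  # keep the error in place of this query's result
            raise

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order and re-raises
        # the first failure it reaches in that order.
        return list(pool.map(run_one, queries))
```

Calling batch(["cats", "", "dogs"], return_exceptions=True) returns two document lists with a ValueError in the middle slot.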

clear_cache(): Clear the frame cache.

config: VectorStoreConfig

decode_all_frames()

Decode all frames from the video.

Return type: List[Document]
Returns: List of all documents in the video
Raises: RetrievalError – If decoding fails

decode_frame(frame_number)

Decode a specific frame from the video.

Parameters: frame_number (int) – Frame number to decode
Return type: Optional[Document]
Returns: Document if the frame contains a valid QR code, None otherwise
Raises: RetrievalError – If decoding fails
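The relationship between decode_frame and decode_all_frames can be sketched as follows, assuming each frame carries one JSON payload (an assumption; the real QR payload format is not documented here). FRAMES, Doc and the local RetrievalError class are hypothetical stand-ins, and None marks a frame without a readable QR code.

```python
import json
from typing import Dict, List, Optional, Tuple

Doc = Tuple[str, Dict]  # (page_content, metadata), standing in for a LangChain Document


class RetrievalError(Exception):
    """Local stand-in for the library's RetrievalError."""


# Hypothetical decoded QR payloads, one per frame; None means no readable QR code.
FRAMES: List[Optional[str]] = [
    json.dumps({"text": "first", "metadata": {"doc_id": 0}}),
    None,
    json.dumps({"text": "third", "metadata": {"doc_id": 2}}),
]


def decode_frame(frame_number: int) -> Optional[Doc]:
    if not 0 <= frame_number < len(FRAMES):
        return None  # assumption: out-of-range frames count as "not found"
    payload = FRAMES[frame_number]
    if payload is None:
        return None  # frame has no valid QR code
    try:
        data = json.loads(payload)
        return data["text"], data.get("metadata", {})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # A malformed payload is a decoding failure, not a missing document.
        raise RetrievalError(f"Failed to decode frame {frame_number}") from exc


def decode_all_frames() -> List[Doc]:
    # Decode every frame and keep only those that yield a document.
    docs: List[Doc] = []
    for frame_number in range(len(FRAMES)):
        doc = decode_frame(frame_number)
        if doc is not None:
            docs.append(doc)
    return docs
```

The split mirrors the documented contract: None for "no valid QR code", an exception for "decoding failed".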

frame_cache_size: int

get_document_by_id(doc_id, include_full_metadata=False)

Get a document by its ID.

This method supports the hybrid storage approach with flexible metadata retrieval:

Essential Metadata Only (include_full_metadata=False): Fast retrieval from FAISS index
- Document text, source, category, doc_id, metadata_hash
- O(1) lookup time from FAISS
- Minimal memory usage
- Metadata type: “essential”
Full Metadata (include_full_metadata=True): Complete metadata from video storage
- All metadata fields and custom attributes
- Requires video frame decoding
- Complete data access with integrity checking
- Metadata type: “full”

Parameters:

doc_id (int) – Document ID
include_full_metadata (bool) – Whether to fetch full metadata from video

Return type:

Optional[Document]

Returns:

Document if found, None otherwise

Raises:

RetrievalError – If retrieval fails
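The include_full_metadata switch can be sketched with two hypothetical stores: ESSENTIAL_STORE for the fields kept with the index, and VIDEO_METADATA for what decoding the video would return. Falling back to essential metadata when no full record is found is an assumption made for this sketch, not documented behavior.

```python
from typing import Any, Dict, Optional, Tuple

Doc = Tuple[str, Dict[str, Any]]  # (page_content, metadata)

# Hypothetical stores: essential fields live with the index, full metadata in the video.
ESSENTIAL_STORE = {
    5: {"text": "hello", "source": "a.txt", "category": "notes", "doc_id": 5, "metadata_hash": "abc"},
}
VIDEO_METADATA = {5: {"source": "a.txt", "category": "notes", "author": "kim", "tags": ["demo"]}}


def get_document_by_id(doc_id: int, include_full_metadata: bool = False) -> Optional[Doc]:
    record = ESSENTIAL_STORE.get(doc_id)  # fast path: no video access
    if record is None:
        return None
    metadata = {k: v for k, v in record.items() if k != "text"}
    metadata["metadata_type"] = "essential"
    if include_full_metadata:
        full = VIDEO_METADATA.get(doc_id)  # slow path: would require frame decoding
        if full is not None:
            metadata.update(full)
            metadata["metadata_type"] = "full"
    return record["text"], metadata
```

Custom fields such as author only appear when include_full_metadata=True, which matches the trade-off listed above.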

get_documents_by_ids(doc_ids, include_full_metadata=False)

Get documents by their IDs.

Parameters:

doc_ids (List[int]) – List of document IDs
include_full_metadata (bool) – Whether to fetch full metadata from video

Return type:

List[Document]

Returns:

List of documents

Raises:

RetrievalError – If retrieval fails
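A minimal sketch of get_documents_by_ids as a loop over single-ID lookups. The source does not say whether unknown IDs are skipped, replaced, or raise an error; this sketch assumes they are omitted. STORE and the simplified get_document_by_id are hypothetical in-memory stand-ins.

```python
from typing import Dict, List, Optional, Tuple

Doc = Tuple[str, Dict]  # (page_content, metadata)

STORE: Dict[int, Doc] = {1: ("alpha", {"doc_id": 1}), 2: ("beta", {"doc_id": 2})}


def get_document_by_id(doc_id: int) -> Optional[Doc]:
    return STORE.get(doc_id)  # stand-in for the single-ID lookup


def get_documents_by_ids(doc_ids: List[int]) -> List[Doc]:
    # Assumption: IDs that are not found are skipped, so the result
    # can be shorter than doc_ids; input order is preserved.
    docs: List[Doc] = []
    for doc_id in doc_ids:
        doc = get_document_by_id(doc_id)
        if doc is not None:
            docs.append(doc)
    return docs
```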

index_dir: Path

index_manager: Union[IndexManager, Any]

invoke(input, config=None)

Invoke the retriever on a single input.

Parameters:

input (str) – Query string
config (Optional[RunnableConfig]) – Optional configuration for the run

Return type:

List[Document]

Returns:

List of relevant documents

k: int

load_index: bool

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'from_attributes': True, 'protected_namespaces': (), 'strict': False, 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_Retriever__context)

Initialize additional attributes after Pydantic model initialization.

Parameters: _Retriever__context (Any)

retrieve(query)

Retrieve documents relevant to the query.

Parameters: query (str) – Query string
Return type: List[Document]
Returns: List of relevant documents
Raises: RetrievalError – If retrieval fails

video_file: Path

video_processor: Union[VideoProcessor, Any]