Vector Store
VectorStore implementation for LangChain MemVid.
This implementation stores documents as QR codes in video frames and provides semantic search through a FAISS index.
The vector store implements a hybrid storage approach:
- FAISS Index: Stores essential metadata (text, source, category, doc_id, metadata_hash) for fast search
- Video Storage: Stores complete document data as QR codes with all metadata fields
Optimized deletion strategies avoid full video rebuilds by using frame index mapping for selective frame removal.
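The frame index mapping idea can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the library's implementation: FrameStore, frame_of, add and delete are hypothetical names, the "video" is a plain list, and each document is assumed to occupy exactly one frame.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FrameStore:
    """Toy stand-in for the video: an ordered list of frame payloads."""

    frames: List[str] = field(default_factory=list)
    # doc_id -> frame index; plays the role of the frame index mapping.
    frame_of: Dict[str, int] = field(default_factory=dict)

    def add(self, doc_id: str, payload: str) -> None:
        """Append one frame and record which document it belongs to."""
        self.frame_of[doc_id] = len(self.frames)
        self.frames.append(payload)

    def delete(self, doc_ids: List[str]) -> bool:
        """Remove only the frames mapped to doc_ids; True if any were removed."""
        doomed = {self.frame_of[d] for d in doc_ids if d in self.frame_of}
        if not doomed:
            return False
        # Keep surviving frames in order and renumber the mapping to match.
        old_to_new: Dict[int, int] = {}
        kept: List[str] = []
        for old, payload in enumerate(self.frames):
            if old not in doomed:
                old_to_new[old] = len(kept)
                kept.append(payload)
        self.frames = kept
        self.frame_of = {
            d: old_to_new[i] for d, i in self.frame_of.items() if i in old_to_new
        }
        return True
```

In this model, surviving frames are copied across unchanged and only the mapping is renumbered, which is the property that lets selective deletion skip re-encoding every document.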
Usage Examples
- Fast search with essential metadata:
results = vector_store.similarity_search("query", include_full_metadata=False)
- Complete search with full metadata:
results = vector_store.similarity_search("query", include_full_metadata=True)
- Optimized deletion:
vector_store.delete_by_ids(["0", "5", "10"])
- Storage statistics:
stats = vector_store.get_storage_stats()
- class langchain_memvid.vectorstore.VectorStore(*, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), config=None)[source]
Bases: VectorStore
Vector store that stores documents in a video format using QR codes.
This vector store uses memvid to encode documents into QR codes and store them in a video file. It provides semantic search through a FAISS index.
The vector store implements a hybrid storage approach:
- FAISS Index: Stores essential metadata for fast search operations
- Video Storage: Stores complete document data as QR codes with all metadata fields
Optimized deletion strategies use frame index mapping to avoid full video rebuilds.
- Variables:
video_file (str) – Path to the video file storing QR codes, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE
index_dir (str) – Path to the index directory for semantic search, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR
encoder (Encoder) – Encoder for converting documents to QR codes
_retriever (Optional[Retriever]) – Lazy-loaded retriever for searching and decoding QR codes
- Parameters:
embedding (Embeddings)
video_file (Path)
index_dir (Path)
config (VectorStoreConfig | None)
- __init__(*, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), config=None)[source]
Initialize VectorStore.
- Parameters:
embedding (Embeddings) – Embedding model for semantic search
video_file (Path) – Path to store/load the video file, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE
index_dir (Path) – Path to store/load the index file, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR
config (Optional[VectorStoreConfig]) – Optional unified configuration. If not provided, default configs are used
- _access_retriever(k)[source]
Context manager for temporarily setting retriever’s k value.
This avoids creating copies of the retriever while maintaining thread safety through Python’s GIL, which ensures that the context manager’s enter and exit operations are atomic.
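The set-and-restore pattern described here can be sketched with contextlib. This is an illustration under assumptions, not the library's code: DummyRetriever and access_retriever are hypothetical names, and the explicit lock is an addition of this sketch that the description above does not mention.

```python
import threading
from contextlib import contextmanager


class DummyRetriever:
    """Hypothetical stand-in for the real retriever; only the k attribute matters."""

    def __init__(self, k: int = 4) -> None:
        self.k = k


# Assumption: an explicit lock guards k. It is not part of the documented API.
_k_lock = threading.Lock()


@contextmanager
def access_retriever(retriever: DummyRetriever, k: int):
    """Temporarily set retriever.k, restoring the previous value on exit."""
    with _k_lock:
        previous = retriever.k
        retriever.k = k
        try:
            yield retriever
        finally:
            # Runs even if the body raises, so k never leaks out of the block.
            retriever.k = previous
```

Holding the lock for the whole block serializes callers, so one caller's k cannot be observed by another. That is a stronger guarantee than relying on the interpreter alone, at the cost of blocking concurrent searches.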
- _enhance_documents_with_full_metadata(docs)[source]
Enhance documents with full metadata from video storage.
- static _get_event_loop()[source]
Get the current event loop or create a new one if none exists.
- Returns:
The current or new event loop
Note
This method handles three cases:
1. We’re in an async context with a running event loop
2. We’re in a sync context and need to create a new event loop
3. We’re in a nested asyncio context (e.g., Jupyter notebook)
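A minimal sketch of the get-or-create behavior, using only the standard asyncio module. The function name get_or_create_event_loop is hypothetical, and the real method may differ. In particular, this sketch only returns the running loop in the nested case and does not add any special handling for notebooks.

```python
import asyncio


def get_or_create_event_loop() -> asyncio.AbstractEventLoop:
    """Return the running loop if there is one, otherwise create and install one."""
    try:
        # Succeeds only when called from inside a coroutine or callback.
        return asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: create one and make it current.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop
```

From synchronous code, the returned loop can drive a coroutine with loop.run_until_complete(...). From inside a coroutine, the same call hands back the loop that is already running.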
- _optimized_delete_frames(doc_ids)[source]
Optimized deletion using frame removal instead of full rebuild.
This method removes specific frames from the video instead of rebuilding the entire video, which is much more efficient.
- Return type:
bool
- Returns:
True if deletion was successful, False otherwise
- Raises:
RuntimeError – If frame deletion fails
- _rebuild_video_after_deletion()[source]
Rebuild the video file with remaining documents after deletion.
This method rebuilds the video file using the remaining documents in the index manager.
- Raises:
RuntimeError – If video rebuilding fails
- async aadd_documents(documents, **kwargs)[source]
Add documents to the vector store asynchronously.
- async aadd_texts(texts, metadatas=None, **kwargs)[source]
Add texts to the vector store asynchronously.
- Returns:
List of chunk IDs
- Raises:
ValueError – If no texts are provided
RuntimeError – If video building fails
- add_texts(texts, metadatas=None, **kwargs)[source]
Add texts to the vector store.
- Returns:
List of chunk IDs
- Raises:
ValueError – If no texts are provided
RuntimeError – If video building fails
- async adelete_by_ids(doc_ids)[source]
Delete documents by their IDs asynchronously.
- Return type:
bool
- Returns:
True if any documents were deleted, False otherwise
- Raises:
ValueError – If no document IDs are provided
RuntimeError – If video file doesn’t exist or deletion fails
- async adelete_by_texts(texts)[source]
Delete documents by their text content asynchronously.
- Return type:
bool
- Returns:
True if any documents were deleted, False otherwise
- Raises:
ValueError – If no texts are provided
RuntimeError – If video file doesn’t exist or deletion fails
- async adelete_documents(documents)[source]
Delete documents by Document objects asynchronously.
- Parameters:
documents (List[Document]) – List of Document objects to delete
- Return type:
bool
- Returns:
True if any documents were deleted, False otherwise
- Raises:
ValueError – If no documents are provided
RuntimeError – If video file doesn’t exist or deletion fails
- async classmethod afrom_documents(documents, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), **kwargs)[source]
Create vector store from documents asynchronously.
- Parameters:
documents (List[Document]) – List of Document objects
embedding (Embeddings) – Embedding model
video_file (Path) – Path to store the video file, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE
index_dir (Path) – Path to store the index file, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR
**kwargs (Any) – Additional arguments passed to the constructor
- Return type:
VectorStore
- Returns:
VectorStore instance
- async classmethod afrom_texts(texts, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), metadatas=None, **kwargs)[source]
Create vector store from texts asynchronously.
- Parameters:
embedding (Embeddings) – Embedding model
video_file (Path) – Path to store the video file, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE
index_dir (Path) – Path to store the index file, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR
metadatas (Optional[List[Dict[str, Any]]]) – Optional list of metadata dicts
**kwargs (Any) – Additional arguments passed to the constructor
- Return type:
VectorStore
- Returns:
VectorStore instance
- async asimilarity_search(query, k=4, **kwargs)[source]
Search for similar documents asynchronously.
- async asimilarity_search_with_score(query, k=4, **kwargs)[source]
Search for similar documents with scores asynchronously.
- delete_by_ids(doc_ids)[source]
Delete documents by their IDs.
Uses optimized deletion strategies with frame index mapping to avoid full video rebuilds. Falls back to full rebuild if optimized deletion fails.
- Return type:
bool
- Returns:
True if any documents were deleted, False otherwise
- Raises:
ValueError – If no document IDs are provided
RuntimeError – If video file doesn’t exist or deletion fails
Example
vector_store.delete_by_ids(["0", "5", "10"])
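The fallback behavior described for delete_by_ids can be sketched as a small control-flow helper. Everything here is an assumption made for illustration: delete_with_fallback and its three callables (remove_from_index, fast_delete, full_rebuild) are hypothetical names, and the real method may order or combine these steps differently.

```python
from typing import Callable, List, Sequence


def delete_with_fallback(
    doc_ids: Sequence[str],
    remove_from_index: Callable[[Sequence[str]], List[str]],
    fast_delete: Callable[[Sequence[str]], bool],
    full_rebuild: Callable[[], None],
) -> bool:
    """Try selective frame removal first; rebuild everything only if that fails."""
    if not doc_ids:
        raise ValueError("No document IDs provided")
    # Hypothetical step: drop the IDs from the index and learn which existed.
    removed = remove_from_index(doc_ids)
    if not removed:
        return False
    try:
        if fast_delete(removed):
            return True
    except RuntimeError:
        pass  # Selective removal failed; fall through to the slow path.
    full_rebuild()
    return True
```

Passing the strategies in as callables keeps the sketch independent of any storage backend, so the success, fallback and nothing-to-delete paths can each be exercised with simple stubs.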
- delete_by_texts(texts)[source]
Delete documents by their text content.
- Return type:
bool
- Returns:
True if any documents were deleted, False otherwise
- Raises:
ValueError – If no texts are provided
RuntimeError – If video file doesn’t exist or deletion fails
- delete_documents(documents)[source]
Delete documents by Document objects.
- Parameters:
documents (List[Document]) – List of Document objects to delete
- Return type:
bool
- Returns:
True if any documents were deleted, False otherwise
- Raises:
ValueError – If no documents are provided
RuntimeError – If video file doesn’t exist or deletion fails
- classmethod from_documents(documents, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), **kwargs)[source]
Create vector store from documents.
- Parameters:
documents (List[Document]) – List of Document objects
embedding (Embeddings) – Embedding model
video_file (Path) – Path to store the video file, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE
index_dir (Path) – Path to store the index file, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR
**kwargs (Any) – Additional arguments passed to the constructor
- Return type:
VectorStore
- Returns:
VectorStore instance
- classmethod from_texts(texts, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), metadatas=None, **kwargs)[source]
Create vector store from texts.
- Parameters:
embedding (Embeddings) – Embedding model
video_file (Path) – Path to store the video file, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE
index_dir (Path) – Path to store the index file, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR
metadatas (Optional[List[Dict[str, Any]]]) – Optional list of metadata dicts
**kwargs (Any) – Additional arguments passed to the constructor
- Return type:
VectorStore
- Returns:
VectorStore instance
- get_document_by_id(doc_id, include_full_metadata=False)[source]
Get a document by its ID.
- Return type:
Optional[Document]
- Returns:
Document if found, None otherwise
- Raises:
ValueError – If document ID format is invalid
RuntimeError – If video file doesn’t exist
Example
# Fast retrieval with essential metadata
doc = vector_store.get_document_by_id("123", include_full_metadata=False)
# Complete retrieval with full metadata
doc_full = vector_store.get_document_by_id("123", include_full_metadata=True)
- get_documents_by_ids(doc_ids, include_full_metadata=False)[source]
Get documents by their IDs.
- Return type:
List[Document]
- Returns:
List of Document objects
- Raises:
ValueError – If document ID format is invalid
RuntimeError – If video file doesn’t exist
- get_storage_stats()[source]
Get storage statistics for the hybrid approach.
- Returns:
Comprehensive storage statistics for the hybrid approach.
- Return type:
StorageStats
- Raises:
RuntimeError – If video file doesn’t exist
Example
stats = vector_store.get_storage_stats()
print(f"Total documents: {stats.total_documents}")
print(f"Video file size: {stats.video_file_size_mb:.2f} MB")
print(f"Index size: {stats.index_size_mb:.2f} MB")
print(f"Redundancy percentage: {stats.redundancy_percentage:.1f}%")
print(f"Storage efficiency: {stats.storage_efficiency}")
# Frame mapping statistics
frame_stats = stats.frame_mapping_stats
print(f"Mapped documents: {frame_stats.mapped_documents}")
print(f"Mapping coverage: {frame_stats.mapping_coverage:.1f}%")
- property retriever: Retriever
Get the retriever instance, creating it if necessary.
- Returns:
The retriever instance
- Return type:
Retriever
- Raises:
RuntimeError – If video file doesn’t exist when retriever is needed
- similarity_search(query, k=4, include_full_metadata=False, **kwargs)[source]
Search for similar documents.
- Return type:
List[Document]
- Returns:
List of Document objects
Example
# Fast search with essential metadata
results = vector_store.similarity_search("query", include_full_metadata=False)
# Complete search with full metadata
results = vector_store.similarity_search("query", include_full_metadata=True)