Vector Store

VectorStore implementation for LangChain MemVid.

This implementation stores documents as QR codes in video frames and provides semantic search through a FAISS index.

The vector store implements a hybrid storage approach:

  • FAISS Index: Stores essential metadata (text, source, category, doc_id, metadata_hash) for fast search

  • Video Storage: Stores complete document data as QR codes with all metadata fields

Optimized deletion strategies avoid full video rebuilds by using frame index mapping for selective frame removal.
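The hybrid split described above can be pictured as dividing each document into a small index-side record and a complete video-side record. This is a minimal sketch, not the library's code: the helper name `split_metadata` and the use of SHA-256 over sorted JSON for `metadata_hash` are assumptions; only the essential field names come from the list above.

```python
import hashlib
import json
from typing import Any, Dict, Tuple

# Essential fields kept in the FAISS-side index (names taken from the list above).
ESSENTIAL_FIELDS = ("text", "source", "category", "doc_id")


def split_metadata(
    text: str, doc_id: int, metadata: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: return (essential, full) records for one document."""
    # The full record is what would be encoded into the QR code.
    full = {"text": text, "doc_id": doc_id, **metadata}
    # The essential record keeps only the fields needed for fast search.
    essential = {k: full[k] for k in ESSENTIAL_FIELDS if k in full}
    # Assumption: a stable hash of the metadata lets the index detect whether
    # the video copy carries the same metadata without decoding any frame.
    canonical = json.dumps(metadata, sort_keys=True, default=str)
    essential["metadata_hash"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return essential, full
```

Keeping the index record small is what makes the `include_full_metadata=False` path cheap; the full record is only needed when callers ask for it.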

Usage Examples

Fast search with essential metadata:

results = vector_store.similarity_search("query", include_full_metadata=False)

Complete search with full metadata:

results = vector_store.similarity_search("query", include_full_metadata=True)

Optimized deletion:

vector_store.delete_by_ids(["0", "5", "10"])

Storage statistics:

stats = vector_store.get_storage_stats()

class langchain_memvid.vectorstore.VectorStore(*, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), config=None)

Bases: VectorStore

Vector store that stores documents in a video format using QR codes.

This vector store uses memvid to encode documents into QR codes and store them in a video file. It provides semantic search through a FAISS index.

The vector store implements a hybrid storage approach:

  • FAISS Index: Stores essential metadata for fast search operations

  • Video Storage: Stores complete document data as QR codes with all metadata fields

Optimized deletion strategies use frame index mapping to avoid full video rebuilds.

Variables:
  • video_file (str) – Path to the video file storing QR codes, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE

  • index_dir (str) – Path to the index directory for semantic search, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR

  • encoder (Encoder) – Encoder for converting documents to QR codes

  • _retriever (Optional[Retriever]) – Lazy-loaded retriever for searching and decoding QR codes

__init__(*, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), config=None)

Initialize VectorStore.

Parameters:
  • embedding (Embeddings) – Embedding model for semantic search

  • video_file (Path) – Path to store/load the video file, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE

  • index_dir (Path) – Path to store/load the index file, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR

  • config (Optional[VectorStoreConfig]) – Optional unified configuration. If not provided, the default configurations are used

_access_retriever(k)

Context manager for temporarily setting retriever’s k value.

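This avoids creating copies of the retriever. Python's GIL makes each individual assignment to k atomic, but it does not make the enter, body, and exit sequence atomic as a whole: another thread using the same retriever can observe the temporary value, so concurrent callers sharing one instance need additional synchronization.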

Parameters:

k (int) – The temporary k value to set

Return type:

Generator[Retriever, None, None]
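The temporary-override pattern can be sketched with `contextlib.contextmanager`. `FakeRetriever` and the module-level lock are assumptions for illustration; an explicit lock is one way to keep other threads from seeing the temporary `k`, and `try`/`finally` guarantees the old value is restored even if the body raises.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class FakeRetriever:
    """Hypothetical stand-in for the library's Retriever; only `k` matters here."""

    def __init__(self, k: int = 4) -> None:
        self.k = k


# RLock so a thread may nest access_retriever calls without deadlocking.
_retriever_lock = threading.RLock()


@contextmanager
def access_retriever(retriever: FakeRetriever, k: int) -> Iterator[FakeRetriever]:
    """Temporarily set `retriever.k`, restoring the previous value on exit."""
    # Holding the lock for the whole block serializes concurrent callers.
    with _retriever_lock:
        previous = retriever.k
        retriever.k = k
        try:
            yield retriever
        finally:
            # Runs even if the body raises, so k is always restored.
            retriever.k = previous
```

Usage: `with access_retriever(r, 10) as rr: rr.k` sees 10 inside the block and the original value afterwards.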

_enhance_documents_with_full_metadata(docs)

Enhance documents with full metadata from video storage.

Parameters:

docs (List[Document]) – List of documents with essential metadata

Return type:

List[Document]

Returns:

List of documents with full metadata
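Conceptually, enhancement looks up each document's full record by `doc_id` and merges it into the essential metadata. In this sketch, `Doc` is a minimal stand-in for LangChain's `Document`, `full_store` is a hypothetical in-memory dict standing in for decoding QR frames, and letting the index-side fields win on conflicts is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def enhance_with_full_metadata(
    docs: List[Doc], full_store: Dict[int, Dict[str, Any]]
) -> List[Doc]:
    """Merge full metadata (keyed by doc_id) into docs carrying essential metadata.

    `full_store` is a hypothetical stand-in for decoding each document's frame.
    """
    enhanced = []
    for doc in docs:
        full = full_store.get(doc.metadata.get("doc_id"), {})
        # Full metadata first, essential fields on top: the index copy wins on conflicts.
        merged = {**full, **doc.metadata}
        # Build new Doc objects so the caller's inputs are not mutated.
        enhanced.append(Doc(page_content=doc.page_content, metadata=merged))
    return enhanced
```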

static _get_event_loop()

Get the current event loop or create a new one if none exists.

Returns:

The current or new event loop

Return type:

asyncio.AbstractEventLoop

Note

This method handles three cases:
  1. An async context with a running event loop
  2. A sync context that needs a new event loop
  3. A nested asyncio context (e.g., a Jupyter notebook)
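The first two cases are commonly handled with `asyncio.get_running_loop()`, which raises `RuntimeError` when no loop is running. This is a sketch, not the library's implementation; the nested case (where a loop is already running and cannot be re-entered with `run_until_complete`) usually needs an extra patch such as the third-party `nest_asyncio` package, which is omitted here.

```python
import asyncio


def get_event_loop() -> asyncio.AbstractEventLoop:
    """Return the running loop if there is one, else create and register a new one."""
    try:
        # Succeeds only when called from code running inside an event loop.
        return asyncio.get_running_loop()
    except RuntimeError:
        # Sync context: no running loop, so create one and make it current.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop
```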

_optimized_delete_frames(doc_ids)

Optimized deletion that removes frames instead of rebuilding the video.

This method removes only the frames of the specified documents instead of rebuilding the entire video, which avoids re-encoding every remaining document as a QR code.

Parameters:

doc_ids (List[int]) – List of document IDs to delete

Return type:

bool

Returns:

True if deletion was successful, False otherwise

Raises:

RuntimeError – If frame deletion fails
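Frame index mapping makes selective removal possible: once the frames for the target documents are dropped, every surviving document's frame index shifts down. The sketch below shows that bookkeeping on an in-memory list; `delete_frames`, the one-frame-per-document layout, and the list of byte payloads are all assumptions standing in for the real video container.

```python
from typing import Dict, List, Tuple


def delete_frames(
    frames: List[bytes], doc_to_frame: Dict[int, int], doc_ids: List[int]
) -> Tuple[List[bytes], Dict[int, int]]:
    """Drop the frames of `doc_ids` and re-index the surviving mapping.

    Assumes one frame per document; `frames` stands in for decoded video frames.
    """
    drop = {doc_to_frame[d] for d in doc_ids if d in doc_to_frame}
    if not drop:
        # Nothing mapped to the requested IDs: leave everything untouched.
        return frames, doc_to_frame
    kept_frames = [f for i, f in enumerate(frames) if i not in drop]
    # Map each surviving old frame index to its new position.
    old_to_new: Dict[int, int] = {}
    new_index = 0
    for old_index in range(len(frames)):
        if old_index not in drop:
            old_to_new[old_index] = new_index
            new_index += 1
    new_mapping = {d: old_to_new[f] for d, f in doc_to_frame.items() if f not in drop}
    return kept_frames, new_mapping
```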

_rebuild_video_after_deletion()

Rebuild the video file with remaining documents after deletion.

This method rebuilds the video file using the remaining documents in the index manager.

Raises:

RuntimeError – If video rebuilding fails

async aadd_documents(documents, **kwargs)

Add documents to the vector store asynchronously.

Parameters:
  • documents (List[Document]) – List of Document objects to add

  • **kwargs (Any) – Additional arguments (ignored)

Return type:

List[str]

Returns:

List of chunk IDs
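One common way to expose an async variant of a blocking method is `asyncio.to_thread` (Python 3.9+), which keeps QR encoding and disk I/O off the event loop. Whether the `a*` methods here work this way is not stated above; `SyncStore` is a hypothetical class and the sequential string IDs are an assumption.

```python
import asyncio
from typing import List


class SyncStore:
    """Hypothetical store whose add_texts is blocking (e.g. QR encoding plus disk I/O)."""

    def __init__(self) -> None:
        self.texts: List[str] = []

    def add_texts(self, texts: List[str]) -> List[str]:
        start = len(self.texts)
        self.texts.extend(texts)
        # Return one string ID per added text.
        return [str(i) for i in range(start, len(self.texts))]

    async def aadd_texts(self, texts: List[str]) -> List[str]:
        # Run the blocking call in a worker thread so the event loop stays responsive.
        return await asyncio.to_thread(self.add_texts, texts)
```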

async aadd_texts(texts, metadatas=None, **kwargs)

Add texts to the vector store asynchronously.

Parameters:
  • texts (List[str]) – List of text strings to add

  • metadatas (Optional[List[Dict[str, Any]]]) – Optional list of metadata dicts for each text

  • **kwargs (Any) – Additional arguments (ignored)

Return type:

List[str]

Returns:

List of chunk IDs

add_documents(documents, **kwargs)

Add documents to the vector store.

Parameters:
  • documents (List[Document]) – List of Document objects to add

  • **kwargs (Any) – Additional arguments (ignored)

Return type:

List[str]

Returns:

List of chunk IDs

add_texts(texts, metadatas=None, **kwargs)

Add texts to the vector store.

Parameters:
  • texts (List[str]) – List of text strings to add

  • metadatas (Optional[List[Dict[str, Any]]]) – Optional list of metadata dicts for each text

  • **kwargs (Any) – Additional arguments (ignored)

Return type:

List[str]

Returns:

List of chunk IDs
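Because `metadatas` is optional and parallel to `texts`, an implementation has to pair the two lists. A minimal sketch follows; raising `ValueError` on a length mismatch is an assumption about what `add_texts` validates, and `pair_texts_with_metadata` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional, Tuple


def pair_texts_with_metadata(
    texts: List[str], metadatas: Optional[List[Dict[str, Any]]] = None
) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair each text with its metadata dict, defaulting to empty dicts."""
    if metadatas is None:
        # Fresh dict per text, so later mutation of one does not affect the others.
        metadatas = [{} for _ in texts]
    if len(metadatas) != len(texts):
        raise ValueError(f"Got {len(texts)} texts but {len(metadatas)} metadata dicts")
    return list(zip(texts, metadatas))
```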

async adelete_by_ids(doc_ids)

Delete documents by their IDs asynchronously.

Parameters:

doc_ids (List[str]) – List of document IDs to delete

Return type:

bool

Returns:

True if any documents were deleted, False otherwise

Raises:
  • ValueError – If no document IDs are provided

  • RuntimeError – If video file doesn’t exist or deletion fails

async adelete_by_texts(texts)

Delete documents by their text content asynchronously.

Parameters:

texts (List[str]) – List of text contents to delete

Return type:

bool

Returns:

True if any documents were deleted, False otherwise

async adelete_documents(documents)

Delete documents by Document objects asynchronously.

Parameters:

documents (List[Document]) – List of Document objects to delete

Return type:

bool

Returns:

True if any documents were deleted, False otherwise

Raises:
  • ValueError – If no documents are provided

  • RuntimeError – If video file doesn’t exist or deletion fails

async classmethod afrom_documents(documents, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), **kwargs)

Create vector store from documents asynchronously.

Parameters:
  • documents (List[Document]) – List of Document objects

  • embedding (Embeddings) – Embedding model

  • video_file (Path) – Path to store the video file, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE

  • index_dir (Path) – Path to store the index file, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR

  • **kwargs (Any) – Additional arguments passed to constructor

Return type:

VectorStore

Returns:

VectorStore instance

async classmethod afrom_texts(texts, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), metadatas=None, **kwargs)

Create vector store from texts asynchronously.

Parameters:
  • texts (List[str]) – List of text strings

  • embedding (Embeddings) – Embedding model

  • video_file (Path) – Path to store the video file, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE

  • index_dir (Path) – Path to store the index file, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR

  • metadatas (Optional[List[Dict[str, Any]]]) – Optional list of metadata dicts

  • **kwargs (Any) – Additional arguments passed to constructor

Return type:

VectorStore

Returns:

VectorStore instance

async asimilarity_search(query, k=4, **kwargs)

Search for similar documents asynchronously.

Parameters:
  • query (str) – Query text

  • k (int) – Number of results to return

  • **kwargs (Any) – Additional arguments (ignored)

Return type:

List[Document]

Returns:

List of Document objects

async asimilarity_search_with_score(query, k=4, **kwargs)

Search for similar documents with scores asynchronously.

Parameters:
  • query (str) – Query text

  • k (int) – Number of results to return

  • **kwargs (Any) – Additional arguments (ignored)

Return type:

List[Tuple[Document, float]]

Returns:

List of (Document, score) tuples

delete_by_ids(doc_ids)

Delete documents by their IDs.

Uses optimized deletion strategies with frame index mapping to avoid full video rebuilds. Falls back to a full rebuild if optimized deletion fails.

Parameters:

doc_ids (List[str]) – List of document IDs to delete

Return type:

bool

Returns:

True if any documents were deleted, False otherwise

Raises:
  • ValueError – If no document IDs are provided

  • RuntimeError – If video file doesn’t exist or deletion fails

Example

vector_store.delete_by_ids(["0", "5", "10"])
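The fallback strategy can be sketched as a small control-flow wrapper. The two callables are hypothetical stand-ins for frame-level deletion and a full rebuild; converting the public string IDs to `int` mirrors the `List[str]` versus `List[int]` parameter types documented for `delete_by_ids` and `_optimized_delete_frames`. Treating both a `False` return and a `RuntimeError` as failure is an assumption.

```python
from typing import Callable, List


def delete_by_ids(
    doc_ids: List[str],
    optimized_delete: Callable[[List[int]], bool],
    full_rebuild: Callable[[List[int]], bool],
) -> bool:
    """Try frame-level deletion first; fall back to a full rebuild on failure."""
    if not doc_ids:
        raise ValueError("No document IDs provided")
    # Public IDs are strings; the frame-level helper works with ints.
    int_ids = [int(d) for d in doc_ids]
    try:
        if optimized_delete(int_ids):
            return True
    except RuntimeError:
        pass  # Fall through to the slower but more robust path.
    return full_rebuild(int_ids)
```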

delete_by_texts(texts)

Delete documents by their text content.

Parameters:

texts (List[str]) – List of text contents to delete

Return type:

bool

Returns:

True if any documents were deleted, False otherwise
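Deleting by text content reduces to resolving texts to document IDs and then deleting by ID. The sketch assumes exact string matching (no normalization or fuzzy matching), and `index` is a hypothetical `doc_id -> text` mapping; note that duplicate texts resolve to several IDs.

```python
from typing import Dict, List


def ids_for_texts(index: Dict[int, str], texts: List[str]) -> List[int]:
    """Return IDs of documents whose stored text exactly matches one of `texts`."""
    wanted = set(texts)  # Set membership keeps the scan linear in the index size.
    return sorted(doc_id for doc_id, text in index.items() if text in wanted)
```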

delete_documents(documents)

Delete documents by Document objects.

Parameters:

documents (List[Document]) – List of Document objects to delete

Return type:

bool

Returns:

True if any documents were deleted, False otherwise

Raises:
  • ValueError – If no documents are provided

  • RuntimeError – If video file doesn’t exist or deletion fails

classmethod from_documents(documents, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), **kwargs)

Create vector store from documents.

Parameters:
  • documents (List[Document]) – List of Document objects

  • embedding (Embeddings) – Embedding model

  • video_file (Path) – Path to store the video file, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE

  • index_dir (Path) – Path to store the index file, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR

  • **kwargs (Any) – Additional arguments passed to constructor

Return type:

VectorStore

Returns:

VectorStore instance

classmethod from_texts(texts, embedding, video_file=PosixPath('kb_data.mp4'), index_dir=PosixPath('kb_index.d'), metadatas=None, **kwargs)

Create vector store from texts.

Parameters:
  • texts (List[str]) – List of text strings

  • embedding (Embeddings) – Embedding model

  • video_file (Path) – Path to store the video file, defaults to LANGCHAIN_MEMVID_DEFAULT_VIDEO_FILE

  • index_dir (Path) – Path to store the index file, defaults to LANGCHAIN_MEMVID_DEFAULT_INDEX_DIR

  • metadatas (Optional[List[Dict[str, Any]]]) – Optional list of metadata dicts

  • **kwargs (Any) – Additional arguments passed to constructor

Return type:

VectorStore

Returns:

VectorStore instance
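The `from_*` classmethods follow a common construct-then-populate pattern: build an instance (forwarding `**kwargs` to the constructor, as documented) and then add the texts. The default paths match the documented signature; `MiniStore` and its in-memory `records` list are hypothetical stand-ins for the real encoder and index.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional


class MiniStore:
    """Hypothetical store illustrating the from_texts constructor pattern."""

    def __init__(self, *, embedding: Any, video_file: Path, index_dir: Path) -> None:
        self.embedding = embedding
        self.video_file = video_file
        self.index_dir = index_dir
        self.records: List[Dict[str, Any]] = []

    def add_texts(
        self, texts: List[str], metadatas: Optional[List[Dict[str, Any]]] = None
    ) -> List[str]:
        metadatas = metadatas or [{} for _ in texts]
        start = len(self.records)
        for text, meta in zip(texts, metadatas):
            self.records.append({"text": text, **meta})
        return [str(i) for i in range(start, len(self.records))]

    @classmethod
    def from_texts(
        cls,
        texts: List[str],
        embedding: Any,
        video_file: Path = Path("kb_data.mp4"),
        index_dir: Path = Path("kb_index.d"),
        metadatas: Optional[List[Dict[str, Any]]] = None,
        **kwargs: Any,
    ) -> "MiniStore":
        # Build the instance first (extra kwargs go to the constructor), then populate it.
        store = cls(embedding=embedding, video_file=video_file, index_dir=index_dir, **kwargs)
        store.add_texts(texts, metadatas=metadatas)
        return store
```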

get_document_by_id(doc_id, include_full_metadata=False)

Get a document by its ID.

Parameters:
  • doc_id (str) – Document ID as string

  • include_full_metadata (bool) – Whether to fetch full metadata from video

Return type:

Optional[Document]

Returns:

Document if found, None otherwise

Example

# Fast retrieval with essential metadata
doc = vector_store.get_document_by_id("123", include_full_metadata=False)

# Complete retrieval with full metadata
doc_full = vector_store.get_document_by_id("123", include_full_metadata=True)

get_documents_by_ids(doc_ids, include_full_metadata=False)

Get documents by their IDs.

Parameters:
  • doc_ids (List[str]) – List of document IDs as strings

  • include_full_metadata (bool) – Whether to fetch full metadata from video

Return type:

List[Document]

Returns:

List of Document objects
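Because IDs are passed as strings but stored internally as integers, a batch lookup has to convert each one and decide what to do with unknown IDs. In this sketch, skipping missing IDs mirrors the `None` result documented for `get_document_by_id`; letting `int()` raise `ValueError` for malformed IDs is an assumption, and `lookup_by_ids` is a hypothetical helper over a plain dict.

```python
from typing import Dict, List


def lookup_by_ids(index: Dict[int, str], doc_ids: List[str]) -> List[str]:
    """Resolve string IDs against an int-keyed index, skipping unknown IDs."""
    results = []
    for raw in doc_ids:
        doc_id = int(raw)  # Raises ValueError for non-numeric IDs (assumed behavior).
        if doc_id in index:
            results.append(index[doc_id])
    return results
```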

get_storage_stats()

Get storage statistics for the hybrid approach.

Returns:

Comprehensive storage statistics for the hybrid approach.

Return type:

StorageStats

Raises:

RuntimeError – If video file doesn’t exist

Example

stats = vector_store.get_storage_stats()
print(f"Total documents: {stats.total_documents}")
print(f"Video file size: {stats.video_file_size_mb:.2f} MB")
print(f"Index size: {stats.index_size_mb:.2f} MB")
print(f"Redundancy percentage: {stats.redundancy_percentage:.1f}%")
print(f"Storage efficiency: {stats.storage_efficiency}")

# Frame mapping statistics
frame_stats = stats.frame_mapping_stats
print(f"Mapped documents: {frame_stats.mapped_documents}")
print(f"Mapping coverage: {frame_stats.mapping_coverage:.1f}%")
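The attribute names `mapped_documents` and `mapping_coverage` come from the example above, but their definitions are not documented. A plausible reading is coverage as the percentage of documents with a known frame index; that formula, the `FrameMappingStats` dataclass shape, and the binary-megabyte convention in `bytes_to_mb` are all assumptions in this sketch.

```python
from dataclasses import dataclass


@dataclass
class FrameMappingStats:
    """Hypothetical shape for the frame mapping statistics."""
    mapped_documents: int
    total_documents: int

    @property
    def mapping_coverage(self) -> float:
        """Percentage of documents with a known frame index (assumed definition)."""
        if self.total_documents == 0:
            return 0.0  # Avoid division by zero for an empty store.
        return 100.0 * self.mapped_documents / self.total_documents


def bytes_to_mb(num_bytes: int) -> float:
    """Convert a byte count to megabytes, assuming 1 MB = 1024 * 1024 bytes."""
    return num_bytes / (1024 * 1024)
```

A coverage below 100% would suggest that some documents lack a frame mapping, which is exactly when frame-level deletion cannot be used for them.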

property retriever: Retriever

Get the retriever instance, creating it if necessary.

Returns:

The retriever instance

Return type:

Retriever

Raises:

RuntimeError – If video file doesn’t exist when retriever is needed
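Lazy creation means the retriever is built on first access and then cached, and the missing-file check happens at that moment rather than at construction time. `LazyStore` and the `object()` placeholder are hypothetical; only the `RuntimeError` on a missing video file reflects the documentation above.

```python
from pathlib import Path
from typing import Optional


class LazyStore:
    """Hypothetical class showing a lazily created, cached retriever."""

    def __init__(self, video_file: Path) -> None:
        self.video_file = Path(video_file)
        self._retriever: Optional[object] = None

    @property
    def retriever(self) -> object:
        if self._retriever is None:
            # Fail early with a clear message instead of a decoder error later.
            if not self.video_file.exists():
                raise RuntimeError(f"Video file {self.video_file} does not exist")
            self._retriever = object()  # Placeholder for building the real retriever.
        return self._retriever
```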

similarity_search(query, k=4, include_full_metadata=False, **kwargs)

Search for similar documents.

Parameters:
  • query (str) – Query text

  • k (int) – Number of results to return

  • include_full_metadata (bool) – Whether to fetch full metadata from video

  • **kwargs (Any) – Additional arguments (ignored)

Return type:

List[Document]

Returns:

List of Document objects

Example

# Fast search with essential metadata
results = vector_store.similarity_search("query", include_full_metadata=False)

# Complete search with full metadata
results = vector_store.similarity_search("query", include_full_metadata=True)

similarity_search_with_score(query, k=4, include_full_metadata=False, **kwargs)

Search for similar documents with scores.

Parameters:
  • query (str) – Query text

  • k (int) – Number of results to return

  • include_full_metadata (bool) – Whether to fetch full metadata from video

  • **kwargs (Any) – Additional arguments (ignored)

Return type:

List[Tuple[Document, float]]

Returns:

List of (Document, score) tuples
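How to read the score depends on the FAISS index type, which this page does not state. The sketch below is a brute-force stand-in for a flat L2 index, assuming distance-based scoring where smaller means closer; FAISS's `IndexFlatL2` reports squared Euclidean distances, which give the same ordering as the plain distances used here. `l2_search` is a hypothetical helper, not part of the library.

```python
import math
from typing import List, Sequence, Tuple


def l2_search(
    query: Sequence[float], vectors: List[Sequence[float]], k: int = 4
) -> List[Tuple[int, float]]:
    """Exact nearest-neighbour search returning (row index, L2 distance) pairs."""
    # Score every stored vector, then keep the k closest (smallest distance first).
    scored = [(i, math.dist(query, vec)) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

In the real store, each row index would then be mapped back to a Document to form the (Document, score) tuples.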