# LangChain MemVid Quick Start Guide

This notebook demonstrates the basic usage of the LangChain MemVid library, which allows you to store and retrieve text data using video files as a storage medium.

In [1]:
%pip install -e ..
%load_ext ipykernel_memvid_extension
%restart_kernel -f
%mute


## Setup and Imports

First, we'll install the required dependencies and import the necessary modules. The main components we need are:
- `langchain-huggingface` for the embeddings integration
- `sentence-transformers` for the underlying embedding model
- `VectorStore` from `langchain_memvid`, which provides the main functionality

In [2]:
%pip_install langchain-huggingface sentence-transformers

from langchain_huggingface import HuggingFaceEmbeddings
from pathlib import Path

from langchain_memvid import VectorStore

| Package | Version |
| --- | --- |
| langchain-huggingface | 0.3.0 |
| sentence-transformers | 4.1.0 |


## Creating a Vector Store

Now we'll create a vector store with some example data. We'll:
1. Define paths for storing the video and index files
2. Initialize the embedding model
3. Create sample text data with metadata
4. Build the vector store from our texts

Note: The metadata helps organize and filter our data, associating each text with a source, category, and ID.

In [3]:
# Paths to store the video and index files
knowledge_base_file = Path("knowledge_base.mp4")
knowledge_base_index_dir = Path("knowledge_base_index.d")

# Embedding model (library default; pass model_name=... to choose a specific one)
embedding = HuggingFaceEmbeddings()

# Example text chunks
texts = [
    "The quick brown fox jumps over the lazy dog",
    "A fast orange fox leaps across a sleepy canine",
    "The weather is beautiful today",
    "It's raining cats and dogs outside",
    "Python is a popular programming language",
    "JavaScript is widely used for web development"
]

# Example metadata for each text
metadata = [
    {"id": 0, "source": "example1.txt", "category": "animals"},
    {"id": 1, "source": "example1.txt", "category": "animals"},
    {"id": 2, "source": "example2.txt", "category": "weather"},
    {"id": 3, "source": "example2.txt", "category": "weather"},
    {"id": 4, "source": "example3.txt", "category": "programming"},
    {"id": 5, "source": "example3.txt", "category": "programming"}
]

# Create vector store
vs = VectorStore.from_texts(
    texts=texts,
    embedding=embedding,
    video_file=knowledge_base_file,
    index_dir=knowledge_base_index_dir,
    metadatas=metadata,
)

[2025-06-20 20:45:18] INFO [langchain_memvid.encoder.add_chunks:85] Added 6 chunks for encoding
[2025-06-20 20:45:18] INFO [langchain_memvid.index.create_index:166] Created faiss index with cosine metric
[2025-06-20 20:45:18] INFO [langchain_memvid.index.add_texts:182] Embedding 6 texts...


Embedding texts: 100%|██████████| 1/1 [00:00<00:00, 84.86it/s]
Deduplicating texts: 100%|██████████| 6/6 [00:00<00:00, 67108.86it/s]
Adding vectors to index: 100%|██████████| 1/1 [00:00<00:00, 13315.25it/s]

[2025-06-20 20:45:18] INFO [langchain_memvid.index.add_texts:298] Added 6 unique texts to index





[2025-06-20 20:45:19] INFO [langchain_memvid.video.default.encode_video:218] Encoding 6 frames to video...


Preparing frames: 100%|██████████| 6/6 [00:00<00:00, 106.22it/s]
Writing video: 100%|██████████| 6/6 [00:00<00:00, 82.48it/s]

[2025-06-20 20:45:19] INFO [langchain_memvid.video.default.encode_video:239] Video encoded successfully to /home/dawid/github/sarumaj/langchain-memvid/examples/knowledge_base.mp4
[2025-06-20 20:45:19] INFO [langchain_memvid.index.save:620] Saved index to /home/dawid/github/sarumaj/langchain-memvid/examples/knowledge_base_index.d
[2025-06-20 20:45:19] INFO [langchain_memvid.encoder.build_video:214] Built video with 6 chunks in 0.98s
[2025-06-20 20:45:19] INFO [langchain_memvid.index.load:655] Loaded index from /home/dawid/github/sarumaj/langchain-memvid/examples/knowledge_base_index.d
[2025-06-20 20:45:19] INFO [langchain_memvid.vectorstore.add_texts:213] Built video with 6 chunks (1.84 MB)
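
The note before the cell says the metadata helps organize and filter the data. As a minimal sketch of that idea in plain Python, the parallel lists of texts and metadata can be zipped together and filtered on any key. It does not call the library, and `filter_by_metadata` is a hypothetical helper introduced here for illustration, not part of `langchain_memvid`.

```python
from typing import Any

# A subset of the sample data used in the cell above.
texts = [
    "The quick brown fox jumps over the lazy dog",
    "The weather is beautiful today",
    "Python is a popular programming language",
]
metadata = [
    {"id": 0, "source": "example1.txt", "category": "animals"},
    {"id": 2, "source": "example2.txt", "category": "weather"},
    {"id": 4, "source": "example3.txt", "category": "programming"},
]


def filter_by_metadata(
    texts: list[str], metadatas: list[dict[str, Any]], **criteria: Any
) -> list[str]:
    """Hypothetical helper: keep texts whose metadata matches every criterion."""
    return [
        text
        for text, meta in zip(texts, metadatas)
        if all(meta.get(key) == value for key, value in criteria.items())
    ]


weather_texts = filter_by_metadata(texts, metadata, category="weather")
print(weather_texts)
```

Because each metadata dictionary sits at the same position as its text, the two lists must stay the same length and in the same order.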





## Performing Similarity Searches

Let's test the vector store with a few similarity searches. For each query we request the two most similar texts (`k=2`) and collect their content, metadata, and similarity score into a table.

In [4]:
# Example searches
queries = [
    "Tell me about foxes",
    "What's the weather like?",
    "What programming languages are mentioned?"
]

results = [
    {
        "query": query,
        "content": doc.page_content,
        **{k: v for k, v in doc.metadata.items() if k != "text" and v is not None}
    }
    for query in queries
    for doc in vs.similarity_search(query, k=2, include_full_metadata=True)
]

%as_table results

[2025-06-20 20:45:19] INFO [langchain_memvid.retriever.model_post_init:72] Initialized retriever with video: /home/dawid/github/sarumaj/langchain-memvid/examples/knowledge_base.mp4
[2025-06-20 20:45:19] INFO [langchain_memvid.video.default.decode_video:270] Decoding 6 frames from video...


Decoding video: 100%|██████████| 6/6 [00:00<00:00, 92.48it/s]


[2025-06-20 20:45:19] INFO [langchain_memvid.video.default.decode_video:270] Decoding 6 frames from video...


Decoding video: 100%|██████████| 6/6 [00:00<00:00, 111.00it/s]


[2025-06-20 20:45:19] INFO [langchain_memvid.video.default.decode_video:270] Decoding 6 frames from video...


Decoding video: 100%|██████████| 6/6 [00:00<00:00, 116.34it/s]


[2025-06-20 20:45:19] INFO [langchain_memvid.video.default.decode_video:270] Decoding 6 frames from video...


Decoding video: 100%|██████████| 6/6 [00:00<00:00, 136.70it/s]


[2025-06-20 20:45:19] INFO [langchain_memvid.video.default.decode_video:270] Decoding 6 frames from video...


Decoding video: 100%|██████████| 6/6 [00:00<00:00, 129.58it/s]


[2025-06-20 20:45:19] INFO [langchain_memvid.video.default.decode_video:270] Decoding 6 frames from video...


Decoding video: 100%|██████████| 6/6 [00:00<00:00, 167.49it/s]


| Query | Content | Source | Category | Similarity | Doc Id | Metadata Hash | Metadata Type | Id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tell me about foxes | The quick brown fox jumps over the lazy dog | example1.txt | animals | 0.5380151271820068 | 0 | e736fb1873e579b94b33447941a28a2b07b1c5eb109bdb93f2d0bfc29eee43e7 | full | 0 |
| Tell me about foxes | A fast orange fox leaps across a sleepy canine | example1.txt | animals | 0.5364233255386353 | 1 | 525e29705fa2d417eb0dd5ab186fcdd1d9d9f2cd0dde16484c6ae14f08b04d4b | full | 1 |
| What's the weather like? | The weather is beautiful today | example2.txt | weather | 0.4702893495559692 | 2 | fd5c222e2c825c849b761f2d25ff01e5ca10439097e010f22c54e2d86367467c | full | 2 |
| What's the weather like? | It's raining cats and dogs outside | example2.txt | weather | 0.2783700823783874 | 3 | c137730ab9359e5e77d7bcfcbb95174f7f9ff870533d312553735aa9756a8a39 | full | 3 |
| What programming languages are mentioned? | Python is a popular programming language | example3.txt | programming | 0.5954955816268921 | 4 | be883de17e93c5b9bb0a08bdd6aa44f6cb414fafc77d19b2dada9b8f9fdd4739 | full | 4 |
| What programming languages are mentioned? | JavaScript is widely used for web development | example3.txt | programming | 0.4239958524703979 | 5 | e09923b88b7e20f2ae8feffd485250953e77f59c79afbe40cdf471cf4c80ae74 | full | 5 |
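
The logs above report that the index uses a cosine metric. To make the Similarity column easier to interpret, here is a minimal sketch of cosine-based top-k ranking in plain Python. The three-dimensional vectors are made up for illustration (a real embedding model produces far more dimensions), and `cosine_similarity` and `top_k` are hypothetical helpers, not the library's implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2):
    """Return (doc_index, score) pairs for the k highest-scoring documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up toy "embeddings", one per document.
doc_vecs = [
    [0.9, 0.1, 0.0],  # stands in for a fox sentence
    [0.1, 0.9, 0.0],  # stands in for a weather sentence
    [0.0, 0.1, 0.9],  # stands in for a programming sentence
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded query

ranked = top_k(query_vec, doc_vecs, k=2)
print(ranked)
```

A score near 1 means the query and document vectors point in almost the same direction, while a score near 0 means they are nearly unrelated.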


## Removing Content

Let's remove some of the documents and re-run the similarity search.

In [5]:
# Remove every other document, starting with the first (indices 0, 2, 4)
vs.delete_by_texts(texts[::2])

# Re-run the similarity search
results = [
    {
        "query": query,
        "content": doc.page_content,
        **{k: v for k, v in doc.metadata.items() if k != "text" and v is not None}
    }
    for query in queries
    for doc in vs.similarity_search(query, k=2, include_full_metadata=True)
]

%as_table results


[2025-06-20 20:45:20] INFO [langchain_memvid.index.create_index:166] Created faiss index with cosine metric
[2025-06-20 20:45:20] INFO [langchain_memvid.index._rebuild_index_without_deleted:431] Rebuilt index with 3 remaining documents
[2025-06-20 20:45:20] INFO [langchain_memvid.index.delete_by_ids:366] Deleted 3 documents from index
[2025-06-20 20:45:20] INFO [langchain_memvid.encoder.clear:223] Cleared all chunks
[2025-06-20 20:45:20] INFO [langchain_memvid.encoder.add_chunks:85] Added 3 chunks for encoding
[2025-06-20 20:45:20] INFO [langchain_memvid.index.create_index:166] Created faiss index with cosine metric
[2025-06-20 20:45:20] INFO [langchain_memvid.index.add_texts:182] Embedding 3 texts...


Embedding texts: 100%|██████████| 1/1 [00:00<00:00, 90.75it/s]
Deduplicating texts: 100%|██████████| 3/3 [00:00<00:00, 68759.08it/s]
Adding vectors to index: 100%|██████████| 1/1 [00:00<00:00, 19878.22it/s]

[2025-06-20 20:45:20] INFO [langchain_memvid.index.add_texts:298] Added 3 unique texts to index





[2025-06-20 20:45:20] INFO [langchain_memvid.video.default.encode_video:218] Encoding 3 frames to video...


Preparing frames: 100%|██████████| 3/3 [00:00<00:00, 86.60it/s]
Writing video: 100%|██████████| 3/3 [00:00<00:00, 71.33it/s]

[2025-06-20 20:45:20] INFO [langchain_memvid.video.default.encode_video:239] Video encoded successfully to /home/dawid/github/sarumaj/langchain-memvid/examples/knowledge_base.mp4
[2025-06-20 20:45:20] INFO [langchain_memvid.index.save:620] Saved index to /home/dawid/github/sarumaj/langchain-memvid/examples/knowledge_base_index.d
[2025-06-20 20:45:20] INFO [langchain_memvid.encoder.build_video:214] Built video with 3 chunks in 0.40s
[2025-06-20 20:45:20] INFO [langchain_memvid.index.load:655] Loaded index from /home/dawid/github/sarumaj/langchain-memvid/examples/knowledge_base_index.d
[2025-06-20 20:45:20] INFO [langchain_memvid.vectorstore._rebuild_video_after_deletion:528] Rebuilt video with 3 remaining chunks (1.16 MB)
[2025-06-20 20:45:20] INFO [langchain_memvid.vectorstore.delete_by_texts:385] Deleted documents with specified texts and rebuilt video
[2025-06-20 20:45:20] INFO [langchain_memvid.retriever.model_post_init:72] Initialized retriever with video: /home/dawid/github/saruma


Decoding video: 100%|██████████| 3/3 [00:00<00:00, 73.21it/s]


[2025-06-20 20:45:20] INFO [langchain_memvid.video.default.decode_video:270] Decoding 3 frames from video...


Decoding video: 100%|██████████| 3/3 [00:00<00:00, 97.99it/s]


[2025-06-20 20:45:21] INFO [langchain_memvid.video.default.decode_video:270] Decoding 3 frames from video...


Decoding video: 100%|██████████| 3/3 [00:00<00:00, 120.98it/s]


| Query | Content | Similarity | Doc Id | Metadata Hash | Metadata Type | Id |
| --- | --- | --- | --- | --- | --- | --- |
| Tell me about foxes | A fast orange fox leaps across a sleepy canine | 0.5364232063293457 | 0 | 4e4f59cc945827d6c948bc8b3c3444310ccfd75b6327320398da67679e8fa377 | full | 0 |
| Tell me about foxes | It's raining cats and dogs outside | 0.1692301332950592 | 1 | 0abb9bc36ab7f2333c3beeb4d31ded101372985822d86889675582fdf4f2146c | full | 1 |
| What's the weather like? | It's raining cats and dogs outside | 0.2783701419830322 | 1 | 0abb9bc36ab7f2333c3beeb4d31ded101372985822d86889675582fdf4f2146c | full | 1 |
| What's the weather like? | JavaScript is widely used for web development | 0.0163713507354259 | 2 | 344481724d49bdeabfcafc11cdbbbe09c2e618f9f483793927ab2dbcb8607edc | full | 2 |
| What programming languages are mentioned? | JavaScript is widely used for web development | 0.4239957928657532 | 2 | 344481724d49bdeabfcafc11cdbbbe09c2e618f9f483793927ab2dbcb8607edc | full | 2 |
| What programming languages are mentioned? | It's raining cats and dogs outside | 0.0816741436719894 | 1 | 0abb9bc36ab7f2333c3beeb4d31ded101372985822d86889675582fdf4f2146c | full | 1 |
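
The logs show that deletion rebuilds both the index and the video from the remaining documents, and the Doc Id column starts again at 0. The following plain-Python sketch mirrors that bookkeeping: it shows which texts `texts[::2]` selects and how the survivors are renumbered. It is an illustration only and does not reproduce the library's internals.

```python
texts = [
    "The quick brown fox jumps over the lazy dog",
    "A fast orange fox leaps across a sleepy canine",
    "The weather is beautiful today",
    "It's raining cats and dogs outside",
    "Python is a popular programming language",
    "JavaScript is widely used for web development",
]

# texts[::2] takes every other element starting at index 0, i.e. indices 0, 2 and 4.
to_delete = set(texts[::2])
remaining = [text for text in texts if text not in to_delete]

# Rebuilding from scratch assigns fresh, consecutive document IDs.
reindexed = dict(enumerate(remaining))
for doc_id, text in reindexed.items():
    print(doc_id, text)
```

With only one document left per topic, a `k=2` search has to return a second, weakly related hit, which explains the low scores in the table above (for example 0.169 for the rain sentence on the fox query).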


## Cleanup

Finally, we'll clean up our temporary files (video and index) to free up disk space.

In [6]:
%cleanup -f
%dump -f -r :-2

| Name | Type | Object |
| --- | --- | --- |
| knowledge_base_file | PosixPath | knowledge_base.mp4 |
| knowledge_base_index_dir | PosixPath | knowledge_base_index.d |
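
Outside a notebook the `%cleanup` magic is not available, so the same housekeeping can be approximated with the standard library. This is a sketch under the assumption that cleanup only needs to delete the video file and the index directory. `remove_artifacts` and the placeholder file name are hypothetical, and the demo runs in a throwaway directory so no real data is touched.

```python
import shutil
import tempfile
from pathlib import Path


def remove_artifacts(video_file: Path, index_dir: Path) -> None:
    """Hypothetical helper: delete the video file and the index directory if present."""
    video_file.unlink(missing_ok=True)            # no error if the file is already gone
    shutil.rmtree(index_dir, ignore_errors=True)  # removes the directory and its contents


# Create stand-ins for the two artifacts inside a temporary directory.
workdir = Path(tempfile.mkdtemp())
video = workdir / "knowledge_base.mp4"
index = workdir / "knowledge_base_index.d"
video.write_bytes(b"")
index.mkdir()
(index / "placeholder.bin").write_bytes(b"")  # hypothetical index file name

remove_artifacts(video, index)
print(video.exists(), index.exists())

shutil.rmtree(workdir)  # remove the temporary directory itself
```

Both calls tolerate missing paths, so the helper can safely be run more than once.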
