# LangChain MemVid Advanced Usage Guide

This notebook walks through the lower-level components of the LangChain MemVid library and shows how to use them individually for finer-grained control.

In [1]:
%pip install -e ..
%load_ext ipykernel_memvid_extension
%restart_kernel -f
%mute


## Setup and Component Imports

We'll import the core components that allow us to work with the system at a lower level:
- `Encoder` for converting text to video
- `IndexConfig` and `IndexManager` for managing the vector index
- `QRCodeConfig` and `VideoConfig` for customizing the storage format
- `VideoProcessor` for direct video manipulation
- `Retriever` for searching stored data

In [2]:
%pip_install langchain-huggingface sentence-transformers

from langchain_huggingface import HuggingFaceEmbeddings
from pathlib import Path

from langchain_memvid import (
    Encoder,
    IndexConfig,
    IndexManager,
    QRCodeConfig,
    VectorStoreConfig,
    VideoConfig,
    Retriever
)
from langchain_memvid.video import VideoProcessor

Package,Version
langchain-huggingface,0.3.0
sentence-transformers,4.1.0


## Setting Up the Index

First, we'll create and configure the vector index:
1. Create an index configuration with FAISS backend
2. Initialize the embedding model
3. Set up the index manager
4. Add sample texts with metadata to the index
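
To make the `metric="cosine"` setting more concrete, here is a minimal pure-Python sketch of a brute-force cosine search. It is not the library's or FAISS's implementation: the helper names `normalize` and `cosine_top_k` and the toy 3-dimensional vectors are made up for illustration. It relies only on the standard fact that cosine similarity equals the inner product of L2-normalized vectors.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_top_k(query, vectors, k=2):
    """Return (index, similarity) pairs for the k most similar vectors."""
    q = normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(v))))
        for i, v in enumerate(vectors)
    ]
    # Highest similarity first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" (hypothetical values, not real model output)
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
top = cosine_top_k([1.0, 0.05, 0.0], toy_vectors, k=2)
```

An IVF-style index avoids scanning every vector by first assigning vectors to `nlist` clusters and searching only the closest ones; the brute-force loop above is the exact search that such an index approximates.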

In [3]:
# Create index configuration
config = IndexConfig(
    index_type="faiss",
    metric="cosine",
    nlist=6  # Number of clusters for IVF index
)

# Initialize Embeddings
embeddings = HuggingFaceEmbeddings()

# Create index manager
index_manager = IndexManager(config=config, embeddings=embeddings)

# Example text chunks
texts = [
    "The quick brown fox jumps over the lazy dog",
    "A fast orange fox leaps across a sleepy canine",
    "The weather is beautiful today",
    "It's raining cats and dogs outside",
    "Python is a popular programming language",
    "JavaScript is widely used for web development"
]

# Example metadata for each text
metadata = [
    {"id": 0, "source": "example1.txt", "category": "animals"},
    {"id": 1, "source": "example1.txt", "category": "animals"},
    {"id": 2, "source": "example2.txt", "category": "weather"},
    {"id": 3, "source": "example2.txt", "category": "weather"},
    {"id": 4, "source": "example3.txt", "category": "programming"},
    {"id": 5, "source": "example3.txt", "category": "programming"}
]

# Add texts with metadata
# The index will be created automatically with the correct dimension
# and trained if using an IVF index
index_manager.add_texts(texts, metadata)

[2025-06-20 20:56:18] INFO [langchain_memvid.index.create_index:166] Created faiss index with cosine metric
[2025-06-20 20:56:18] INFO [langchain_memvid.index.add_texts:182] Embedding 6 texts...


Embedding texts: 100%|██████████| 1/1 [00:00<00:00, 88.10it/s]
Deduplicating texts: 100%|██████████| 6/6 [00:00<00:00, 165564.63it/s]
Adding vectors to index: 100%|██████████| 1/1 [00:00<00:00, 15534.46it/s]

[2025-06-20 20:56:18] INFO [langchain_memvid.index.add_texts:298] Added 6 unique texts to index

## Testing Initial Search Functionality

Let's verify our index is working by performing some test searches. This demonstrates the basic search functionality before we encode the data into video format.

In [4]:
# Example searches
queries = [
    "Tell me about foxes",
    "What's the weather like?",
    "What programming languages are mentioned?"
]

results = [
    {
        "query": query,
        "text": result.text,
        "source": result.source,
        "category": result.category,
        "similarity": f"{result.similarity:.4f}"
    }
    for query in queries
    for result in index_manager.search_text(query, k=2)
]

%as_table results

Query,Text,Source,Category,Similarity
Tell me about foxes,The quick brown fox jumps over the lazy dog,example1.txt,animals,0.538
Tell me about foxes,A fast orange fox leaps across a sleepy canine,example1.txt,animals,0.5364
What's the weather like?,The weather is beautiful today,example2.txt,weather,0.4703
What's the weather like?,It's raining cats and dogs outside,example2.txt,weather,0.2784
What programming languages are mentioned?,Python is a popular programming language,example3.txt,programming,0.5955
What programming languages are mentioned?,JavaScript is widely used for web development,example3.txt,programming,0.424


## Video Processing Setup

Now we'll set up the video processing components:
1. Configure video parameters (resolution, FPS, codec)
2. Configure QR code generation parameters
3. Create a test video with our data
4. Verify we can decode the data back from the video
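
Before choosing these parameters, it can help to check that each chunk fits in a single QR code and to estimate how long the video will be. The sketch below is a back-of-the-envelope check, not part of the library. It assumes one chunk per QR code and one QR code per frame (consistent with the logs, which report 6 frames for 6 texts). The capacity figures are the byte-mode maxima for the largest QR symbol (version 40) from the QR code specification; the practical limit at a given `box_size` and resolution may be lower. The helper names are hypothetical.

```python
# Maximum byte-mode payload of a version-40 QR symbol per error-correction
# level. Higher levels tolerate more damage (roughly 7% for L, 15% for M,
# 25% for Q, 30% for H) at the cost of capacity.
MAX_BYTES_V40 = {"L": 2953, "M": 2331, "Q": 1663, "H": 1273}


def fits_in_one_qr(text, error_correction="H"):
    """Check whether the UTF-8 payload fits in one version-40 QR code."""
    payload = text.encode("utf-8")
    return len(payload) <= MAX_BYTES_V40[error_correction]


def video_duration_seconds(n_frames, fps):
    """Playback duration, assuming one QR code per frame."""
    return n_frames / fps


sample = "The quick brown fox jumps over the lazy dog"
sample_fits = fits_in_one_qr(sample, "H")
duration = video_duration_seconds(n_frames=6, fps=30)
```

With six frames at 30 fps the video lasts only a fraction of a second, which is expected: the video serves as a storage container rather than something meant to be watched.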

In [5]:
video_config = VideoConfig(
    fps=30,
    resolution=(1920, 1080),
    codec="mp4v",
)

qrcode_config = QRCodeConfig(
    error_correction="H",
    box_size=10,
    border=4
)

video_processor = VideoProcessor(
    video_config=video_config,
    qrcode_config=qrcode_config
)

# Create a test video
data = [
    "The quick brown fox jumps over the lazy dog",
    "A fast orange fox leaps across a sleepy canine",
    "The weather is beautiful today",
    "It's raining cats and dogs outside",
    "Python is a popular programming language",
    "JavaScript is widely used for web development"
]
images = [video_processor.create_qr_code(d) for d in data]
output_path = Path("test_video.mp4")

# Encode the QR code images into a video
video_processor.encode_video(
    frames=images,
    output_path=output_path
)

# Decode the frames back from the video we just wrote
frames = video_processor.decode_video(output_path)

decoded_data = []
for frame in frames:
    decoded_data.extend(video_processor.extract_qr_codes(frame))

%as_bullet_list decoded_data

[2025-06-20 20:56:19] INFO [langchain_memvid.video.default.encode_video:218] Encoding 6 frames to video...


Preparing frames: 100%|██████████| 6/6 [00:00<00:00, 101.38it/s]
Writing video: 100%|██████████| 6/6 [00:00<00:00, 74.67it/s]

[2025-06-20 20:56:19] INFO [langchain_memvid.video.default.encode_video:239] Video encoded successfully to test_video.mp4
[2025-06-20 20:56:19] INFO [langchain_memvid.video.default.decode_video:270] Decoding 6 frames from video...



Decoding video: 100%|██████████| 6/6 [00:00<00:00,  7.34it/s]


## Building the Complete System

Here we combine all components to create a complete vector store system:
1. Configure the vector store settings
2. Create an encoder with our configurations
3. Build the video file and index
4. Display statistics about the created storage
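
The log output of this step reports that all six texts were duplicates, because the same texts were already added to the index earlier. A minimal sketch of how such deduplication can work is shown here. It assumes texts are compared by a SHA-256 digest of their UTF-8 bytes; the library's actual mechanism may differ, and `dedupe_texts` is a hypothetical helper.

```python
import hashlib


def dedupe_texts(texts, seen_hashes):
    """Return texts not seen before, recording their digests in seen_hashes."""
    new_texts = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            new_texts.append(text)
    return new_texts


seen = set()
chunks = [
    "The weather is beautiful today",
    "Python is a popular programming language",
]
first_pass = dedupe_texts(chunks, seen)   # both texts are new
second_pass = dedupe_texts(chunks, seen)  # nothing new the second time
```

Hashing keeps the membership check cheap regardless of text length, and the set of digests can be persisted alongside the index.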

In [6]:
cfg = VectorStoreConfig(
    video=video_config,
    qrcode=qrcode_config
)

encoder = Encoder(config=cfg, index_manager=index_manager)
encoder.add_chunks(texts, metadata)

video_file = Path("test_video.mp4")
index_dir = Path("test_index.d")

stats = encoder.build_video(video_file, index_dir)

%as_table stats

[2025-06-20 20:56:20] INFO [langchain_memvid.encoder.add_chunks:85] Added 6 chunks for encoding
[2025-06-20 20:56:20] INFO [langchain_memvid.index.add_texts:182] Embedding 6 texts...


Embedding texts: 100%|██████████| 1/1 [00:00<00:00, 75.04it/s]
Deduplicating texts: 100%|██████████| 6/6 [00:00<00:00, 75573.05it/s]

[2025-06-20 20:56:20] INFO [langchain_memvid.index.add_texts:207] No new texts to add - all were duplicates
[2025-06-20 20:56:21] INFO [langchain_memvid.video.default.encode_video:218] Encoding 6 frames to video...


Preparing frames: 100%|██████████| 6/6 [00:00<00:00, 127.63it/s]
Writing video: 100%|██████████| 6/6 [00:00<00:00, 99.70it/s]

[2025-06-20 20:56:21] INFO [langchain_memvid.video.default.encode_video:239] Video encoded successfully to test_video.mp4
[2025-06-20 20:56:21] INFO [langchain_memvid.index.save:620] Saved index to test_index.d
[2025-06-20 20:56:21] INFO [langchain_memvid.encoder.build_video:214] Built video with 6 chunks in 0.67s





Name,Value
Total Chunks,6
Video Size (MB),1.240159034729004
Encoding Time (s),0.6741266250610352
Index Path,test_index.d
Video Path,test_video.mp4


## Testing the Complete System

Finally, we'll test the complete system by:
1. Creating a retriever that can access both the video and index
2. Performing similarity searches
3. Verifying that the results match those from the earlier index-only searches
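
One way to make the verification step explicit is to compare the similarity scores from the two searches programmatically. The sketch below hard-codes the values shown in this notebook's two result tables (the first rounded to four decimals, the second at full precision) and checks that they agree within a small tolerance. The helper `results_match` and the tolerance are illustrative choices, not part of the library.

```python
import math

# Similarities from the index-only search (rounded in the first table)
index_results = {
    "The quick brown fox jumps over the lazy dog": 0.538,
    "A fast orange fox leaps across a sleepy canine": 0.5364,
    "The weather is beautiful today": 0.4703,
    "It's raining cats and dogs outside": 0.2784,
    "Python is a popular programming language": 0.5955,
    "JavaScript is widely used for web development": 0.424,
}

# Similarities returned by the retriever (full precision)
retriever_results = {
    "The quick brown fox jumps over the lazy dog": 0.5380151271820068,
    "A fast orange fox leaps across a sleepy canine": 0.5364233255386353,
    "The weather is beautiful today": 0.4702893495559692,
    "It's raining cats and dogs outside": 0.2783700823783874,
    "Python is a popular programming language": 0.5954955816268921,
    "JavaScript is widely used for web development": 0.4239958524703979,
}


def results_match(expected, actual, tol=1e-4):
    """True if both mappings have the same texts and near-equal scores."""
    if expected.keys() != actual.keys():
        return False
    return all(
        math.isclose(expected[text], actual[text], abs_tol=tol)
        for text in expected
    )


consistent = results_match(index_results, retriever_results)
```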

In [7]:
retriever = Retriever(
    video_file=video_file,
    index_dir=index_dir,
    config=cfg,
    index_manager=index_manager,
    k=2,
)

# Example searches
queries = [
    "Tell me about foxes",
    "What's the weather like?",
    "What programming languages are mentioned?"
]

results = [
    {
        "query": query,
        "text": doc.page_content,
        **{k: v for k, v in doc.metadata.items() if k != "text"}
    }
    for query in queries
    for doc in retriever.retrieve(query)
]

%as_table results

[2025-06-20 20:56:21] INFO [langchain_memvid.index.load:655] Loaded index from test_index.d
[2025-06-20 20:56:21] INFO [langchain_memvid.retriever.model_post_init:72] Initialized retriever with video: test_video.mp4


Query,Text,Source,Category,Similarity,Doc Id,Metadata Hash,Metadata Type,Id
Tell me about foxes,The quick brown fox jumps over the lazy dog,example1.txt,animals,0.5380151271820068,0,a66db152af6005d192326ca658e456b18347ca3e1c65a41d2c3a35e36fffc69f,essential,0
Tell me about foxes,A fast orange fox leaps across a sleepy canine,example1.txt,animals,0.5364233255386353,1,d13c33ed4b49974e226f7c959cfd1b487592d9e2e3b8bff56bce355e8e254be6,essential,1
What's the weather like?,The weather is beautiful today,example2.txt,weather,0.4702893495559692,2,7b5e8431ade32326c76296a4ab46e27dd92fb2c60670288960fe7b5f3f1b4df4,essential,2
What's the weather like?,It's raining cats and dogs outside,example2.txt,weather,0.2783700823783874,3,f85c3fb7509b0302d0cdd1a9582c99c4fcd34c9cbbd461e47598661e3a51f780,essential,3
What programming languages are mentioned?,Python is a popular programming language,example3.txt,programming,0.5954955816268921,4,b61311334a9ffb857e6c98453a00325814ecd989326e1f5a1aaf0e1d4d27e222,essential,4
What programming languages are mentioned?,JavaScript is widely used for web development,example3.txt,programming,0.4239958524703979,5,127fcd8d2dd5229824c8382418ca92e8988ecf89d4798f0294fe7c66391a79cb,essential,5


## Cleanup

Remove the temporary files created during the demonstration: the test video and the saved index directory.
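
The `%cleanup` magic used in the next cell is presumably provided by the notebook extension loaded in the first cell. Outside the notebook, a rough equivalent can be written with the standard library. This is a sketch under that assumption, not the extension's implementation; `remove_artifacts` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def remove_artifacts(paths):
    """Delete the given files or directories if present; return those removed."""
    removed = []
    for path in map(Path, paths):
        if path.is_dir():
            shutil.rmtree(path)  # directories, e.g. the saved index
            removed.append(path)
        elif path.exists():
            path.unlink()        # regular files, e.g. the video
            removed.append(path)
    return removed


# Example call for the artifacts created in this notebook:
# remove_artifacts(["test_video.mp4", "test_index.d"])
```

Paths that do not exist are skipped silently, so the helper is safe to call more than once.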

In [8]:
%cleanup -f
%dump -f -r :-2

Name,Type,Object
output_path,PosixPath,test_video.mp4
index_dir,PosixPath,test_index.d
