This project implements a Retrieval-Augmented Generation (RAG) pipeline for querying lecture materials (PDF slides and .txt transcripts) using local computation only.
Users can build a searchable vector database from lecture content and ask natural-language questions to receive answers grounded in the relevant documents.
Example outputs are located in example-outputs/. They were generated with Ollama's deepseek-r1:1.5b model, as configured in config.py. The BERT question (q_BERT) is answered based on the retrieved context; q_backpropagation produced formulas that are not present in the retrieved documents; q_visualization retrieves highly relevant documents and its answer is grounded in them.
rag/
├── loader.py # Loads .pdf and .txt files from a directory
├── splitter.py # Splits long texts into overlapping chunks
├── embedder.py # Embeds text chunks and stores them in ChromaDB
├── retriever.py # Retrieves top-k similar chunks for a query
├── generator.py # Uses an LLM to generate answers from context
└── pipeline.py # High-level build/query functions
main.py # CLI entry point for building DB or asking questions
llm_interface.py # Unified interface for multiple LLM backends
config.py # Configuration for LLM provider and model
- Loader: Uses PyMuPDF for PDFs and standard file reading for text files.
- Splitter: Uses LangChain's `RecursiveCharacterTextSplitter` for robust chunking.
- Embedder: Uses `all-MiniLM-L6-v2` for fast, high-quality text embeddings.
- Retriever: Queries ChromaDB for the top-k similar chunks using cosine similarity.
- Generator: Leverages an LLM (as defined in `config.py`) to answer questions from the retrieved context.
- Pipeline: Orchestrates all steps: load → split → embed → retrieve → generate.
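To make the flow concrete, here is a minimal sketch of the split → embed → retrieve steps using the libraries named above; the function names, collection name, and chunking parameters are illustrative assumptions, not the exact API of the `rag/` modules:

```python
# Minimal sketch of split -> embed -> retrieve with ChromaDB and sentence-transformers.
# Names and parameters are illustrative; see rag/ for the actual implementation.
import chromadb
from sentence_transformers import SentenceTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter

embedder = SentenceTransformer("all-MiniLM-L6-v2")
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
client = chromadb.PersistentClient(path="chroma_db")  # storage path is an assumption
collection = client.get_or_create_collection(
    "lectures", metadata={"hnsw:space": "cosine"}      # cosine similarity, as above
)

def index(texts: list[str]) -> None:
    """Split raw documents into overlapping chunks, embed them, and store them in ChromaDB."""
    chunks = [c for t in texts for c in splitter.split_text(t)]
    collection.add(
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        ids=[f"chunk-{i}" for i in range(len(chunks))],
    )

def retrieve(query: str, k: int = 5) -> list[str]:
    """Return the k chunks most similar to the query."""
    result = collection.query(
        query_embeddings=embedder.encode([query]).tolist(), n_results=k
    )
    return result["documents"][0]
```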
Requires Python 3.12.3 or higher (older versions cause issues with ChromaDB).
# Create virtual environment (one-time)
python3 -m venv ~/virtualenvs/rag_env
# Activate environment (every time)
source ~/virtualenvs/rag_env/bin/activate
# Install dependencies (one time)
pip install -r requirements.txt
Recommended: Ollama
- Install Ollama:
  curl -fsSL https://ollama.com/install.sh | sh
- Download a model:
  ollama pull deepseek-r1:1.5b   # or try: ollama pull llama2, llama2:7b, etc.
- Run the model server:
  ollama serve
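Once the server is running, the generator can reach it over Ollama's local HTTP API (default port 11434). A minimal sketch of such a call; the prompt and model name are only examples:

```python
# Minimal sketch of calling a locally running Ollama server.
# Uses the /api/generate endpoint; model name and prompt are examples.
import requests

def generate(prompt: str, model: str = "deepseek-r1:1.5b") -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["response"]

print(generate("Answer using only the provided context: ..."))
```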
Want to add another LLM backend?
Just create a new class in `llm_interface.py` and update `config.py` accordingly.
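For example, a new backend class might look roughly like the sketch below; the class name, method signature, and config keys are assumptions, so adapt them to the interface that `llm_interface.py` actually defines:

```python
# Hypothetical backend for llm_interface.py using a local Hugging Face model.
# Class/method names and config keys are assumptions; the model is just an example.
from transformers import pipeline

class HuggingFaceLLM:
    """Generates answers with a local Hugging Face text-generation model."""

    def __init__(self, model_name: str = "Qwen/Qwen2.5-0.5B-Instruct"):
        self.pipe = pipeline("text-generation", model=model_name)

    def generate(self, prompt: str) -> str:
        out = self.pipe(prompt, max_new_tokens=256, return_full_text=False)
        return out[0]["generated_text"]

# config.py would then select the backend with something like:
# LLM_PROVIDER = "huggingface"
# LLM_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
```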
. ~/.virtualenvs/rag_new/bin/activate
For Ollama:
ollama serve
Building the database loads and processes the files from the data/ directory; run the commands below. Be aware that you have to build the database and ask questions with the same LLM: each LLM uses a different tokenizer, so a database built with one LLM may not work when queried with another.
python3 main.py --build
python3 main.py --query "What is the chain rule in backpropagation?"
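Internally, `main.py` is a thin CLI over the pipeline; a rough sketch of what it plausibly looks like (the exact `pipeline.build`/`pipeline.query` signatures are assumptions):

```python
# Rough sketch of a main.py-style CLI; pipeline function signatures are assumptions.
import argparse
from rag import pipeline

def main() -> None:
    parser = argparse.ArgumentParser(description="Build the vector DB or query it.")
    parser.add_argument("--build", action="store_true", help="index files from data/")
    parser.add_argument("--query", type=str, help="natural-language question to answer")
    args = parser.parse_args()

    if args.build:
        pipeline.build("data/")            # load -> split -> embed into ChromaDB
    if args.query:
        print(pipeline.query(args.query))  # retrieve top-k chunks, then generate

if __name__ == "__main__":
    main()
```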
I generated the transcripts from lecture videos:
- Download the .mp4 videos using download_lectures.sh.
- Run Whisper on MetaCentrum via run_whisper.pbs.
I typically run 3–4 videos at a time, each taking ~20 minutes to transcribe.
- Works great on CPU-only machines, but LLM generation is slow (~1 minute per answer).
- On machines with a GPU, generation would drop to just a few seconds, but the aim here was that anyone can run this locally.
- Retrieval is fast (1–2 seconds) even without a GPU — showcasing the efficiency of the vector search.
For real-time performance, I'd need better hardware or a cloud solution — but this system is intentionally designed for fully local use. ✅
- add evaluation of LLM answer
- add test cases (testing in general)
- check whether the LLM really only puts together what is in the retrieved data and does not make things up on its own
- Extract slide-change timestamps (the slides are templated, e.g. watch the bottom-left page number change in the video) to sync the PDF slides with the transcripts.
- Add metadata filtering (e.g., by lecture title, source).
- Build a GUI like chatpdf.com.
- Apply RAG to other domains: legal, medical, etc.
- test whether it would pass Milan Starka's publicly available exam questions for his subjects :)
- in the future, compare this RAG system in a TREC dataset evaluation
- improve retrieval and generation in the future:
  - let an LLM judge the quality of the answer
  - query rewriting: generate synonyms and sub-questions to obtain better results
  - cosine-similarity retrieval works poorly for very specific queries (e.g. finding all mentions of the Google Pixel 6a), so BM25 hybrid search is an industry standard; see the sketch below
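A minimal sketch of such a hybrid step, assuming the `rank_bm25` package and a simple weighted score fusion (both are illustrative choices, not part of this repo):

```python
# Illustrative hybrid retrieval: fuse BM25 lexical scores with vector-similarity
# scores. Uses the rank_bm25 package; the fusion weight alpha is arbitrary.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

chunks = ["The Pixel 6a uses Google's Tensor chip.",
          "Backpropagation applies the chain rule layer by layer.",
          "BERT is pretrained with masked language modelling."]

bm25 = BM25Okapi([c.lower().split() for c in chunks])
model = SentenceTransformer("all-MiniLM-L6-v2")
chunk_emb = model.encode(chunks, normalize_embeddings=True)

def hybrid_search(query: str, k: int = 2, alpha: float = 0.5) -> list[str]:
    """Return top-k chunks ranked by a weighted mix of BM25 and cosine scores."""
    lexical = bm25.get_scores(query.lower().split())
    lexical = lexical / (lexical.max() + 1e-9)     # normalize lexical scores to [0, 1]
    q_emb = model.encode([query], normalize_embeddings=True)
    semantic = chunk_emb @ q_emb[0]                # cosine similarity (vectors normalized)
    scores = alpha * lexical + (1 - alpha) * semantic
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

print(hybrid_search("google pixel 6a"))
```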