Multi-Modal RAG Pipeline with Vector Database Retrieval
I architected and implemented a Retrieval-Augmented Generation (RAG) pipeline for semantic search across a large legal document collection. The work covered embedding, indexing, and human annotation of 200,000 documents, with the goals of high-quality retrieval and lower hallucination rates. System performance was measured against a benchmark of human-annotated queries.
• Responsible for embedding legal documents and collecting labeled data for retrieval tasks.
• Managed benchmark construction, including human annotation and query labeling.
• Ensured labeled examples covered a variety of search and retrieval scenarios.
• Documented the pipeline for reproducibility and future annotation tasks.
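The retrieve-then-score loop behind a human-annotated retrieval benchmark can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline described above. The entry does not name an embedding model, vector database, or metric, so the sketch makes four substitutions. Tiny hand-made vectors stand in for document embeddings. A brute-force cosine-similarity scan stands in for the vector database. Recall@k and mean reciprocal rank (MRR@k) are used as example metrics. The names `INDEX`, `QUERIES`, `QRELS`, `retrieve`, and `evaluate` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy data: precomputed embeddings standing in for real document vectors.
INDEX: Dict[str, List[float]] = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [0.7, 0.7],
}
# Hypothetical query embeddings and human-annotated relevance labels (qrels).
QUERIES: Dict[str, List[float]] = {"q1": [1.0, 0.1], "q2": [0.0, 1.0]}
QRELS: Dict[str, Set[str]] = {"q1": {"doc_a", "doc_b"}, "q2": {"doc_c"}}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec: Sequence[float], index: Dict[str, List[float]], k: int) -> List[str]:
    """Brute-force top-k search (assumption: a stand-in for a vector database query)."""
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:k]


def evaluate(queries: Dict[str, List[float]], qrels: Dict[str, Set[str]],
             index: Dict[str, List[float]], k: int) -> Tuple[float, float]:
    """Return (mean recall@k, MRR@k) over all annotated queries."""
    recalls, rrs = [], []
    for qid, qvec in queries.items():
        relevant = qrels.get(qid, set())
        hits = retrieve(qvec, index, k)
        # Recall@k: fraction of annotated relevant docs found in the top k.
        recalls.append(len(set(hits) & relevant) / len(relevant) if relevant else 0.0)
        # Reciprocal rank of the first relevant hit within the top k (0 if none).
        rr = next((1.0 / rank for rank, d in enumerate(hits, start=1) if d in relevant), 0.0)
        rrs.append(rr)
    return sum(recalls) / len(recalls), sum(rrs) / len(rrs)


if __name__ == "__main__":
    recall, mrr = evaluate(QUERIES, QRELS, INDEX, k=2)
    print(f"recall@2={recall:.2f} MRR@2={mrr:.2f}")  # recall@2=0.75 MRR@2=0.75
```

At the scale of a 200,000-document collection, the linear scan would be replaced by an approximate nearest-neighbor index. The scoring loop over annotated queries would stay the same.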