AI Engineer Intern - RAG Pipeline and QA Labeling
Built an end-to-end Retrieval-Augmented Generation (RAG) pipeline that combines embedding models and LLMs to deliver context-aware answers from large-scale datasets. Preprocessed, labeled, and curated mental health data to support high-quality model training and retrieval accuracy. Designed question-answering (QA) tasks involving annotation, dataset curation, and embedding alignment to optimize the pipeline.
• Generated embeddings for dataset chunks and aligned them with LLM outputs for QA tasks.
• Performed QA annotation to improve data relevance in semantic search workflows.
• Used Python and embedding models to generate labeled training and test sets for the RAG system.
• Ensured dataset diversity and representativeness to support accurate question answering.