AI Evaluation & Data Engineer
Led the curation and annotation of high-quality prompt-response pairs for LLM fine-tuning and agentic workflows. Evaluated RAG workflows through benchmark dataset creation and relevance annotation to improve system accuracy. Automated data validation and labeling pipelines, ensuring rigorous QA/QC for LLM fine-tuning datasets.
• Focused on factual consistency, tone alignment, and hallucination reduction.
• Mentored interns on labeling standards and prompt optimization.
• Used Label Studio and internal/proprietary tooling for annotations.
• Achieved 25% increase in contextual relevance and 40% reduction in deployment time.