LLM Training Data Quality Calibration, RLHF & SFT Evaluation
- Worked on large-scale training and evaluation of Large Language Models (LLMs) using Reinforcement Learning from Human Feedback (RLHF) and Supervised Fine-Tuning (SFT).
- Evaluated and labeled AI-generated responses for correctness, reasoning quality, instruction adherence, factual accuracy, and safety compliance.
- Performed prompt-response analysis, quality rating, and response correction to improve model performance and reliability.
- Led calibration and quality assurance for a distributed team of 100+ trainers, ensuring consistency and adherence to strict client guidelines.
- Reviewed annotated datasets, identified quality gaps, and delivered actionable feedback that improved annotation accuracy.
- Contributed to workflow optimization, guideline development, and performance tracking using dashboards and operational metrics.
- Supported production-level GenAI systems used in real-world enterprise applications.
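The calibration work described above typically rests on inter-annotator agreement metrics. As one illustrative sketch (not the team's actual tooling), Cohen's kappa measures how often two trainers agree on labels beyond what chance alone would produce; the function name and sample labels below are hypothetical:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution. 1.0 = perfect agreement, 0.0 = chance level.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:  # degenerate case: both annotators used one identical label
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings from two trainers on four responses ("pass"/"fail"):
kappa = cohens_kappa(["pass", "pass", "fail", "fail"],
                     ["pass", "fail", "fail", "fail"])
print(round(kappa, 2))  # → 0.5
```

In a calibration workflow, scores like this are tracked per trainer pair to flag where guideline interpretations diverge and targeted feedback is needed.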