AI Response Evaluation
Worked on large-scale AI training and evaluation projects aimed at improving language model performance through structured data labeling and quality assessment. Reviewed and annotated AI-generated responses for accuracy, relevance, clarity, safety, instruction-following, and overall usefulness across a wide range of prompts. Performed pairwise ranking to compare model outputs, identified errors and hallucinations, and wrote detailed justifications to support training improvements. Contributed to multiple projects spanning speech, reasoning, and general language understanding tasks. Maintained high quality standards by following strict annotation guidelines, meeting productivity targets, and ensuring consistency across thousands of evaluated samples.