Large Language Model Response Evaluation & Annotation
Contributed to training and improving large language models by evaluating AI-generated responses for accuracy, coherence, tone, safety, and policy compliance. Applied structured rubrics to rank outputs, identify hallucinations, flag policy violations, and write detailed justifications supporting model refinement. Completed high-volume annotation tasks while meeting strict quality benchmarks and turnaround deadlines.