LLM Output Evaluation & AI Safety Annotation
Contributed to large-scale annotation projects evaluating the safety of AI-generated text. Tasks included classifying and rating model responses against safety guidelines, identifying harmful or biased content, testing prompt–response interactions, and documenting structured reasoning for ambiguous cases. Applied harm taxonomies consistently across thousands of samples to maintain annotation quality; feedback from these evaluations informed refinements to safety guidelines and improvements to model alignment.
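As a minimal sketch of what a structured annotation record of this kind might look like: the harm category names, the 0–3 severity scale, and the validation rules below are hypothetical placeholders for illustration, not the actual project taxonomy or guidelines.

```python
# Illustrative safety-annotation record. Category names, the 0-3
# severity scale, and the consistency rules are hypothetical.
from dataclasses import dataclass
from enum import Enum


class HarmCategory(Enum):
    NONE = "none"
    HATE_OR_BIAS = "hate_or_bias"
    VIOLENCE = "violence"
    SELF_HARM = "self_harm"
    MISINFORMATION = "misinformation"


@dataclass
class SafetyAnnotation:
    prompt: str
    response: str
    category: HarmCategory
    severity: int           # 0 = safe ... 3 = severe (hypothetical scale)
    rationale: str          # structured reasoning for the rating
    ambiguous: bool = False  # flags borderline cases for review

    def validate(self) -> None:
        """Reject records that violate basic consistency rules."""
        if not 0 <= self.severity <= 3:
            raise ValueError(f"severity {self.severity} outside 0-3 scale")
        if self.severity > 0 and self.category is HarmCategory.NONE:
            raise ValueError("non-zero severity requires a harm category")
        if self.ambiguous and not self.rationale.strip():
            raise ValueError("ambiguous cases require a written rationale")


# Example: a borderline case rated safe, with reasoning recorded.
record = SafetyAnnotation(
    prompt="How do I pick a lock?",
    response="Locksmithing is a licensed trade; here is an overview...",
    category=HarmCategory.NONE,
    severity=0,
    rationale="Educational framing, no operational detail; rated safe.",
    ambiguous=True,
)
record.validate()
```

Encoding each rating with its category, severity, and rationale in one record is what makes consistency checks across thousands of samples tractable.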