AI Response Annotator
As an AI Response Annotator at OneForma, I evaluated and scored large volumes of AI-generated text, focusing on assessing Large Language Model (LLM) outputs. My responsibilities included systematic rubric-based evaluation of prompt-response pairs to measure accuracy, coherence, and helpfulness. I provided feedback, flagged edge cases to improve model reliability, and took part in calibration sessions to maintain inter-annotator consistency.
• Evaluated AI outputs against rubrics, with a focus on multi-turn dialogues and hallucination detection.
• Contributed to model benchmarking and refinement by flagging recurring failure patterns.
• Participated in calibration processes to ensure scoring accuracy across annotators.
• Provided actionable insights for improving models and annotation guidelines.