Senior AI Output Evaluator & Content Rater
In this role, I reviewed and evaluated high volumes of AI-generated text outputs across multiple domains. My work involved annotating, rating, and flagging text data for model improvement using multi-step rubrics. I consistently maintained high guideline-adherence scores and provided structured feedback to support RLHF training processes.
• Maintained a 99.1% guideline adherence rate across 24+ months and processed 12,000–15,000 outputs monthly.
• Annotated 80,000+ data points from legal, medical, technical, and creative datasets and identified 3,200+ edge cases for rubric improvement.
• Delivered detailed feedback on low-quality outputs and escalated ambiguous guideline instances to reduce team calibration errors.
• Sustained throughput of 500+ evaluated tasks per day during 6–8 hour independent work sessions.