AI Response/Content Evaluator
Evaluated AI-generated audio outputs across multiple projects for platforms including Turing, Mindrift, and Crowdgen, in domains spanning conversational AI, LLM responses, and multimodal content. Tasks included pairwise (side-by-side) comparisons, quality audits, consistency checks, and structured written justifications for each evaluation decision. Contributed hundreds of annotated samples per cycle to large-scale datasets. Maintained high accuracy by adhering strictly to project-specific guidelines, flagging instruction violations, and delivering work within tight deadlines as part of distributed global annotation teams.