Evaluation & Quality Auditing (Multilingual, Multimodal)
Served as an auditor and evaluation reviewer on large-scale AI training and RLHF projects delivered through Scale AI, focusing on auditing contributor-produced evaluation outputs rather than generating raw annotations. Responsibilities included reviewing rubric-based ratings, validating pass/fail decisions, and assessing consistency across subjective evaluation frameworks such as Likert scales and overall-satisfaction metrics. The work spanned multilingual and multimodal tasks covering text, audio, vision, localization, and safety evaluation. Regularly reviewed rubric adherence under evolving specifications, identified misapplications of evaluation criteria, categorized errors (major, minor, or no issue), and flagged quality risks such as inconsistent judgments and metric distortion. The role required making defensible audit decisions at scale while staying aligned with updated guidelines and quality standards.