Data Labeler & Annotator (Preference Ranking) – OuterLAI (Omni ELO Evaluation)
Served as a data labeler and annotator on the Omni ELO Evaluation project, specializing in preference ranking of large language model (LLM) outputs. Achieved 94% inter-annotator agreement through calibration and continuous rubric refinement.
• Labeled 3,500+ pairwise comparisons with 95% consistency against gold standards.
• Processed 70+ pairs per 6-hour shift, maintaining 98% daily throughput.
• Proposed 12+ rubric clarifications, reducing disputes by 25%.
• Flagged 150+ harmful or contradictory model responses, supporting safety evaluations.