Text Data Evaluation Specialist – Outlier.ai LLM Audit Project
Served as a data evaluator and annotation specialist on a project auditing and refining training data for large language models on the Outlier.ai platform. Responsibilities included evaluating model-generated responses, rating prompt–output quality, and tagging issues related to factuality, helpfulness, tone, and safety. Contributed to named-entity recognition (NER) labeling, accuracy reviews of text summarizations, and classification of user-prompt intent to improve training feedback loops. The work demanded close attention to instruction adherence and linguistic nuance, covering more than 40,000 annotated text pairs across multiple domains. Participated in guideline calibration sessions, performed inter-rater reliability assessments, and logged anomalies and QA observations in Jira.
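Inter-rater reliability assessments like those mentioned above are commonly quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a minimal illustration under that assumption; the function name and the sample ratings are hypothetical, not taken from the project:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled the same.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(rater_a) | set(rater_b)
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used a single identical label
    return (p_o - p_e) / (1 - p_e)


# Hypothetical quality ratings (1 = acceptable, 0 = flagged) from two annotators:
kappa = cohens_kappa([1, 1, 0, 1], [1, 1, 0, 0])
```

Values near 1 indicate strong agreement beyond chance; values near 0 suggest the guidelines need another calibration pass.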