AI Prompt Evaluator
I worked on large-scale human-in-the-loop data labeling and evaluation projects for large language models via the Outlier platform. The scope included reviewing, rating, and improving AI-generated text using structured rubrics focused on accuracy, reasoning quality, relevance, tone, and safety. Tasks included identifying factual errors and hallucinations, comparing multiple model outputs, and providing justified selections to meet defined quality standards. The work required strict adherence to task guidelines, consistency requirements, and internal quality checks.