Outlier AI
I worked on a large-scale AI training data project focused on evaluating and refining LLM-generated outputs for code generation, multi-turn prompts, and reasoning tasks. The work centered on producing high-quality training data through rigorous rubric-based review: reviewing code diffs, validating unit test coverage against execution logs, rewriting problem statements and requirements to match the original inputs, and scoring outputs against structured evaluation criteria. Prompts were also refined for intent clarity and instructional alignment. The project operated at scale, with hundreds of annotations and peer-reviewed submissions daily. Quality standards were strict: log-based test validation, rubric adherence, and multi-stage review workflows ensured consistency, fairness, and reproducibility.
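As a minimal illustration of rubric-based scoring of this kind (the criterion names, weights, and 0-4 scale below are hypothetical, not the project's actual schema), a weighted-average score over structured criteria can be sketched as:

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    """One rubric dimension with a weight and an assigned score (0-4 scale assumed)."""
    name: str
    weight: float
    score: int

def rubric_score(criteria: list[Criterion], max_score: int = 4) -> float:
    """Weighted average of per-criterion scores, normalized to the 0-1 range."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight == 0:
        raise ValueError("rubric needs at least one weighted criterion")
    weighted = sum(c.weight * c.score for c in criteria)
    return weighted / (total_weight * max_score)

# Hypothetical evaluation of a single model output
example = [
    Criterion("correctness", weight=0.5, score=4),
    Criterion("test_coverage", weight=0.3, score=3),
    Criterion("instruction_following", weight=0.2, score=4),
]
print(round(rubric_score(example), 3))  # → 0.925
```

A structured score like this makes multi-stage peer review tractable: two reviewers can compare per-criterion values rather than a single opaque number.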