Outlier AI - AI Training tasks
Engaged in an ongoing AI model and agent evaluation project focused on computer code and programming tasks, including model stumping (designing prompts that expose model failures). Responsibilities include prompt engineering and reviewing, annotating, and rating AI-generated code outputs across multiple programming languages, assessing correctness, efficiency, and adherence to best practices. Evaluated agentic workflows and multi-step reasoning tasks to identify model strengths and failure modes. Maintained high-quality annotation standards by following detailed rubrics and guidelines, contributing to the improvement of LLM coding capabilities.