AI Code Evaluator at Outlier AI
As an AI Code Evaluator at Outlier AI, I evaluated machine-generated code samples for correctness and quality. I developed scoring rubrics and prompt specifications that contributed to improved LLM benchmark performance, supporting both model improvement and the standardization of evaluation methods.

• Evaluated 500+ HTML, CSS, and JavaScript code samples for accuracy and compliance with best practices.
• Authored detailed rubrics for binary and multi-criteria rating by distributed teams (a rubric sketch follows this list).
• Designed prompts to uncover edge cases and reduce recurrent code-generation errors (see the example prompt sketch below).
• Collaborated asynchronously in a quality-focused, international review environment.
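To illustrate the kind of binary and multi-criteria rubric described above, here is a minimal TypeScript sketch. The interfaces, field names, and aggregation formula are my own illustrative assumptions, not an actual Outlier AI schema.

```typescript
// Hypothetical rubric shape: weighted criteria, where binary criteria are
// scored pass/fail (0 or 1) and the rest are rated on a 1-5 scale.
// Illustrative only; not Outlier AI's actual schema.
interface Criterion {
  id: string;          // e.g. "semantic-html", "no-unused-vars"
  description: string; // what the reviewer checks for
  weight: number;      // relative importance in the final score
  binary: boolean;     // true = pass/fail, false = rated 1-5
}

interface Rating {
  criterionId: string;
  score: number; // 0 or 1 for binary criteria, 1-5 otherwise
}

// Weighted aggregate on a 0-1 scale; binary scores count as 0 or 1,
// scaled ratings are normalized from the 1-5 range.
function aggregate(criteria: Criterion[], ratings: Rating[]): number {
  const byId = new Map(
    ratings.map((r): [string, number] => [r.criterionId, r.score]),
  );
  let weighted = 0;
  let totalWeight = 0;
  for (const c of criteria) {
    const raw = byId.get(c.id);
    if (raw === undefined) continue; // unrated criteria are skipped
    const normalized = c.binary ? raw : (raw - 1) / 4;
    weighted += c.weight * normalized;
    totalWeight += c.weight;
  }
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

// Example: a single 1-5 criterion rated 4 normalizes to 0.75.
const score = aggregate(
  [{ id: "a11y", description: "Accessible markup", weight: 2, binary: false }],
  [{ criterionId: "a11y", score: 4 }],
);
console.log(score); // 0.75
```

Separating the criterion definitions from the ratings is what lets a distributed team apply the same rubric consistently: reviewers only submit scores, while the weights and pass/fail rules live in one shared place.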
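Likewise, a minimal sketch of an edge-case-probing prompt specification in the same spirit as the third bullet; the structure, field names, and probes are hypothetical, not an actual Outlier AI format.

```typescript
// Hypothetical prompt spec targeting recurring failure modes
// (empty inputs and degenerate parameters in generated JavaScript).
const edgeCasePrompt = {
  task:
    "Write a JavaScript function paginate(items, pageSize) that " +
    "returns an array of pages.",
  probes: [
    "items is an empty array",              // should return []
    "pageSize is larger than items.length", // single partial page
    "pageSize is 0 or negative",            // must not loop forever
  ],
  rubric: "binary", // each probe is graded pass/fail
};
console.log(edgeCasePrompt.probes.length); // 3 probes per prompt
```

Attaching explicit probes to each prompt turns vague "check the edge cases" guidance into concrete pass/fail checks, which is what makes errors recur less often across review rounds.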