Senior AI Code Evaluator & LLM Trainer
As a Senior AI Code Evaluator & LLM Trainer at Outlier (Scale AI), I evaluate AI-generated code for quality and alignment. I provide structured RLHF preference feedback to improve LLM reasoning and code generation, design adversarial prompts, and analyze model outputs for failure modes. My work feeds directly into fine-tuning datasets for major AI labs, with a focus on code correctness, security, and instruction adherence.

• Evaluated 250+ code outputs monthly across Python, JavaScript, Java, and C++
• Delivered RLHF feedback and comparative output rankings
• Designed red-teaming and edge-case test scenarios
• Maintained a 98% quality score, ranking in the top 5% of evaluators