AI Code Evaluator & Prompt Engineer
As an AI Code Evaluator & Prompt Engineer at Outlier.ai (Scale AI Partner), I performed large-scale evaluation of AI-generated code and provided structured feedback for LLM fine-tuning. I designed and refined prompts, red-teamed model outputs for flaws, and supplied RLHF preference data across technical software domains. The role required deep knowledge of code quality, AI prompt engineering, and iterative model-improvement workflows.

• Evaluated thousands of code tasks spanning Java, Python, JavaScript, and system design.
• Provided RLHF preference rankings with detailed justifications for model outputs (a representative record is sketched below).
• Red-teamed LLM outputs to identify logical errors, hallucinations, and security issues.
• Used Outlier.ai and proprietary Scale AI platforms for evaluation and RLHF annotation.
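To make the RLHF annotation work concrete, here is a minimal sketch of what a preference record of this kind can look like. It is illustrative only: the class and field names (PreferenceRecord, ranking, justifications) are my assumptions, not the actual Outlier.ai or Scale AI annotation schema.

```python
# Illustrative sketch only: structure and field names are assumptions,
# not the proprietary Outlier.ai / Scale AI annotation format.
from dataclasses import dataclass, field


@dataclass
class PreferenceRecord:
    """One RLHF annotation: ranked model responses with justifications."""
    prompt: str            # the coding task shown to the model
    responses: list[str]   # candidate outputs from the model
    ranking: list[int]     # response indices ordered best to worst
    justifications: list[str] = field(default_factory=list)  # one per response


record = PreferenceRecord(
    prompt="Write a Python function that deduplicates a list, preserving order.",
    responses=[
        "def dedupe(xs): return list(set(xs))",            # loses order
        "def dedupe(xs): return list(dict.fromkeys(xs))",  # keeps order
    ],
    ranking=[1, 0],  # the second response is preferred
    justifications=[
        "set() discards the original ordering, failing the task requirement.",
        "dict.fromkeys() deduplicates while keeping first-occurrence order.",
    ],
)
```

A pairwise record like this, with a ranking plus written justifications, is the kind of structured preference signal used to fine-tune models via RLHF.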