Lead AI Trainer | Labelbox / Outlier / Appen
As Lead AI Trainer, I conducted technical evaluations of model-generated code using Labelbox and other platforms. I ensured correctness, security, and accuracy by reviewing more than 500 code samples, especially in Python and data science libraries. I provided detailed rationales and flagged hallucinations or technical errors to support LLM development.
• Evaluated Agentic Coding tasks primarily for Python and data science domains.
• Provided structured, evidence-based rationales exceeding 200 characters per case.
• Identified security issues and subtle hallucinations in code and model responses.
• Ensured compliance with strict SOPs and high-quality auditing standards.