Senior AI Evaluator
Evaluated and rated model-generated programming outputs using RLHF methodologies and custom rubrics. Developed style guides for synthetic data generation to improve factuality and reduce model hallucinations. Implemented and managed real-time evaluation dashboards to monitor model drift and logic errors.
• Led thorough assessments of multi-turn reasoning and code logic
• Authored guidance to enhance dataset consistency and reliability
• Directly contributed to the fine-tuning of a GPT-4 competitor
• Reduced hallucination rates in model outputs by 40%