Lead AI Evaluator
As Lead AI Evaluator at Neural-Logic Systems, I evaluated and ranked model outputs to support RLHF training. I assessed thousands of AI-generated completions for truthfulness, helpfulness, and harmlessness to drive reward model improvement, and led red-teaming, quality benchmarking, and vulnerability identification efforts.

• Ranked over 15,000 model completions to improve RLHF reward models.
• Led red-teaming exercises to uncover edge-case vulnerabilities in medical and legal queries.
• Authored internal quality rubrics for annotator teams.
• Improved model accuracy and safety through systematic data evaluation.
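For context on the ranking work above: pairwise rankings like these are commonly used to train reward models with a Bradley-Terry style preference loss. The sketch below is purely illustrative (the function name and reward scores are hypothetical, not Neural-Logic internal tooling):

```python
import math

def pairwise_preference_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Bradley-Terry preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model scores the human-preferred completion
    higher than the rejected one; large when the ordering is violated.
    """
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical reward scores for a ranked pair of completions.
agrees = pairwise_preference_loss(2.0, 0.5)      # model agrees with ranking
disagrees = pairwise_preference_loss(0.5, 2.0)   # model contradicts ranking
assert agrees < disagrees
```

Aggregating this loss over many ranked pairs is how annotator rankings translate into reward model updates during RLHF.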