Senior AI Operations Specialist & Evaluator
I audited AI-generated code against logical, mathematical, and security benchmarks. I developed a scoring system for technical reasoning and collaborated on refining data through prompt engineering and stress-testing. I also authored technical reports on AI model performance.
• Evaluated code outputs in Python, C++, and Go.
• Built scoring systems for reasoning depth and hallucination detection.
• Refined training datasets to improve model learning.
• Produced over 50 technical evaluations of multi-step reasoning.