AI Technical Evaluator & Dialogue Specialist
As an AI Technical Evaluator & Dialogue Specialist, I conducted rigorous evaluations of large language models (LLMs), focusing on their adherence to system constraints and quality standards. My work involved identifying hallucinations, assessing logical soundness, and ensuring safety through adversarial persona simulations. I produced high-quality reports and rubrics that contributed directly to model improvement and alignment efforts.
• Designed and executed over 500 complex test cases for LLM evaluation.
• Simulated adversarial multi-turn dialogues to uncover edge cases and failure modes.
• Produced detailed technical quality reports for AI engineering teams.
• Standardized evaluation rubrics for AI model instruction-following and emotional intelligence (EQ).