AI Evaluation Consultant & Scenario Designer
Designed and implemented AI evaluation and human-in-the-loop testing frameworks to improve agent reliability. Conducted large-scale review and structured, scenario-based testing of language model agents. Evaluated prompt outputs, validated model responses, and performed edge-case analysis for AI systems.
• Developed rubrics for LLM output validation
• Analyzed agent behavior and failure modes
• Collaborated on scalable agent reliability testing
• Applied doctoral research methods to real-world AI validation tasks