AI Training & Evaluation Contributor
I designed and evaluated multi-turn conversational scenarios to test LLM performance against structured guidelines and rubrics. Responsibilities included simulating diverse user personas, crafting adversarial dialogues, and stress-testing language models for weaknesses in reasoning and emotional intelligence. I performed structured annotation and failure mode detection, and documented detailed evaluation results for team review.
• Simulated emotional escalation, contradiction, and goal-driven behavior across rich persona types
• Applied prompt engineering techniques to refine model outputs and improve robustness
• Evaluated model responses for instruction adherence, context retention, and logical consistency
• Conducted structured QA and annotation following set rubrics and evolving requirements