Lead Dialogue Writer and Evaluator (Multi-Persona AI Dialogue Simulation Project)
Led a multi-persona LLM dialogue simulation project to stress-test conversational AI performance. Designed structured datasets labeling hallucination, inconsistency, emotional misreading, and contradiction in model responses, and developed evolving test prompts to probe behavior under escalating constraints.
• Stress-tested LLMs against multi-turn, persona-rich conversational scripts
• Created structured labels for hallucination and reasoning-error analysis
• Analyzed dialogue logs and annotated model response quality
• Iteratively refined annotation and evaluation rubrics
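The label taxonomy described above could be sketched as a minimal annotation record; a possible shape, assuming per-turn labeling (all names here are illustrative, not the project's actual schema):

```python
from dataclasses import dataclass, field
from enum import Enum

class ErrorLabel(Enum):
    # Error categories applied when annotating model responses
    HALLUCINATION = "hallucination"
    INCONSISTENCY = "inconsistency"
    EMOTIONAL_MISREADING = "emotional_misreading"
    CONTRADICTION = "contradiction"

@dataclass
class TurnAnnotation:
    """One annotated model turn from a multi-persona dialogue log (hypothetical schema)."""
    dialogue_id: str
    turn_index: int
    persona: str
    response_text: str
    labels: list[ErrorLabel] = field(default_factory=list)
    quality_score: int = 3  # 1 (poor) to 5 (excellent), defined by the evaluation rubric

# Example: flagging a turn that contradicts an earlier statement in the dialogue
ann = TurnAnnotation(
    dialogue_id="sim-042",
    turn_index=7,
    persona="skeptical_customer",
    response_text="As I said, the refund was already processed.",
    labels=[ErrorLabel.CONTRADICTION],
    quality_score=2,
)
print([label.value for label in ann.labels])  # → ['contradiction']
```

Keeping labels as an enum rather than free text makes the dataset easy to aggregate when analyzing error rates across dialogue logs.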