Remote Customer Support & Chat Specialist / LLM Evaluator
Simulated complex multi-turn conversational scenarios to test and evaluate automated AI language model response systems. Assessed dialogue flow, user experience, and emotional engagement, documenting observations and identifying model weaknesses. Provided actionable feedback grounded in structured guideline adherence to improve instruction following and dialogue coherence.
• Tested LLM outputs for realism, consistency, and adherence to designed conversational intent.
• Evaluated and documented LLM failure modes, performance issues, and behavioral inconsistencies.
• Created structured evaluation reports with detailed metrics and improvement recommendations.
• Adapted conversational personas to thoroughly assess LLM robustness in diverse contexts.