AI Chatbot Quality Analyst (Contract)
Evaluated large language model (LLM) chatbots for context retention, coherence, and tone consistency across multi-turn conversations. Designed adversarial and stress-test dialogue scenarios to expose model failure modes, and documented failure patterns using structured rubrics. Rated chatbot responses against RLHF-aligned human preference frameworks and helped build a reusable QA template library for AI evaluations.
• Conducted structured evaluations across 30+ topic domains
• Designed and executed 150+ dialogue test scenarios
• Documented failure patterns for feedback to product teams
• Rated chatbot outputs using RLHF frameworks