Chatbot Evaluation and Conversation Quality Rater
Evaluated chatbot output and annotated conversational data across multiple AI training platforms. Rated chatbot responses for clarity, correctness, safety, tone, and helpfulness; flagged hallucinations and policy violations; classified user intent; and assessed multi-turn dialogue coherence. Performed side-by-side model comparisons, tested edge cases in customer-service scenarios, and authored high-quality human-written responses to strengthen chatbot training datasets. Adhered strictly to project guidelines, maintained consistent quality scores across tasks, and provided feedback to improve conversational behavior and natural language understanding in AI systems. Reviewed large volumes of dialogue samples in ongoing QA cycles, supporting the fine-tuning and safety alignment of conversational LLMs.