Chatbot Response Comparison & Evaluation
Compared outputs from two AI chatbots on identical user queries, evaluating each response for accuracy, relevance, safety, coherence, and guideline compliance. Annotated and rated responses to identify inconsistencies, strengths, and areas for improvement, supporting fine-tuning and quality improvement of the models. Maintained consistent, reliable evaluations across multiple assessment rounds.