Chatbot Response Evaluation
Evaluated AI-generated chatbot responses for relevance, factual accuracy, tone, and safety across question answering, instruction following, and dialogue completion tasks. Annotated whether replies followed user instructions and aligned with task goals. Provided quality ratings and detailed reviewer comments to help fine-tune large language models. Contributed to safety testing and red teaming by identifying harmful, biased, or nonsensical outputs.