Rubric Development & Quality Review
I developed and refined rubrics for evaluating chatbot and LLM behavior, designing clear criteria for judging response quality, reasoning accuracy, safety compliance, tone, and instruction-following. I also improved annotation guidelines by identifying ambiguous cases, clarifying edge conditions, and reviewing example responses. This work helped keep the evaluation of chatbot outputs consistent and reliable across large datasets, maintaining high quality standards for conversational AI training.