AI Model Evaluator & LLM Output Reviewer
In this role, I evaluated AI-generated responses for accuracy, hallucination, bias, and compliance with annotation guidelines. My tasks included detailed review of large language model outputs and systematic identification of areas for improvement. I played a key part in refining data quality for machine learning and generative AI systems.
• Evaluated text outputs from LLMs such as ChatGPT, Claude, and Gemini.
• Identified hallucination and bias in model responses.
• Applied complex annotation guidelines while reviewing LLM outputs.
• Contributed to feedback loops for improving generative AI outputs.