AI Response Evaluation & Comparative Ranking
Contributed to AI model improvement by systematically evaluating and ranking AI-generated responses across a wide range of general-knowledge, reasoning, and conversational tasks. Compared multiple model outputs side by side and selected the most accurate, helpful, and well-structured response according to defined quality rubrics. Identified hallucinations, factual errors, logical inconsistencies, and tone issues in AI outputs, and provided a detailed written justification for every rating decision to support model fine-tuning.
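The comparison-and-justification workflow described above maps naturally onto a structured record per task. Below is a minimal sketch in Python of what one such record might look like; the `ComparisonRecord` and `ResponseRating` dataclasses and all field names are hypothetical illustrations under assumed rubric dimensions (accuracy, helpfulness, structure), not the schema of any specific annotation platform.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical schema for a single comparative-ranking record; names and
# rubric dimensions are illustrative assumptions, not a real tool's API.
@dataclass
class ResponseRating:
    model_id: str            # anonymized identifier for the responding model
    accuracy: int            # rubric scores, e.g. 1 (poor) to 5 (excellent)
    helpfulness: int
    structure: int
    issues: list[str] = field(default_factory=list)  # e.g. "hallucination"

@dataclass
class ComparisonRecord:
    prompt: str
    ratings: list[ResponseRating]
    preferred: str           # model_id of the selected best response
    justification: str       # written rationale supporting the ranking

# Example record comparing two anonymized model outputs on one prompt.
record = ComparisonRecord(
    prompt="Summarize the causes of the 2008 financial crisis.",
    ratings=[
        ResponseRating("model_a", accuracy=5, helpfulness=4, structure=5),
        ResponseRating("model_b", accuracy=2, helpfulness=3, structure=4,
                       issues=["hallucination", "factual_error"]),
    ],
    preferred="model_a",
    justification=("model_a is factually grounded and well organized; "
                   "model_b cites a nonexistent regulation, a hallucination "
                   "that outweighs its otherwise readable structure."),
)

# Serialized records like this are the typical input to preference-based
# fine-tuning pipelines.
print(json.dumps(asdict(record), indent=2))
```

A per-response score plus a single preferred choice and free-text justification is one common shape for such data; actual annotation platforms vary in their rubric dimensions and export formats.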