Rubric-Based LLM Response Evaluation (Scoring & Pairwise Ranking)
Evaluated and compared LLM responses against multidimensional rubrics covering instruction following, truthfulness, reasoning quality, clarity, and formatting compliance. Performed pairwise rankings and wrote concise, guideline-aligned justifications. Focused on detecting logical gaps, unsupported claims, constraint violations, and internal contradictions while keeping scores consistent across tasks.