Search Quality & Query Evaluation
Performed AI model evaluations using HELM (Holistic Evaluation of Language Models) and GEE (Generalized Evaluation for AI Systems) to assess response truthfulness, instruction following, coherence, specificity, fluency, and reasoning. Conducted structured pairwise ranking and fine-grained scoring to refine AI-generated outputs. Provided expert linguistic assessments and model improvement suggestions to enhance conversational AI performance and user satisfaction.