AI Model Evaluator | Generative AI & Agent Systems
Evaluated outputs from large language models for reasoning, summarization, coding, and instruction following. Applied standardized rubrics and qualitative scoring frameworks to assess accuracy, coherence, safety, and instruction adherence. Identified and documented model weaknesses, including hallucinations and inconsistencies, to support model refinement.
• Conducted structured feedback documentation and calibration with other evaluators.
• Delivered actionable insights that reduced recurring model errors.
• Maintained high inter-rater reliability during peer audits.
• Participated in ambiguity reviews and edge-case identification to improve evaluation guidelines.