Generalist Evaluator Expert (LLM & Multimodal Models)
As a Generalist Evaluator Expert, I evaluated LLM and multimodal model responses for accuracy, safety, and alignment. I designed judgment tests and provided rubric-based scoring to support model selection and tuning. My work included hallucination detection and detailed qualitative feedback to refine model behavior.
• Conducted structured evaluation of model outputs
• Ran side-by-side model comparisons and preference ranking
• Detected hallucinations, factuality errors, and risk factors
• Created custom prompt instructions and scenario tests