AI Model Evaluator – Multimodal (image, audio, video, text) (Contract)
Compared and ranked outputs from multiple AI models across image, audio, video, and text modalities using detailed rubrics. Evaluated image generation quality, performed object removal and inpainting tasks, and assessed audio responses for clarity and correctness. Provided structured, written justifications for model ratings and identified nuanced failure modes such as hallucinations and visual artifacts.
• Evaluated image, video, audio, and text outputs.
• Wrote prompts to guide image editing and inpainting.
• Assessed visual grounding and instruction-following as key criteria.
• Documented failure modes such as hallucinations and instruction drift.