AI Model Evaluator (Video Model Assessment)
I led the evaluation and quality rating of video-based AI models, focusing on multimodal A/B testing and model selection for tasks such as background replacement and gesture recognition. I scrutinized model outputs for reliability across more than 50 configuration variations, using human-in-the-loop (HITL) processes to ensure sound selection and robust validation. Automated assessment pipelines and quality gates were central to reducing manual QA workload and improving the consistency of video labeling outcomes.
• Executed systematic multimodal video evaluation protocols, including in-depth configuration testing.
• Applied HITL reviews to detect failure cases and improve output consistency.
• Deployed and refined automated LLM-as-a-Judge workflows for video model assessment in CI/CD environments.
• Accelerated model selection by 25% and cut QA effort by over 40% through integrated AI quality controls.
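As an illustration, an LLM-as-a-Judge quality gate of the kind described above can be sketched as follows. This is a minimal sketch, not the actual pipeline: the judge call is stubbed with a hypothetical heuristic (a real deployment would call an LLM API with a scoring rubric), and all names, thresholds, and criteria here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    score: float      # rubric score in [0, 1]
    rationale: str    # judge's explanation, useful for HITL review

def judge_output(rubric: str, output_summary: str) -> Judgment:
    # Stub: a real implementation would send the rubric and the model
    # output to an LLM judge. The string check below is a hypothetical
    # stand-in for illustration only.
    ok = "background fully replaced" in output_summary
    return Judgment(
        score=0.9 if ok else 0.3,
        rationale="rubric criteria met" if ok else "artifacts detected",
    )

def quality_gate(output_summaries: list[str], threshold: float = 0.7) -> bool:
    # The gate passes only if every sampled output clears the threshold,
    # mirroring a CI/CD check that blocks low-quality model configurations.
    rubric = "Rate background replacement quality from 0 to 1."
    scores = [judge_output(rubric, s).score for s in output_summaries]
    return all(s >= threshold for s in scores)

print(quality_gate(["background fully replaced, clean edges"]))  # → True
print(quality_gate(["halo artifacts around subject"]))           # → False
```

In a CI/CD setting, a gate like this would run on each candidate configuration's sampled outputs, with failing configurations routed to HITL review rather than promoted automatically.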