AI Model Evaluator (Video Model Evaluation)
I spearheaded A/B testing and quality evaluation for multiple video models, including background replacement and gesture recognition systems. I systematically assessed configuration variations to accelerate model selection and implemented centralized prompt repositories for multi-agent workflows. I architected evaluation pipelines, integrating LLM-as-a-Judge workflows and CI/CD QA gates for continual assessment.
• A/B testing and multi-model evaluation
• Optimization of prompt iteration and deployment workflows
• Automated and human-in-the-loop (HITL) assessment strategies
• End-to-end video model quality control and validation