AI Model Evaluator – Outlier AI
As an AI Model Evaluator at Outlier AI, I supported the evaluation and performance analysis of large language models (LLMs) and multimodal AI systems. My work focused on assessing AI outputs, identifying errors, and validating output consistency across custom image and video datasets. This included applying structured rubrics, testing transformation functions, and validating mathematical and visual reasoning; representative sketches of the rubric and transformation-testing work follow the list below.

• Analysed AI-generated outputs for accuracy, consistency, and errors.
• Performed structured testing on image and video data using Python-based analysis tools.
• Designed and applied evaluation rubrics to classify responses and flag inconsistencies.
• Identified edge cases in multimodal model performance and documented findings.
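
To illustrate the rubric work, here is a minimal sketch of rubric-based response classification. The three criteria, the 0–2 scale, and the Verdict labels are illustrative assumptions for this sketch, not Outlier AI's actual rubric.

```python
# Minimal sketch of rubric-based response classification.
# Criteria, the 0-2 scale, and verdict labels are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    MINOR_ISSUE = "minor_issue"
    FAIL = "fail"


@dataclass
class RubricScore:
    accuracy: int     # 0-2: factual correctness of the response
    consistency: int  # 0-2: agreement with prior outputs on the same input
    formatting: int   # 0-2: adherence to the requested output format


def classify(score: RubricScore) -> Verdict:
    """Map rubric scores to a verdict: any zero fails, any one is a minor issue."""
    lowest = min(score.accuracy, score.consistency, score.formatting)
    if lowest == 0:
        return Verdict.FAIL
    if lowest == 1:
        return Verdict.MINOR_ISSUE
    return Verdict.PASS


if __name__ == "__main__":
    print(classify(RubricScore(accuracy=2, consistency=1, formatting=2)))
    # -> Verdict.MINOR_ISSUE
```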
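And a minimal sketch of transformation-function testing on image data, assuming images are handled as NumPy arrays; the horizontal-flip involution check is a hypothetical stand-in for the kind of consistency validation performed, not a specific test case from the role.

```python
# Minimal sketch of transformation-function consistency testing.
# Assumes images are NumPy arrays; the flip check is a hypothetical example.
import numpy as np


def check_involution(image: np.ndarray, transform, atol: float = 1e-6) -> bool:
    """Verify that applying a self-inverse transform twice restores the image."""
    restored = transform(transform(image))
    return np.allclose(image, restored, atol=atol)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64, 3))  # synthetic RGB image stand-in
    # A horizontal flip is its own inverse, so this check should pass.
    print(check_involution(img, lambda a: a[:, ::-1, :]))  # -> True
```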