Generalist AI Reviewer
In this role, I evaluated over 50 AI systems against structured grading rubrics across multiple domains, including programming and mathematics. I assessed the output quality and performance of AI models in at least three distinct fields, and I produced a detailed written evaluation for each session to support iterative improvement of the models.
• Systematically reviewed AI behavior and performance on coding and mathematical problem-solving tasks
• Applied defined grading criteria and provided comprehensive written feedback
• Covered multiple domains, including general use, programming, and math
• Contributed thorough written documentation for each model evaluation