Freelance ML Task Creator & AI Agent Evaluator
Designed and evaluated machine learning tasks for AI coding agents, focusing on assessment of agent outputs and model behavior. Developed detailed technical documentation outlining task structure, evaluation criteria, and findings on AI agent performance. Iteratively refined tasks based on observed agent success and failure to enhance training signals.
• Created Python-based ML tasks covering PyTorch, TensorFlow, and JAX workflows
• Evaluated AI model outputs to identify edge cases and performance degradation
• Benchmarked model responses following best practices in ML
• Documented methodology to improve agent learning and dataset quality