Analyst – AI Task Environment Evaluation
• Contributed to a firm backed by researchers at OpenAI and Anthropic
• Designed and stress-tested task environments for AI agents, applying error analysis and performance evaluation
• Evaluated 1,000+ agent outputs across ~76 task environments over repeated testing cycles to identify errors, inconsistencies, and failures
• Refined task environments, ensuring <0.01% input/output deviation and targeting 10%-30% AI agent success rates