AI Agent Evaluator/Labeller (Alignerr)
As a freelance evaluator at Alignerr, I assessed the quality of AI agent responses to identify failure cases and support model training. My work involved developing structured evaluation protocols and refining prompts to enhance accuracy and clarity. I labeled and analyzed model responses for leading-edge Anthropic models, achieving an 85% pass rate.
• Evaluated and rated text outputs from AI agents in real-world scenarios.
• Crafted and tested prompt variations to improve AI performance.
• Provided feedback to data science teams on identified failure cases and edge cases.
• Enhanced the clarity and robustness of evaluation frameworks.