AI Image Evaluator and Masking
• Reviewed multi-step agent workflows, including reasoning traces and tool outputs, for correctness and completeness, and delivered actionable feedback to improve model performance and user experience.
• Applied benchmarking frameworks to assess model performance across diverse prompts and edge cases, and provided structured feedback to improve model responses.