AI Scenario Design & Evaluation Practice (Academic Projects)
This role involved designing structured scenarios to evaluate AI agent outputs and establishing clear evaluation criteria. Responsibilities included defining expected AI model behaviors and thoroughly documenting system errors and failures. The work emphasized consistency and reusability of evaluation data through format standards such as JSON.
• Designed structured evaluation scenarios for AI agents using text-based data.
• Defined and scored AI outputs for correctness, completeness, and reasoning using formal metrics.
• Documented testing edge cases and performed detailed error analysis.
• Utilized JSON to organize labeling frameworks and results for scalability and analysis.
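The JSON-based labeling framework described above could be sketched as follows; this is a hypothetical illustration, and all field names, the 0-2 scoring scale, and the `total_score` helper are assumptions rather than the project's actual schema:

```python
import json

# Hypothetical evaluation record: one scenario, one model output, and
# rubric scores for correctness, completeness, and reasoning (assumed
# 0-2 scale). Field names are illustrative, not from the original work.
record = {
    "scenario_id": "scn-001",
    "prompt": "Summarize the attached incident report.",
    "expected_behavior": "A factual three-sentence summary, no speculation.",
    "model_output": "The report describes ...",
    "scores": {"correctness": 2, "completeness": 1, "reasoning": 2},
    "error_tags": ["omitted_detail"],
    "notes": "Missed the second root cause listed in the report.",
}

def total_score(rec: dict) -> int:
    """Sum the per-criterion rubric scores for cross-scenario aggregation."""
    return sum(rec["scores"].values())

# Serializing to JSON keeps records consistent and reusable across runs.
serialized = json.dumps(record, indent=2)
print(total_score(record))  # -> 5
```

Keeping each record self-describing in this way lets error tags and notes travel with the scores, so later error analysis does not need a separate lookup.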