Prompt and Criteria-Based Model Evaluation
Designed high- and low-level evaluation criteria for reviewing AI model responses. Created complex prompts to stress-test models. Measured and evaluated model performance across different scenarios, documenting results to highlight areas needing refinement. Contributed to building feedback pipelines that improve model reasoning and output quality.