LLM evaluation and response
In the Cookie Rubric project, I tested AI models by prompting them until they failed, then evaluated and ranked their responses. I wrote rubrics describing the ideal output, re-rated the responses against those rubrics, and provided written justifications for my ratings. This work helped improve the accuracy, reliability, and sophistication of LLM-generated answers.