Fine-Grained Criteria
In this project, I was given a system prompt written for a specific enterprise, followed by a single-step or multi-step user prompt and several model responses to it. My task was to write fine-grained criteria for grading those responses and to decide, based on the criteria, which response answered the prompt better. If none of the responses was acceptable, I edited the better one to produce an ideal response.