Cypher
Contributed to Cypher_Evals model evaluations by comparing pairs of LLM-generated responses. I assessed each pair against detailed criteria, including instruction following, truthfulness, localization, and verbosity. In addition to rating each criterion, I gave an overall judgement on which response was better and explained my reasoning. I completed these tasks under strict time and quality constraints, which required both precision and efficiency.