Handshake AI: Project Ohm
Systematically evaluate model-generated responses across multiple quality dimensions, comparing them against structured criteria. Based on these evaluations, assign preference rankings that reflect which responses best satisfy the prompt's intent while demonstrating strong reasoning and clear communication.