Handshake AI Fellow
I evaluated the correctness of responses from two LLMs to technical prompts covering Computer and Electrical Engineering topics. For each prompt, I determined whether each model's answer was correct, analyzed each response across six evaluation subcategories, and ranked the models on accuracy, quality, and overall performance.