Contributor
Reviewed and compared AI-generated responses to the same prompt, focusing on general knowledge tasks. Evaluated answers for accuracy, clarity, relevance, logical reasoning, and alignment with the prompt's intent. Applied standardized evaluation guidelines to identify each response's strengths and weaknesses and determine the preferred response, and provided structured feedback to support model improvement.