AI Output Evaluator
As an AI Output Evaluator, I assessed the accuracy, safety, and coherence of AI-generated responses across diverse text tasks. I consistently applied structured rubrics to evaluate outputs and provided written feedback to support model improvement. My work focused on rating reasoning, summarization, mathematics, and code-related responses across multiple platforms.
• Scored outputs for correctness, factuality, and adherence to instructions using platform-provided guidelines.
• Consistently flagged hallucinated facts, logical inconsistencies, and unsafe content.
• Ranked model outputs by preference and delivered justifications grounded in evidence.
• Maintained high inter-rater agreement and completed calibration tasks for quality control.