AI Output Evaluator (Independent, Remote)
I systematically evaluated the outputs of large language models (LLMs) using structured rubrics and analytical frameworks. Each task involved assessing the accuracy, clarity, and logical coherence of AI-generated responses. I identified reasoning flaws, flagged hallucinations, and documented findings to support model improvement initiatives.
• Applied rubric-based criteria to evaluate LLM responses
• Produced structured feedback and detailed evaluation reports
• Performed error detection and logical analysis of multi-step reasoning tasks
• Combined manual review with independent research for quality assurance