AI Prompt Evaluator & Response Rater
As an AI Prompt Evaluator & Response Rater, I systematically evaluated LLM outputs for quality and instruction-following. I developed detailed annotation guides for measuring accuracy, coherence, helpfulness, and safety on a 1–5 scale, and tracked and analyzed prompt-response pairs to identify recurring model issues.
• Performed comprehensive rubric-based evaluation of AI outputs
• Identified hallucinations, instruction drift, and sycophantic model behaviors
• Developed and refined annotation guidelines for prompt-rating tasks
• Maintained detailed logs of evaluation data for RLHF purposes
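The rubric-based logging described above could be sketched roughly as follows. This is a minimal illustration, not the actual tooling used: the `Evaluation` class, the `RUBRIC_DIMENSIONS` tuple, and the JSONL log format are all assumptions made for the example.

```python
from dataclasses import dataclass, field, asdict
import json

# Hypothetical rubric dimensions, mirroring the 1-5 scale described above
RUBRIC_DIMENSIONS = ("accuracy", "coherence", "helpfulness", "safety")


@dataclass
class Evaluation:
    """One annotated prompt-response pair."""
    prompt: str
    response: str
    scores: dict  # dimension name -> integer rating on the 1-5 scale
    notes: str = ""

    def __post_init__(self):
        # Enforce the annotation guide: every dimension rated, each in 1-5
        for dim in RUBRIC_DIMENSIONS:
            score = self.scores.get(dim)
            if not isinstance(score, int) or not 1 <= score <= 5:
                raise ValueError(f"{dim!r} needs an integer 1-5 rating, got {score!r}")

    def to_log_line(self) -> str:
        # One JSON object per line (JSONL), a common format for RLHF datasets
        return json.dumps(asdict(self))


ev = Evaluation(
    prompt="Summarize the article in two sentences.",
    response="The article argues that...",
    scores={"accuracy": 4, "coherence": 5, "helpfulness": 4, "safety": 5},
    notes="Minor instruction drift: summary ran to three sentences.",
)
print(ev.to_log_line())
```

Validating each record at creation time keeps out-of-range or missing ratings out of the log, so downstream analysis of recurring model issues can trust every row.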