AI Realtime Evaluator
As an AI Realtime Evaluator, I conducted real-time assessments of large language model outputs during live conversational interactions. My work involved evaluating the accuracy, reasoning, factual correctness, clarity, tone, and safety alignment of AI-generated responses. I provided structured feedback and detailed annotations to improve model performance and contributed to the creation of high-quality training data.

• Evaluated over 400 responses per week while maintaining inter-rater agreement
• Identified hallucinations, logical flaws, incomplete reasoning, and areas for improvement
• Performed fact-checking using public sources and external tools
• Supported AI research teams through rubric-based, asynchronous collaboration