Lead AI Content Evaluator
As Lead AI Content Evaluator at DataAnnotation.tech, I evaluated Large Language Model (LLM) outputs for logical coherence, safety, and helpfulness. I specialized in auditing chain-of-thought responses and identifying hallucinations, and conducted adversarial red teaming to ensure robust model behavior. I collaborated with stakeholders to refine annotation guidelines for new model behaviors.

• Evaluated LLM outputs for consistency, factuality, and safety
• Applied adversarial red teaming to probe content-safety boundaries
• Improved annotation guidelines through asynchronous collaboration
• Identified and reported subtle model hallucinations for further mitigation