AI Content Evaluator (Scale AI / Remotasks)
• Evaluated and rated outputs from generative AI models, including chatbots and text summarizers, for quality and alignment with human expectations.
• Annotated large-scale textual datasets to support the training and fine-tuning of NLP models, providing Human-in-the-Loop (HITL) feedback to enhance model performance.
• Identified systematic error patterns and compiled actionable feedback for engineering teams to improve model outputs.
• Conducted ongoing annotation and evaluation using proprietary platforms.
• Focused on Natural Language Processing and conversational AI subject matter.
• Delivered detailed rating reports and flagged edge cases.
• Contributed directly to improvements in model accuracy, safety, and response quality.