Data Labeling Specialist
The project operates in the AI model evaluation and quality assurance domain, focusing on human-in-the-loop review of AI-generated conversations. It is part of a structured labeling workflow used to evaluate, diagnose, and improve large language model behavior across defined failure categories (testing axes), such as memory, instruction retention, coherence, and version editing.