Hazel v2 – Voice Assistant Data Evaluations
Evaluating human–voice assistant interactions as part of Appen's Hazel v2 Data Evaluations project. Working with audio, image, and text data. Tasks involve assessing spoken factuality, response completeness, and transcription accuracy (ASR hypothesis verification), as well as evaluating image-based queries. Rating AI responses for accuracy, relevance, and naturalness against multiple criteria to support the training and performance improvement of conversational AI.