Voice Activity Detection Annotator
This project involved annotating audio data to train and evaluate AI speech recognition models. The core task was to meticulously transcribe and timestamp all speech and significant non-speech events in human-AI dialogues.

Key Responsibilities:
- Precisely labeled speaker turns (User and AI Assistant) with millisecond accuracy.
- Classified user tokens into Standard Speech, Interruptions (barge-ins), and Acknowledgements (backchanneling, e.g., "mm-hmm").
- Annotated non-speech events such as <pause>, <stop>, and <noise> (e.g., coughs, laughter) according to strict linguistic guidelines.
- Ensured continuous, gap-free annotation across the audio timeline to create seamless data for model training.
- Skipped tasks where the AI responded in a non-English language to maintain data integrity for the target models.

The resulting annotated data was critical for improving how AI assistants handle natural conversational nuances like filled pauses, overlaps, and turn-taking.
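The gap-free timeline requirement above can be sketched in code. This is a minimal illustration, not the project's actual tooling: the `Segment` structure and label strings (e.g. "speech", "<pause>") are hypothetical stand-ins for the real annotation schema, and the check simply verifies that consecutive segments tile the timeline with no gaps or overlaps.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start_ms: int   # inclusive start, millisecond precision
    end_ms: int     # exclusive end
    speaker: str    # "User" or "AI Assistant"
    label: str      # illustrative labels: "speech", "barge_in", "<pause>", "<noise>"

def is_gap_free(segments: list[Segment]) -> bool:
    """True if segments cover the timeline continuously: each segment
    starts exactly where the previous one ends (no gaps, no overlaps)."""
    ordered = sorted(segments, key=lambda s: s.start_ms)
    return all(a.end_ms == b.start_ms for a, b in zip(ordered, ordered[1:]))

timeline = [
    Segment(0, 1200, "User", "speech"),
    Segment(1200, 1450, "User", "<pause>"),
    Segment(1450, 3000, "AI Assistant", "speech"),
]
print(is_gap_free(timeline))  # True: every boundary is shared
```

A validator of this shape makes the "continuous, gap-free" guideline machine-checkable before annotated files are handed off for model training.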