Voice Activity Detection
The Voice Activity Detection (VAD) project involves labeling speech and non-speech segments in audio of user and assistant interactions, using precise token annotations. It is a moderate-sized project that requires about 2 hours per data row. Strict quality controls ensure accurate timestamps, correct token types, and continuous speech labeling. The resulting annotations are used to train speech recognition models.
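As a rough illustration of what an automated quality check on one data row might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not the project's actual schema or tooling: the `Segment` record, the token type names in `ALLOWED_TYPES`, the `validate_segments` helper, and the reading of "continuous" as "segments cover the audio without gaps or overlaps".

```python
from dataclasses import dataclass

# Hypothetical token types; the project's real label set is not given here.
ALLOWED_TYPES = {"user_speech", "assistant_speech", "non_speech"}


@dataclass
class Segment:
    start: float      # segment start, in seconds
    end: float        # segment end, in seconds
    token_type: str   # one of ALLOWED_TYPES (assumed)


def validate_segments(segments, audio_duration, tol=1e-6):
    """Return a list of problems found; an empty list means none were found.

    Assumes segments are listed in time order and should tile the audio
    from 0 to audio_duration with no gaps and no overlaps.
    """
    problems = []
    prev_end = 0.0
    for i, seg in enumerate(segments):
        if seg.token_type not in ALLOWED_TYPES:
            problems.append(f"segment {i}: unknown token type {seg.token_type!r}")
        if not (0.0 <= seg.start < seg.end <= audio_duration + tol):
            problems.append(f"segment {i}: invalid timestamps {seg.start}-{seg.end}")
        if seg.start < prev_end - tol:
            problems.append(f"segment {i}: overlaps previous segment")
        elif seg.start > prev_end + tol:
            problems.append(f"segment {i}: gap before segment")
        prev_end = max(prev_end, seg.end)
    if segments and prev_end < audio_duration - tol:
        problems.append("gap between last segment and end of audio")
    return problems


# Example: a 6-second row labeled without gaps or overlaps.
row = [
    Segment(0.0, 1.5, "non_speech"),
    Segment(1.5, 4.0, "user_speech"),
    Segment(4.0, 6.0, "assistant_speech"),
]
print(validate_segments(row, audio_duration=6.0))
```

A check like this only catches structural mistakes such as overlapping timestamps or unknown token types. Whether a boundary actually falls where speech begins or ends still needs a human reviewer listening to the audio.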