Audio Evals
Consistently select and edit the key Automatic Speech Recognition (ASR) segments of a video that effectively summarize its most critical and representative spoken information. This helps the model get better at speech recognition. The selected content must meet the following criteria:
● Representativeness: The selected ASR segments must accurately reflect the video's primary theme, narrative, or objective.
● Comprehensiveness: Annotators are strongly encouraged to err on the side of comprehensiveness when selecting segments and to provide thorough descriptions.
● Conciseness: Avoid including extraneous setup, filler language, repetitive transcription, or transitional footage in a segment.
● Non-Redundancy: Do not select multiple segments that convey exactly the same core message.