# Video Moment Retrieval Annotation Guide - FOOD Domain

## Goal

Identify and mark temporal segments in videos that match natural language queries. Provide accurate boundaries and domain-specific metadata for training.

## Instructions

1. **Read the Query**: Understand what action/event you need to find
2. **Watch the Video**: Identify all segments that match the query
3. **Mark Boundaries**: Use the timeline to mark precise start/end times
4. **Classify**: Select the action type and objects present
5. **Describe**: Write a visual proxy description for CLIP training

## Domain-Specific Actions

- chopping ingredients
- mise en place
- mixing ingredients
- kneading dough
- sautéing
- stirring the pot
- deglazing
- tasting food
- adding seasoning
- boiling
- grilling
- baking
- plating the dish
- garnishing
- sauce drizzle
- recipe introduction
- finished dish reveal
- eating reaction

## Quality Guidelines

- **Boundary Accuracy**: Mark within 0.5 seconds of actual moment
- **Query Specificity**: Segment should clearly match the query
- **Visual Description**: Write what you SEE, not what you know
- **Complete Coverage**: Mark ALL segments that match, not just the first one

## Visual Proxy Guidelines

Write descriptions that are:

- **Visually grounded**: "person in white chef coat slicing red tomatoes"
- **Specific**: Not "cooking" but "stirring wooden spoon in stainless steel pot"
- **Action-focused**: Include the action and key objects
- **CLIP-friendly**: Use clear, descriptive language

## Common Mistakes to Avoid

- Marking too short (missing context)
- Marking too long (including unrelated content)
- Vague queries ("something interesting")
- Abstract descriptions ("delicious food" vs "golden brown crust on pie")

## Confidence Levels

- **High**: Certain about boundaries (within 0.5 seconds)
- **Medium**: Boundaries approximate (within 1-2 seconds)
- **Low**: Action is ambiguous or boundaries are unclear
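
The boundary, action, and confidence rules above can be sketched as a small validation helper. This is a minimal illustration, not the annotation tool's actual schema: the `Segment` fields, the `confidence_for` and `validate` functions, and the choice to treat any boundary uncertainty between 0.5 and 2 seconds as "medium" are assumptions made for the example. Only the action names and the 0.5-second and 1-2 second thresholds come from this guide.

```python
from dataclasses import dataclass

# Action labels copied from the "Domain-Specific Actions" list in this guide.
FOOD_ACTIONS = {
    "chopping ingredients", "mise en place", "mixing ingredients",
    "kneading dough", "sautéing", "stirring the pot", "deglazing",
    "tasting food", "adding seasoning", "boiling", "grilling", "baking",
    "plating the dish", "garnishing", "sauce drizzle",
    "recipe introduction", "finished dish reveal", "eating reaction",
}

CONFIDENCE_LEVELS = {"high", "medium", "low"}


@dataclass
class Segment:
    """One marked moment. Field names are hypothetical, chosen for this sketch."""
    start: float        # seconds from the start of the video
    end: float          # seconds from the start of the video
    action: str         # one of FOOD_ACTIONS
    visual_proxy: str   # what is visible, e.g. "stirring wooden spoon in pot"
    confidence: str     # "high", "medium", or "low"


def confidence_for(uncertainty_s: float) -> str:
    """Map estimated boundary uncertainty (seconds) to a confidence level.

    Assumption: anything above 0.5 s and up to 2 s counts as "medium".
    """
    if uncertainty_s <= 0.5:
        return "high"
    if uncertainty_s <= 2.0:
        return "medium"
    return "low"


def validate(seg: Segment) -> list[str]:
    """Return a list of problems found in a segment (empty if none)."""
    problems = []
    if seg.start < 0 or seg.end <= seg.start:
        problems.append("start must be >= 0 and end must be after start")
    if seg.action not in FOOD_ACTIONS:
        problems.append(f"unknown action: {seg.action!r}")
    if seg.confidence not in CONFIDENCE_LEVELS:
        problems.append(f"unknown confidence level: {seg.confidence!r}")
    if not seg.visual_proxy.strip():
        problems.append("visual proxy description is empty")
    return problems


seg = Segment(
    start=12.0,
    end=18.5,
    action="chopping ingredients",
    visual_proxy="person in white chef coat slicing red tomatoes",
    confidence=confidence_for(0.3),
)
print(validate(seg))
```

A check like this only catches structural problems such as reversed boundaries or an action outside the list. Whether the segment actually matches the query still has to be judged by watching the video.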
## Job Details

- **Total Budget**: $125
- **Pay per Label**: $0.0575
- **Time Requirement**: Flexible
- **Duration**: 1 month
- 35 hours of food videos
- **Workload / Schedule**: Finish the dataset in 1 week