Multimodal Data Labeling (video)
This work involves creating high-quality question-and-answer pairs from short video and audio segments. The task requires closely analyzing both visual and audio information to design clear, natural questions that can be answered only by combining what is seen and heard. Questions must be unambiguous, non-trivial, and professionally written, and answers must be concise, factual, and written in complete sentences. The work demands strong attention to detail, good judgment, and the ability to identify meaningful interactions between audio and video in order to produce reliable multimodal training data for AI systems.