EgoHowTo, MMLLM, AI Agents Performance Evaluation, Caption Writing.
1. Ego How-To
Scope: Video interpretation focusing on instructional methodology.
Project Size: 8 months of raw video data.
Labeling Tasks: Temporal triggering (start/end frame identification), atomic action tagging, and technical "how-to" narrations (e.g., "Grasps tool with 45° wrist rotation").
Quality Standard: Temporal Precision & Granularity. Measured by frame-level accuracy and adherence to strict action-verb taxonomies.

2. MMLLM Performance Evaluation
Scope: RLHF-based preference testing for text and image-based Multimodal Large Language Models.
Project Size: 5 months of text-image prompt response sets.
Labeling Tasks: Side-by-Side (SbS) preference ranking, hallucination identification, and validation of multimodal reasoning chains.
Quality Standard: Truthfulness & Grounding. Measured by the elimination of factual errors and ensuring the model "sees" and describes image details without fabrication.
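
As a rough illustration of the frame-level accuracy measure named under Ego How-To, the sketch below scores one labeled start/end segment against a reference segment. It is a minimal example under stated assumptions, not the project's actual scoring method: the function names, the inclusive frame convention, the choice of intersection-over-union as the metric, and the sample frame numbers are all hypothetical.

```python
def temporal_iou(pred, truth):
    """Intersection-over-union of two inclusive (start_frame, end_frame) segments."""
    p_start, p_end = pred
    t_start, t_end = truth
    # Number of frames shared by both segments (0 if they do not overlap).
    overlap = max(min(p_end, t_end) - max(p_start, t_start) + 1, 0)
    # Frames covered by either segment.
    union = (p_end - p_start + 1) + (t_end - t_start + 1) - overlap
    return overlap / union


def boundary_error(pred, truth):
    """Absolute frame offsets of the labeled start and end from the reference."""
    return abs(pred[0] - truth[0]), abs(pred[1] - truth[1])


# Hypothetical values: an annotator's segment and a reference segment.
labeled = (120, 180)
reference = (118, 182)

iou = temporal_iou(labeled, reference)
start_err, end_err = boundary_error(labeled, reference)
print(f"IoU={iou:.3f}, start off by {start_err} frames, end off by {end_err} frames")
```

An overlap score rewards segments that cover the same frames, while the boundary offsets show whether the start or the end trigger was the one that drifted; a review pass could plausibly use either, depending on how strict the temporal standard is.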