Multimodal Classification & Annotation (Audio, Text, Image, Video)
I performed end-to-end labeling and classification across multimodal datasets, covering audio transcription and tagging, text categorization, image annotation, and video event recognition. Tasks included entity extraction, frame-level tagging, segmentation, object tracking, sentiment/emotion labeling, and multilingual content classification. I delivered consistent, high-accuracy outputs by following detailed annotation guidelines, running QA validations, and holding accuracy targets across large-scale annotation batches. The project supported training pipelines for CV, ASR, and LLM models across diverse real-world scenarios.
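The per-batch QA validation mentioned above can be sketched as a simple gold-set check: compare a sample of an annotator's labels against known-correct labels and verify the batch meets the accuracy target. This is a minimal illustrative sketch, not the project's actual tooling; the names (`gold`, `annotations`, `TARGET_ACCURACY`) and the 95% threshold are assumptions.

```python
# Hypothetical per-batch QA check: score annotations against a small
# gold-labeled sample and gate the batch on an assumed accuracy target.

TARGET_ACCURACY = 0.95  # illustrative target, not from the source


def batch_accuracy(annotations: dict, gold: dict) -> float:
    """Fraction of gold-labeled items whose annotation matches the gold label.

    Items missing from `annotations` count as mismatches.
    """
    matched = sum(
        1 for item, label in gold.items() if annotations.get(item) == label
    )
    return matched / len(gold)


def passes_qa(annotations: dict, gold: dict, target: float = TARGET_ACCURACY) -> bool:
    """True when the batch meets or exceeds the accuracy target."""
    return batch_accuracy(annotations, gold) >= target
```

In practice a check like this would run per batch and per annotator, with failing batches routed back for re-review before the labels enter a training pipeline.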