Multimodal Data Annotation for Machine Learning Models
I worked on a long-term multimodal AI training project aimed at helping machine learning systems interpret text, audio, images, and video. My work spanned all four modalities:

- Text: labeling for classification, sentiment analysis, and named-entity recognition.
- Audio: annotating clips for transcription, speaker diarization, and acoustic event tagging.
- Images: drawing bounding boxes and segmentation masks for object detection.
- Video: tagging segments for action recognition and event tracking.
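To make the annotation tasks concrete, here is a minimal sketch of what one record per modality might look like. All field names, file names, and label values are hypothetical illustrations, not the schema of any specific tool or the actual project:

```python
# Hypothetical annotation records for the four labeling tasks described
# above. Field names and values are illustrative only.

text_annotation = {
    "modality": "text",
    "content": "Acme Corp shares rose 4% on Tuesday.",
    "labels": {
        "classification": "finance",
        "sentiment": "positive",
        # character-offset spans [start, end) for named entities
        "entities": [
            {"span": [0, 9], "text": "Acme Corp", "type": "ORG"},
            {"span": [28, 35], "text": "Tuesday", "type": "DATE"},
        ],
    },
}

audio_annotation = {
    "modality": "audio",
    "file": "clip_001.wav",  # hypothetical file name
    "labels": {
        "transcription": "good morning everyone",
        # who spoke when (speaker diarization), times in seconds
        "diarization": [{"speaker": "S1", "start": 0.0, "end": 2.1}],
        "acoustic_events": [{"event": "applause", "start": 2.1, "end": 4.0}],
    },
}

image_annotation = {
    "modality": "image",
    "file": "frame_042.jpg",  # hypothetical file name
    "labels": {
        # bounding box as [x_min, y_min, width, height] in pixels
        "boxes": [{"bbox": [34, 50, 120, 80], "class": "car"}],
        # segmentation as a polygon of [x, y] vertices
        "segmentation": [
            {"polygon": [[34, 50], [154, 50], [154, 130], [34, 130]],
             "class": "car"},
        ],
    },
}

video_annotation = {
    "modality": "video",
    "file": "match_07.mp4",  # hypothetical file name
    "labels": {
        # temporal action segments, times in seconds
        "actions": [{"action": "kick", "start_s": 12.4, "end_s": 13.1}],
        # object identity tracked across a frame range
        "tracks": [{"track_id": 1, "class": "player", "frames": [300, 360]}],
    },
}

dataset = [text_annotation, audio_annotation, image_annotation, video_annotation]
```

Keeping records from all modalities in one consistent structure like this (shared `modality` and `labels` keys) is one common way to simplify downstream loading and quality checks across a mixed dataset.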