Multi-Modal AI Data Annotation Specialist – Computer Vision & Speech Models
- Led multi-modal data annotation projects supporting machine learning models for computer vision and speech recognition.
- Annotated 250,000+ images and video frames with bounding boxes, polygon segmentation, object tracking, and keypoint labeling for object detection and action recognition systems.
- Performed 5,000+ hours of audio transcription and speaker diarization, along with emotion labeling and NLP entity tagging, for LLM and conversational AI training.
- Maintained 98%+ annotation accuracy through structured QA audits, guideline compliance, inter-annotator agreement checks, and bias-reduction practices.
- Collaborated closely with ML engineers to refine labeling taxonomies and improve dataset quality and downstream model performance.