Data collection
I have over two years of hands-on experience in both data collection and data labeling for AI training projects.

On the collection side, I have gathered raw images, text, and audio from various sources while ensuring proper consent, privacy compliance, and data diversity. For example, I led a field data collection effort for a retail inventory system, capturing 5,000+ product images under different lighting conditions and from different angles.

On the labeling side, I have annotated images with bounding boxes and semantic segmentation masks, tagged named entities in text, and transcribed audio, all with high consistency. Across multiple projects, I maintained an inter-annotator agreement rate above 90% by following strict guidelines and flagging ambiguous cases for review.

What sets me apart is my ability to connect collection and labeling into an efficient pipeline. I proactively clean and pre-check raw data to reduce labeling errors, and I have suggested improvements to collection protocols when certain label types proved difficult to annotate. I am proficient with tools like Labelbox and CVAT, as well as custom spreadsheets, and I have basic Python skills that I use to rename files, check label formats, and automate quality reports.

Because I have worked on both sides, I understand the full lifecycle of training data, from sourcing raw inputs to delivering high-quality labeled datasets that directly improve model performance.