Machine Learning Data Labeler and AI Dataset Specialist
Generated and labeled instruction-response pairs for supervised fine-tuning datasets tailored to coding models. Extracted C++ code samples from GitHub repositories, formatted and deduplicated the datasets, and performed quality-control checks. Documented data sources and followed reproducible labeling workflows.
• Worked extensively with code data and technical text
• Curated high-quality data for machine learning pipelines
• Used labeling and dataset preprocessing tools
• Maintained accuracy and adherence to detailed requirements