Data Labeling & Quality Assurance for Machine Learning Models
This project covered the preparation and validation of more than 5,000 text entries and 200+ Python and SQL code snippets for machine learning model training, carried out during my Computer Science coursework at the University of Nairobi and supplemented by independent freelance work. Labeling tasks included classifying text by sentiment and intent, fact-checking statements against source documents, annotating and localizing over 1,000 English-Swahili translation pairs for East African audiences, and reviewing AI-generated code for logic errors and edge cases. Quality assurance measures included strict rubric compliance, a two-pass self-review protocol, native-speaker cultural validation of Swahili content, consistent labeling conventions across all entries, and flagging ambiguous data points for escalation rather than making arbitrary judgment calls. These practices sustained a personal accuracy rate above 98% across all annotated datasets.
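The triage logic behind "flag ambiguous points for escalation rather than arbitrary judgment" can be sketched in a few lines of Python. This is a minimal illustration under assumed names (the `sentiment`/`ambiguous` fields and the label set are hypothetical, not the project's actual schema or tooling): each entry is checked against an allowed label set, and anything ambiguous or out of schema goes to a reviewer instead of being labeled by guesswork.

```python
# Hypothetical sketch of the escalation workflow; field names and the
# label set are illustrative assumptions, not the project's real schema.
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}

def triage(entries):
    """Split entries into accepted vs. escalated (ambiguous or invalid label)."""
    accepted, escalated = [], []
    for entry in entries:
        if entry.get("ambiguous") or entry.get("sentiment") not in ALLOWED_SENTIMENTS:
            escalated.append(entry)  # send to a reviewer rather than guessing
        else:
            accepted.append(entry)
    return accepted, escalated

batch = [
    {"text": "Great service!", "sentiment": "positive", "ambiguous": False},
    {"text": "Hmm, maybe.", "sentiment": "neutral", "ambiguous": True},
    {"text": "Unclear tone", "sentiment": "angry", "ambiguous": False},
]
ok, flagged = triage(batch)
```

Keeping the escalation path explicit in code, rather than relying on annotator discretion, is one way such a workflow enforces consistent labeling conventions across a batch.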