Low-resource Language AI Data Annotator (Yoruba corpus – Computational Linguistics Project)
Created and curated a structured corpus of oral Yoruba knowledge for use as AI training data in computational linguistics. Collected and annotated data for low-resource language model development, drawing on both oral and text-based source material. Applied emerging standards and best practices for multilingual AI data curation and SFT prompt tasks.
• Designed annotation processes, adapted from Toloka methodologies, to improve annotation precision.
• Processed and structured diverse language data for model training sets.
• Supported responsible and ethical AI data practices in underrepresented language contexts.
• Promoted linguistic diversity in corpus construction for AI/NLP.