Colinz Yegon - Expert in AI computer vision data labeling for self-driving cars

Key Skills

Software

CloudFactory

CrowdSource

CVAT

Data Annotation Tech

Img Lab

Labelbox

LabelImg

OpenCV AI Kit (OAK)

Sama

Internal/Proprietary Tooling

Appen

Top Subject Matter

No subject matter listed

Top Data Types

Audio

Image

Text

Top Task Types

Bounding Box

Classification

Computer Programming / Coding

Segmentation

Translation / Localization

Freelancer Overview

I am a skilled data annotation specialist with over two years of experience in speech transcription, linguistic data labeling, and AI training across English and Swahili datasets. My work has involved transcribing and cleaning audio data for use in training speech recognition models, with a strong emphasis on phonetic accuracy, grammatical correction, and adherence to strict transcription guidelines. I have collaborated on projects through platforms like TranscribeMe, Rev, and Sama, focusing on non-native English speech and multilingual corpora. What sets me apart is my bilingual proficiency (Swahili-English), advanced linguistic training, and hands-on experience with large-scale AI data projects. I possess a keen ear for diverse accents, strong attention to detail, and a deep understanding of how high-quality labeled data directly impacts model performance. Whether working independently or in a team setting, I consistently meet quality benchmarks and tight deadlines.

Expert: Swahili, French, German, English

Labeling Experience

English & Swahili Audio Transcription and Annotation – Appen & TranscribeMe

Appen, Audio, Classification, Question Answering

I worked on multiple projects transcribing and annotating English and Swahili speech datasets for training AI speech recognition and NLP systems. Tasks involved listening to short voice recordings, accurately transcribing content, removing disfluencies, correcting grammar, and classifying speaker accents and background noise levels. I followed detailed transcription and annotation guidelines to ensure consistency across large datasets. Quality control combined peer review with automated QA systems, and I maintained an accuracy rate above 98%. Some tasks included rating ASR model outputs for fluency and comprehension. I processed over 1,000 audio clips weekly, totaling more than 30,000 clips over the engagement.

2022 - 2022

Education

Kenyatta University

Bachelor of Arts in Linguistics, Linguistics and Language Studies

2015 - 2019

Work History

Sama AI

Data Annotation Assistant

Nairobi

2020 - Present