Collins Chumba - AI Data Labeling Specialist with Expertise in Text, Image, and QA Tasks

Key Skills

Software

Appen

Clickworker

Labelbox

Lionbridge

Remotasks

Scale AI

Telus

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming

Image

Text

Top Task Types

Computer Programming Coding

Data Collection

Evaluation Rating

Prompt Response Writing SFT

Transcription

Freelancer Overview

I am an AI training and data labeling specialist with over three years of hands-on experience working on large-scale LLM projects across remote platforms such as Remotasks, Appen, and Clickworker. My work has focused on evaluating and annotating text, code, and audio data to improve model accuracy, consistency, and safety. I have strong experience with RLHF workflows, prompt evaluation, and identifying issues like hallucinations, logical errors, and policy violations, while consistently maintaining quality scores above 98%. I bring a solid computer science background with practical skills in Python, data analysis, and automation, which helps me approach annotation tasks with both technical and analytical precision. I have also worked on linguistic annotation and localization for English and Swahili, ensuring clarity, cultural relevance, and semantic accuracy. I am comfortable managing high-volume tasks independently, following strict guidelines, and collaborating with distributed teams to meet tight deadlines while delivering reliable, high-quality training data.

Intermediate

Swahili

English

Labeling Experience

Image Annotation for Autonomous Vehicle Training

Telus

Image

Bounding Box

Segmentation

Worked on a large-scale image annotation project for autonomous vehicle datasets. Tasks included identifying and labeling objects such as pedestrians, vehicles, traffic signs, and road markings using bounding boxes and segmentation masks. Ensured high-quality, consistent annotations across thousands of images by following strict labeling guidelines and quality checks, keeping the dataset accurate for machine learning model training. Collaborated with a team to improve labeling efficiency while holding to a 99% accuracy standard.

2024 - 2025

LLM Evaluation and RLHF Text Annotation Project

Appen

Text

Text Generation

Text Summarization

Contributed to multiple large-scale LLM training projects focused exclusively on text data. Tasks included evaluating and rating model responses using detailed quality rubrics, ranking outputs for RLHF workflows, and writing high-quality prompts and responses for supervised fine-tuning. Performed text classification, question answering validation, summarization review, and red teaming to identify factual errors, hallucinations, and safety issues. Consistently maintained accuracy and quality scores above 98% while following strict annotation guidelines and confidentiality standards.

2022 - 2023

Education

Dedan Kimathi University of Technology

Bachelor of Science, Computer Science

2017 - 2021

Work History

Safaricom

Data Support & Automation Specialist

Nairobi

2020 - 2021