Anthony Ndegwa Mathenge - AI Data Trainer & Annotation Specialist at Remotasks / Scale AI (Remote Contract)

Key Skills

Software

Scale AI

Appen

Data Annotation Tech

LabelImg

Labelbox

Label Studio

Top Subject Matter

LLM Domain Expertise

Natural Language Processing

AI Model Training

Top Data Types

Text

Image

Document

Top Task Types

RLHF

Classification

Prompt-Response Writing (SFT)

Freelancer Overview

AI Data Trainer and Annotation Specialist with nearly 5 years of hands-on experience producing high-quality labeled datasets for large language model training and fine-tuning. I have completed 50,000+ annotation tasks across text, image, and multimodal datasets on platforms including Scale AI and Appen. Across this work I have maintained an accuracy rate of 97%+ and earned Expert Tier contributor status. My experience spans RLHF workflows, model output evaluation, prompt engineering, named entity recognition, sentiment analysis, and search relevance judgment, giving me a well-rounded foundation across the full spectrum of AI training data tasks.

What sets me apart is my Computer Science background (B.S., University of Alabama, 2020) combined with practical annotation expertise and a strong commitment to AI safety and responsible development. I have independently curated instruction-response datasets, conducted adversarial prompt evaluations to surface model biases, and contributed to guideline development that improved annotation consistency across the team. I bring both the technical literacy to understand what models need and the careful human judgment required to deliver training data that is accurate, helpful, and safe.

English: Expert

Labeling Experience

AI Data Trainer & Annotation Specialist at Remotasks / Scale AI (Remote Contract)

Scale AI, Text, RLHF

As an AI Data Trainer & Annotation Specialist, I labeled and annotated over 50,000 data points across text, image, and multimodal datasets for large language model training. I participated in RLHF workflows by rating and revising AI-generated responses for accuracy, helpfulness, safety, and coherence. My responsibilities also included writing prompts for model evaluation and maintaining high quality standards.

• Consistently maintained an accuracy rate above 97% and achieved Expert Tier status on the Scale AI platform.
• Documented edge cases to help improve the team's annotation guidelines.
• Reviewed and audited peer annotations to maintain quality across a distributed team.
• Worked across multiple annotation platforms, including Scale AI and Remotasks.

2022 - Present

Data Annotation Associate at Appen (Remote Contract)

Appen, Image, Classification

As a Data Annotation Associate, I completed image classification, text annotation, and named entity recognition (NER) tasks for AI datasets across multiple projects. I evaluated the relevance of search engine results and ranked web content to improve AI model quality. I also applied my English-language expertise to annotating multilingual datasets for cross-lingual training.

• Completed image classification and NER tasks across more than 10 active projects.
• Ranked search results to support the training of search AI models.
• Achieved top-rated contractor status within the first 60 days.
• Used the Appen platform for a diverse range of annotation assignments.

2021

Search Relevance Annotation (Personal Project)

Text

I independently completed over 2,000 search relevance judgments using a structured four-point scale. I also developed a personal annotation guide to keep my labeling consistent and reliable. The project supported improvements to AI-powered search engine models.

• Applied a four-point scale to annotate search results.
• Created a detailed annotation guide for personal use.
• Contributed to improved evaluation of search engine models.
• Maintained consistent, replicable labeling standards.

Not specified

AI Bias & Safety Evaluation (Independent Project)

Text

I evaluated model outputs across more than 500 adversarial prompts to identify and document bias, hallucinations, and unsafe completions. I compiled the findings into a structured report with actionable mitigation recommendations. The evaluation contributed to improved model safety practices.

• Systematically tested model robustness using adversarial inputs.
• Recorded and described a diverse range of safety failure cases.
• Developed actionable mitigation suggestions for engineering teams.
• Supported bias reduction and model improvement initiatives.

Not specified

Instruction-Response Dataset Curation (Independent Project)

Text, Prompt-Response Writing (SFT)

I curated a 10,000-record prompt-response dataset for LLM fine-tuning using structured rubrics. The work involved assessing factuality, helpfulness, safety, and instruction adherence, mirroring RLHF evaluation processes. It required depth in both prompt creation and systematic evaluation.

• Structured the dataset for industry-aligned RLHF workflows.
• Assessed model responses against detailed rubrics.
• Focused on LLM fine-tuning for improved safety and utility.
• Applied knowledge of evaluation and prompt engineering.

Not specified

Education

University of Alabama

Bachelor of Science, Computer Science

2016 - 2020

Work History

Tech Solutions Group

Junior Data Analyst

Montgomery

2020 - 2021