Dancan Wanyonyi

LLM Evaluator

Nairobi, Kenya
$30.00/hr | Expert | Scale AI | Remotasks | Crowdsource

Key Skills

Software

Scale AI
Remotasks
CrowdSource
Telus
Appen
CloudFactory
Clickworker
Data Annotation Tech

Top Subject Matter

Conversational AI
LLM Domain Expertise
Prompt Evaluation

Top Data Types

Text
Image
Document

Top Task Types

RLHF
Bounding Box
Object Detection
Segmentation
Text Generation
Text Summarization
Data Collection
Transcription
Prompt Response Writing SFT
Evaluation Rating

Freelancer Overview

LLM Evaluator with 7+ years of professional experience across legal operations, contract review, compliance, and structured analysis. Core platform experience includes Scale AI and Remotasks. Education: Bachelor of Arts, University of Nairobi (2020). AI-training work covers data types such as Text and Image and labeling workflows including RLHF, Bounding Box, and Object Detection.

Expert | English | Swahili | Italian

Labeling Experience

Scale AI

LLM Evaluator

Scale AI | Text | RLHF
As an LLM Evaluator, I contributed to the training and evaluation of large language models. My work involved labeling and categorizing text data, ranking model outputs, and performing quality assurance for RLHF pipelines. I developed prompt datasets to enhance the performance and safety of conversational AI systems.
• Conducted LLM prompt evaluation for reasoning, bias, and hallucinations.
• Labeled, classified, and ranked large volumes of text outputs.
• Performed multi-stage QA checks to maintain accuracy above 98%.
• Created and curated datasets to support LLM fine-tuning and evaluation.

2020 - Present
Remotasks

Data Annotation Specialist – Autonomous Systems

Remotasks | Image | Bounding Box
As a Data Annotation Specialist in autonomous systems, I annotated diverse driving datasets to train perception models. My tasks included bounding box annotation, segmentation, tracking across frames, and LiDAR point cloud labeling. I focused on improving dataset quality and provided feedback to optimize annotation workflows.
• Annotated vehicles, pedestrians, and traffic signs using bounding boxes.
• Executed semantic segmentation and object tracking for road scenes.
• Labeled 3D LiDAR point clouds for spatial understanding.
• Helped reduce mislabeling errors and increased annotation throughput.

2017 - 2019
Scale AI

LLM Prompt Evaluation & Safety Testing

Scale AI | Text
For the LLM Prompt Evaluation & Safety Testing project, I performed safety scoring and qualitative evaluations on model-generated outputs. The responsibilities included judging for toxicity, bias, factuality, and reasoning quality to guide LLM improvements. I also developed prompts to test models in real-world scenarios and assess conversational coherence.
• Scored model outputs for various safety and reasoning metrics.
• Developed and utilized 300+ prompts for evaluation.
• Contributed to safer, more accurate LLM deployment.
• Assessed conversational capabilities in diverse contexts.

Not specified
Remotasks

Autonomous Driving Dataset Labeling Project

Remotasks | Image | Object Detection
During an autonomous driving dataset labeling project, I contributed to annotating over 10,000 frames for perception models. The project involved identifying and labeling objects, lanes, and road features to support model training. My work helped improve object detection accuracy for autonomous systems.
• Labeled large-scale autonomous driving data for machine learning.
• Focused on improving object, lane, and road feature detection.
• Supported accurate perception model development for safety applications.
• Contributed to a dataset used for object detection performance gains.

Not specified

Education

University of Nairobi

Bachelor of Arts, Geography and Environmental Studies
2020 - 2020

Work History

Remote | March

CrowdGen

Location not specified
2020 - Present

Clickworker

Data Annotation

New York
2020 - 2025