Gathungu Darias24

AI Training Contributor - Large Language Models

Nairobi, Kenya
$20.00/hr · Expert

Key Skills

Software

Mindrift
Snorkel AI
AWS SageMaker
Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

3D Sensor
Audio
Computer Code Programming
Document
Geospatial Tiled Imagery
Image
Text

Top Label Types

Action Recognition
Audio Recording
Classification
Computer Programming Coding
Data Collection
Evaluation Rating
Fine Tuning
Land Cover Classification
Red Teaming
RLHF
Routing

Freelancer Overview

I specialize in AI training data evaluation, annotation, and prompt engineering, with hands-on experience in Reinforcement Learning from Human Feedback (RLHF) and large language model (LLM) output analysis. My expertise includes ranking and grading AI-generated responses for factual accuracy, reasoning quality, and policy compliance, as well as identifying hallucinations and edge-case failures to ensure model reliability. I am skilled in Python automation, structured data validation (JSON Schema), and have developed tools for batch-testing, scoring, and consistency analysis of model outputs. My background in full-stack development and REST API design supports my ability to streamline annotation workflows and maintain high data integrity. I am passionate about improving AI alignment and safety through rigorous evaluation, clear documentation, and continuous process optimization.

English (Expert)

Labeling Experience

Scale AI

Structured Output Validation Initiative

Scale AI · Document · RLHF · Fine Tuning
I validated JSON-formatted outputs from LLMs against explicit schema and formatting requirements. By identifying inconsistencies and logging errors, I supported the training feedback loop for improved structured output reliability. My validation efforts reduced logical and formatting errors in the text generation process.
• Cross-checked schema compliance for structured outputs.
• Logged errors for training and QA feedback purposes.
• Flagged formatting inconsistencies and logical errors in outputs.
• Supported the data team with structured logs to inform ongoing refinement.

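The kind of schema-and-type check described above can be sketched with the standard library alone. This is an illustrative sketch, not the actual tooling used on the project: the `SCHEMA` mapping, the field names (`answer`, `confidence`, `sources`), and the sample outputs are all hypothetical.

```python
import json

# Hypothetical schema: required keys and the Python type each value must have.
SCHEMA = {"answer": str, "confidence": float, "sources": list}

def validate_output(raw: str) -> list[str]:
    """Return error messages for one LLM output; an empty list means valid."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    errors = []
    for key, expected in SCHEMA.items():
        if key not in obj:
            errors.append(f"missing key: {key}")
        elif not isinstance(obj[key], expected):
            errors.append(f"wrong type for {key}: {type(obj[key]).__name__}")
    return errors

# Batch check producing a structured error log for the training feedback loop.
outputs = [
    '{"answer": "42", "confidence": 0.9, "sources": []}',
    '{"answer": "42", "confidence": "high"}',
    "not json at all",
]
log = [{"index": i, "errors": validate_output(o)} for i, o in enumerate(outputs)]
```

Each log entry pairs an output index with its error list, so downstream QA can filter to failing outputs only.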

2024
AWS SageMaker

AI Safety Review and Moderation Task

AWS SageMaker · Audio · RLHF · Fine Tuning
For this specialized review, I analyzed LLM-generated responses to identify bias, harmful content, and policy violations. I escalated ambiguous or edge cases via structured reporting and proposed safer alternative phrasings. My work contributed directly to refining safety alignment and reducing risk in AI-generated instructions.
• Conducted comprehensive bias and safety review of LLM outputs.
• Flagged responses breaching safety and policy standards.
• Filed structured reports for borderline and ambiguous cases.
• Proposed safer alternatives to align outputs with content policies.


2024
Snorkel AI

LLM Comparative Ranking Project

Snorkel AI · Text · Fine Tuning · Evaluation Rating
I ranked multiple LLM candidate responses using strict quality criteria and provided justification for all scoring decisions. Consistency across evaluation batches was a key focus, ensuring robust feedback to the training data pipeline. I produced concise rationales for each comparative evaluation to calibrate language model ranking.
• Used a predefined ranking rubric for all evaluations.
• Documented reasoning for score assignments.
• Contributed to batch-to-batch consistency in annotation quality.
• Ensured clear and logical comparative analysis for training purposes.

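One simple way to quantify the batch-to-batch consistency mentioned above is an agreement rate over shared prompts. This is a minimal illustrative sketch, not the project's actual metric; the prompt IDs and "A"/"B" preference labels are assumed for the example.

```python
def agreement_rate(batch_a: dict, batch_b: dict) -> float:
    """Fraction of prompts ranked in both batches that received the
    same preferred candidate. Returns 0.0 when no prompts overlap."""
    shared = set(batch_a) & set(batch_b)
    if not shared:
        return 0.0
    matches = sum(batch_a[p] == batch_b[p] for p in shared)
    return matches / len(shared)

# Two annotation batches mapping prompt ID -> preferred candidate.
batch_1 = {"p1": "A", "p2": "B", "p3": "A"}
batch_2 = {"p1": "A", "p2": "A", "p3": "A"}
rate = agreement_rate(batch_1, batch_2)  # 2 of the 3 shared prompts agree
```

A low rate on a re-annotated batch flags rubric drift before it contaminates the training data pipeline.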

2024
Mindrift

AI Data Labeler / Training Contributor (Contract)

Mindrift · Image · Routing · Land Cover Classification
In this role, I evaluated and annotated AI-generated text responses for accuracy, coherence, and adherence to instructions. I used structured rubrics to rank Large Language Model (LLM) outputs and identify hallucinations or logical inconsistencies. I ensured outputs were aligned with AI safety standards and improved low-quality responses for clarity and compliance.
• Applied structured grading rubrics to assess correctness, reasoning depth, formatting, and safety alignment.
• Flagged and documented edge-case failures to improve dataset quality.
• Maintained high QA approval across high-volume annotation workflows.
• Contributed to the refinement of instructional clarity to reduce ambiguity.


2024

Education


University of Texas at Dallas

Bachelor of Science, Information Technology

2022 - 2022

Work History


Independent Contractor

Full-Stack Developer

Austin
2023 - Present