
Bevone Machini

Senior Python Developer and Test Automation Lead

Nairobi, Kenya
$50.00/hr · Intermediate

Key Skills

Software

Appen
Clickworker
Data Annotation Tech
HiveMind
Micro1
Mindrift
OneForma
Remotasks
Scale AI
Toloka
Telus

Top Subject Matter

AI Training
Data Annotation
AI Generalist

Top Data Types

Image
Text
Video

Top Task Types

Data Collection
Transcription
Red Teaming
Text Generation
Classification
Segmentation
Text Summarization
Prompt + Response Writing (SFT)

Freelancer Overview

Senior Python Developer and Test Automation Lead with 6+ years of professional experience spanning complex workflows, research, and quality-focused execution. Education: Bachelor of Science in Computer Science, Kirinyaga University (2024).

English, Swahili (Intermediate)

Labeling Experience

Handshake

Video, Transcription
Scope

Project Aether is a large-scale human data initiative focused on improving reasoning and instruction-following capabilities in frontier large language models. The project sits at the intersection of expert annotation and scalable data infrastructure, targeting advanced post-training pipelines (RLHF, GRPO, and iterative preference optimization). Aether addresses the growing need for high-signal, domain-diverse training data that goes beyond generic web corpora, specifically targeting complex reasoning traces, multi-step problem solving, and nuanced human preference modeling across technical and professional domains.

Data Labeling Tasks Performed

• Reasoning Trace Annotation: Experts decompose complex problems into step-by-step logical chains, flagging errors, unstated assumptions, and alternative solution paths.
• Preference Pair Ranking: Human raters compare model-generated responses across dimensions including factual accuracy, coherence, helpfulness, and safety, producing ranked preference pairs for reward model training.
• Instruction Refinement: Rewriting ambiguous or underspecified prompts to improve clarity and reduce variance in model outputs.
• Domain-Specific Validation: Subject-matter experts (software engineers, scientists, legal professionals) verify technical correctness in specialized response sets.
• Synthetic Data Verification: Auditing AI-generated training examples for hallucinations, logical inconsistencies, and distributional drift before inclusion in fine-tuning datasets.

Project Size

• Annotator Pool: 200+ vetted experts across 15+ domains (computer science, mathematics, medicine, law, finance)
• Data Volume: ~500K labeled examples generated per quarter; ~2M preference pairs ranked to date
• Geographic Coverage: Multilingual annotation spanning English, Spanish, Mandarin, and German to support global model deployment
• Infrastructure: Custom labeling interface built on Handshake AI's internal platform with integrated quality dashboards and real-time inter-annotator agreement tracking

Quality Measures

• Inter-Annotator Agreement (IAA): Target Cohen's Kappa > 0.75 on all ranking tasks; weekly calibration sessions for edge-case alignment
• Expert Credentialing: Domain-specific qualification exams with 85% pass threshold; continuous performance monitoring via gold-standard hidden tests
• Audit Trail: Full provenance logging for every labeled example: annotator ID, time spent, revision history, and adjudication records
• Bias & Fairness Audits: Quarterly demographic parity checks across gender, ethnicity, and geographic origin in preference distributions
• Data Freshness Protocols: TTL (time-to-live) policies ensuring training data reflects current knowledge cutoffs; automatic deprecation of outdated factual claims


2026 - 2026

Handshake

Image, Text Generation

2025 - 2026

Education


Kirinyaga University

Bachelor of Science, Computer Science

2024 - 2024

Work History

Lightstorm Networks

Senior Python Developer and Test Automation Lead

Nairobi
2021 - Present

DataPulse Innovations

Full-Stack Software Engineer

Nairobi
2024 - 2025