Handshake AI
Scope

Project Aether is a large-scale human data initiative focused on improving reasoning and instruction-following capabilities in frontier large language models. The project sits at the intersection of expert annotation and scalable data infrastructure, supporting advanced post-training pipelines (RLHF, GRPO, and iterative preference optimization). Aether addresses the growing need for high-signal, domain-diverse training data that goes beyond generic web corpora, focusing on complex reasoning traces, multi-step problem solving, and nuanced human preference modeling across technical and professional domains.

Data Labeling Tasks Performed

• Reasoning Trace Annotation: Experts decompose complex problems into step-by-step logical chains, flagging errors, unstated assumptions, and alternative solution paths.
• Preference Pair Ranking: Human raters compare model-generated responses on factual accuracy, coherence, helpfulness, and safety, producing ranked preference pairs for reward model training.
• Instruction Refinement: Annotators rewrite ambiguous or underspecified prompts to improve clarity and reduce variance in model outputs.
• Domain-Specific Validation: Subject-matter experts (software engineers, scientists, legal professionals) verify technical correctness in specialized response sets.
• Synthetic Data Verification: Auditors check AI-generated training examples for hallucinations, logical inconsistencies, and distributional drift before inclusion in fine-tuning datasets.
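Ranked preference pairs of this kind typically feed a pairwise reward-model objective. The following is a minimal sketch assuming a standard Bradley-Terry style loss, as commonly used in RLHF; the `PreferencePair` schema and function names are illustrative, not Aether's actual data format:

```python
import math
from dataclasses import dataclass

@dataclass
class PreferencePair:
    """One ranked comparison produced by a human rater (illustrative schema)."""
    prompt: str
    chosen: str    # response the rater preferred
    rejected: str  # response the rater ranked lower

def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    The loss shrinks as the reward model scores the preferred
    response higher than the rejected one.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A reward model that already separates the pair incurs a small loss;
# an undecided model (equal scores) incurs log(2).
confident = pairwise_loss(2.0, 1.0)   # ~0.31
undecided = pairwise_loss(1.0, 1.0)   # ~0.69
```

In training, the two reward scores would come from a learned scoring model applied to `prompt + chosen` and `prompt + rejected`; the loss is then averaged over a batch of pairs.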
Project Size

• Annotator Pool: 200+ vetted experts across 15+ domains (computer science, mathematics, medicine, law, finance)
• Data Volume: ~500K labeled examples generated per quarter; ~2M preference pairs ranked to date
• Geographic Coverage: Multilingual annotation spanning English, Spanish, Mandarin, and German to support global model deployment
• Infrastructure: Custom labeling interface built on Handshake AI's internal platform, with integrated quality dashboards and real-time inter-annotator agreement tracking

Quality Measures

• Inter-Annotator Agreement (IAA): Target Cohen's kappa > 0.75 on all ranking tasks; weekly calibration sessions for edge-case alignment
• Expert Credentialing: Domain-specific qualification exams with an 85% pass threshold; continuous performance monitoring via hidden gold-standard tests
• Audit Trail: Full provenance logging for every labeled example: annotator ID, time spent, revision history, and adjudication records
• Bias & Fairness Audits: Quarterly demographic parity checks across gender, ethnicity, and geographic origin in preference distributions
• Data Freshness Protocols: Time-to-live (TTL) policies ensure training data reflects current knowledge cutoffs; outdated factual claims are automatically deprecated
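The IAA target can be checked directly from two raters' labels using the textbook definition of Cohen's kappa. A minimal sketch; the label values and the calibration-flagging wiring around the 0.75 threshold are illustrative, not Aether's actual pipeline:

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a, "paired, non-empty labels"
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if raters labeled independently at their own rates.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((count_a[lbl] / n) * (count_b[lbl] / n)
              for lbl in set(count_a) | set(count_b))
    return (p_o - p_e) / (1.0 - p_e)

# Example: two raters ranking the same six preference comparisons.
rater_a = ["win", "win", "loss", "win", "loss", "win"]
rater_b = ["win", "loss", "loss", "win", "loss", "win"]
kappa = cohens_kappa(rater_a, rater_b)  # 5/6 observed agreement -> kappa = 2/3
if kappa < 0.75:
    print(f"Below IAA target, flag batch for calibration: kappa={kappa:.2f}")
```

In practice a quality dashboard would aggregate kappa per task batch and per annotator pair; the single-pair computation above is the building block.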