Genson Kithome

Data Analyst & AI Data Specialist - SQL, Python, BI & Structured Data

Meru, Kenya
$10.00/hr, Intermediate, Mercor, Scale AI, SuperAnnotate

Key Skills

Software

Mercor
Scale AI
SuperAnnotate
Other

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Computer Code Programming
Image
Text

Top Label Types

Audio Recording
Bounding Box
Classification
Computer Programming Coding
Function Calling
Question Answering
Segmentation
Text Generation

Freelancer Overview

I am a data analyst and AI training data specialist with experience working on large-scale annotation and data validation projects across computer vision, speech datasets, and structured data environments. My work has included image and object annotation for computer vision models, scripted speech data collection for voice recognition systems, and scientific data formatting using LaTeX for chemistry datasets. I have contributed to projects on platforms such as Remotasks, Atlas Capture, and Populii, consistently adhering to strict annotation guidelines and quality control standards. Alongside annotation work, I bring strong analytical skills using Excel, Google Sheets, SQL, Python, and R to understand structured datasets and ensure logical consistency in training data. I also work with AI systems through prompt engineering and multimodal dataset tasks involving text, image, and audio data. My background in data analytics allows me to approach labeling tasks with a deeper focus on accuracy, structure, and data quality, particularly in domains such as technology, agritech, and financial datasets.

Intermediate, English, Swahili

Labeling Experience

Speech Data Collection and Scripted Audio Recording for AI Training (Atlas Capture)

Other, Audio, Question Answering, Audio Recording
This project involved recording scripted speech samples using the Atlas Capture platform to support the development of speech recognition and voice-based AI systems. Participants recorded clearly articulated audio while reading predefined scripts designed to capture variations in pronunciation, pacing, and natural speech patterns. Quality standards required recordings to meet strict guidelines for clarity, background noise control, and accurate script delivery. Each submission was reviewed for audio quality and adherence to recording instructions to ensure the dataset provided reliable training data for automatic speech recognition and voice interaction models.

2025

Chemistry Data Annotation and LaTeX Scientific Equation Formatting for AI Training (Populii – Google Project)

Other, Computer Code Programming, Question Answering, Text Generation
This project involved annotating and formatting chemistry data and equations using LaTeX to ensure accurate representation of scientific notation. Tasks included converting raw chemistry questions, reaction equations, and formula expressions into properly structured LaTeX code, ensuring correct use of subscripts, superscripts, reaction arrows, ionic charges, and balanced chemical equations. Quality standards required strict adherence to annotation guidelines, verification of chemical accuracy, and validation of LaTeX syntax to ensure correct rendering. Each entry was reviewed for consistency and formatting integrity, contributing to a clean, structured dataset designed to support machine learning systems in understanding and generating scientific equations.

2024 - 2025

Computer Vision Data Annotation and Quality Assurance for AI Training Datasets (Remotasks)

Scale AI, Image, Bounding Box, Segmentation
This project involved annotating large-scale image datasets used to train computer vision models for autonomous systems and machine learning applications. Tasks included drawing bounding boxes around objects such as vehicles, pedestrians, road signs, and other environmental features, as well as performing image segmentation and classification to help improve object detection accuracy. Strict quality control processes were followed, including guideline adherence, precision labeling, and multi-stage review to ensure dataset consistency and accuracy. The work contributed to improving AI models’ ability to interpret real-world visual environments by providing high-quality labeled training data.

2020 - 2022

Education

The Open University of Kenya

Bachelor of Science, Data Science

2024 - 2025

Egerton University

Diploma, Computer Science

2019 - 2022

Work History

Upwork

Freelance Data Analyst

Nairobi
2024 - Present

Freelance

Data & Reporting Specialist

Nairobi
2023