Gedion Raini

AI Data Quality Analyst and Annotator

Nairobi, Kenya
$15.00/hr, Expert, Label Studio, CVAT

Key Skills

Software

Label Studio
CVAT

Top Subject Matter

Multilingual NLP
RLHF Domain Expertise
Sentiment Analysis

Top Data Types

Text
Image

Top Task Types

Entity (NER) Classification
Bounding Box
Segmentation
Question Answering
RLHF

Freelancer Overview

AI Data Quality Analyst at DataLab Africa Ltd. Core strengths include Label Studio, CVAT, and Sama. Education includes a Master of Science (University of Nairobi, 2024) and a Bachelor of Science (University of Nairobi, 2022). AI-training focus covers Text and Image data, with labeling workflows including Entity (NER) Classification, Bounding Box, and Segmentation.

English (Expert)

Labeling Experience

Label Studio

AI Data Quality Analyst – DataLab Africa Ltd.

Label Studio, Text, Entity (NER) Classification

As an AI Data Quality Analyst, I designed annotation schemas for multilingual NLP datasets including sentiment, named entity recognition, and dialogue act classification. I established quality-control pipelines incorporating inter-annotator agreement metrics and automated preprocessing to enhance workflow efficiency. My efforts included creating and maintaining comprehensive style guides, running calibration sessions, and collaborating on RLHF preference annotation guidelines.
• Built ground-truth datasets for Swahili, English, and Kikuyu fine-tuning.
• Automated labeling workflows with spaCy and Python tools.
• Annotated 15,000 RLHF response pairs using preference rubrics.
• Led quality assurance and ontology consistency across 200,000 samples.

2024 - Present

Creator – RLHF Preference Simulator (Personal research project)

Other, Text, RLHF

In a personal RLHF research project, I implemented a reward-model training pipeline using pairwise preference data and experimented with ranking models. I quantified annotation consistency requirements and produced a technical write-up measuring the effect on model quality. This work advanced understanding of annotator agreement thresholds in RLHF modeling.
• Built RLHF reward-model pipeline in Python and PyTorch.
• Utilized pairwise annotation data in preference simulation.
• Applied Bradley-Terry and Plackett-Luce ranking evaluation.
• Analyzed and reported on annotation agreement sensitivity.

2024

Creator – SwahiliQA Benchmark Dataset (Open-source project)

Other, Text, Question Answering

For the SwahiliQA open-source project, I created a dataset of 10,000 QA pairs and developed annotation guidelines to ensure high inter-annotator reliability. I recruited and trained eight crowd annotators, applying Krippendorff’s alpha for quality control. The dataset was released for public benchmarking and adopted by academic labs.
• Defined comprehensive annotation guidelines for crowd workers.
• Trained and managed annotation teams for dataset creation.
• Set a Krippendorff’s alpha threshold of ≥ 0.75 for label quality.
• Released QA dataset on Hugging Face Hub for multilingual NLP research.

2023 - 2024

CVAT

Machine Learning Research Assistant – University of Nairobi – AI Lab

CVAT, Image, Bounding Box

As a Machine Learning Research Assistant, I curated a 50,000-image dataset for agricultural pest detection and carried out bounding box and polygon segmentation in CVAT. I implemented scripts for benchmarking annotation quality and participated in fine-tuning BERT models for hate-speech detection tasks. My responsibilities also included preparing Hugging Face datasets and managing experimental annotation pipelines for a published research paper.
• Annotated and exported images in COCO format for YOLOv8.
• Developed inter-annotator agreement scripts using Cohen’s kappa and Krippendorff’s alpha.
• Facilitated data curation for low-resource language NLP.
• Provided validation for experiment results in an academic publication.

2022 - 2023

Sama

Data Annotation Intern – Sama

Sama, Text, Segmentation

As a Data Annotation Intern at Sama, I executed large-scale text and image annotation projects, including semantic segmentation, relevance rating, and query-document matching. I contributed to the development of internal batch processing utilities that increased throughput and accuracy. I participated in LLM red-teaming exercises, identifying and documenting model edge cases for safety evaluation.
• Maintained ≥97% accuracy in daily annotation targets.
• Enhanced annotation pipeline efficiency by 25% with Python utilities.
• Annotated samples for LLM safety red-teaming projects.
• Carried out both segmentation and classification annotation at scale.

2021 - 2022

Education

University of Nairobi

Master of Science, Artificial Intelligence

2022 - 2024

University of Nairobi

Bachelor of Science, Computer Science and Mathematics

2018 - 2022

Work History

Privatization Authority

IT Officer 1

Nairobi
2024 - Present