James Mwaura

Principal Data Scientist / Cloud Engineer

Nairobi, Kenya
$30.00/hr · Expert · iMerit · Toloka · Other

Key Skills

Software

iMerit
Toloka
Other

Top Subject Matter

Large Language Models (LLMs)
Cloud AI Training
NLP Domain Expertise

Top Data Types

Text
Document

Top Task Types

RLHF
Entity (NER) Classification

Freelancer Overview

Principal Data Scientist and Cloud Engineer with 14+ years of professional experience across legal operations, contract review, compliance, and structured analysis. Core strengths include internal/proprietary tooling and iMerit. Education includes a Doctor of Philosophy from the University of Nairobi (2023) and a Master of Science from Strathmore University (2019). AI-training focus includes data types such as text, and labeling workflows such as RLHF and entity (NER) classification.

English (Expert)

Labeling Experience

Principal Data Scientist / Cloud Engineer

Text · RLHF
As Principal Data Scientist / Cloud Engineer at Oracle Cloud Infrastructure, I optimized data labeling pipelines and improved model reliability. My focus was on refining LLMs with RLHF and supporting large-scale infrastructure for AI training. I used proprietary Oracle cloud tools to run labeling and evaluation workflows.
• Led the development of enterprise-scale AI training pipelines.
• Focused on large language model (LLM) refinement and evaluation.
• Applied RLHF methods for model tuning and safety alignment.
• Managed and improved text data annotation using cloud-native solutions.

2024 - Present

Senior Data Scientist

Text · Entity (NER) Classification
As Senior Data Scientist at Safaricom PLC, I directed teams working on NLP data annotation and Swahili-English corpus refinement. My responsibilities included leading quality assurance for annotated datasets and applying advanced models for localization. Most of the labeling involved text annotation and classification for mobile financial services and fraud mitigation use cases.
• Supervised Swahili-English data labeling projects.
• Implemented NLP-focused annotation workflows for financial products.
• Improved label quality through detailed dataset validation.
• Applied labeled data to predictive models for business optimization.

2019 - 2023
iMerit

Lead AI Trainer & Data Analyst (Contract)

iMerit · Text · RLHF
During my contract as Lead AI Trainer & Data Analyst with iMerit, Crowdgen, and Toloka, I carried out complex data labeling for autonomous systems and LLM evaluation. My main duties were RLHF for model fine-tuning and STEM-focused data annotation, with an emphasis on very high accuracy for educational and technical content.
• Performed RLHF and evaluation for language model fine-tuning.
• Annotated and curated STEM datasets for AI training.
• Ensured accuracy and consistency across complex data labeling tasks.
• Used multiple crowdsourcing and labeling platforms for large-scale projects.

2016 - 2019

Education

University of Nairobi

Doctor of Philosophy, Computer Science
2020 - 2023

Strathmore University

Master of Science, Computer Science
2017 - 2019

Work History

Oracle Cloud Infrastructure

Principal Data Scientist and Cloud Engineer

Nairobi
2024 - Present

Safaricom

Senior Data Scientist

Nairobi
2019 - 2023