For employers

Hire this AI Trainer

Sign in or create an account to invite AI Trainers to your job.

Invite to Job
D

Daniel Mburure

Software/Data Associate

KENYA flag
Nairobi, Kenya
$15.00/hrEntry LevelScale AILabelboxData Annotation Tech

Key Skills

Software

Scale AIScale AI
LabelboxLabelbox
Data Annotation TechData Annotation Tech
SuperAnnotateSuperAnnotate
Other

Top Subject Matter

AI Model Evaluation
Cybersecurity Domain Expertise
LLM Safety

Top Data Types

ImageImage
TextText
DocumentDocument
AudioAudio
Computer Code ProgrammingComputer Code Programming

Top Task Types

Data Collection
Prompt Response Writing SFT

Freelancer Overview

Software/Data Associate. Brings 4+ years of professional experience across complex professional workflows, research, and quality-focused execution.

Entry LevelSwahiliEnglishChinese Mandarin

Labeling Experience

AI Model Evaluator and Data Annotator

Text
Evaluated and red teamed AI models and agents for safety and vulnerabilities, focusing on large language model (LLM) output assessment and prompt evaluation. Built and executed offline, reproducible, and auto-evaluable test cases for robust model evaluation processes. Applied data annotation techniques to classify and rate LLM responses, contributing to the continuous improvement of AI text models. • Labeled and evaluated text-based outputs from AI models. • Applied cybersecurity knowledge to identify prompt injection and LLM risks. • Utilized scripting tools and Jupyter Notebooks for annotation workflow automation. • Documented evaluation findings with precise, reproducible steps.

Evaluated and red teamed AI models and agents for safety and vulnerabilities, focusing on large language model (LLM) output assessment and prompt evaluation. Built and executed offline, reproducible, and auto-evaluable test cases for robust model evaluation processes. Applied data annotation techniques to classify and rate LLM responses, contributing to the continuous improvement of AI text models. • Labeled and evaluated text-based outputs from AI models. • Applied cybersecurity knowledge to identify prompt injection and LLM risks. • Utilized scripting tools and Jupyter Notebooks for annotation workflow automation. • Documented evaluation findings with precise, reproducible steps.

2023 - Present

AI Data Contributor (Freelance / Remote)

ImageData Collection
As an AI data contributor, I captured and organized real-world image datasets for AI training purposes. I performed data labeling and annotation following strict project instructions to ensure dataset compliance. I delivered completed labeled data through cloud platforms such as Google Drive. • Structured image series creation to represent everyday scenarios • Compliance with detailed project guidelines • Organization and management of large image datasets • Use of cloud-based platforms for secure task submission

As an AI data contributor, I captured and organized real-world image datasets for AI training purposes. I performed data labeling and annotation following strict project instructions to ensure dataset compliance. I delivered completed labeled data through cloud platforms such as Google Drive. • Structured image series creation to represent everyday scenarios • Compliance with detailed project guidelines • Organization and management of large image datasets • Use of cloud-based platforms for secure task submission

2023 - Present

JUCE C++ DSP Dataset Curation & SFT Pipeline - Data Labeling Specialist

Prompt Response Writing SFT
Extracted, curated, and labeled C++ Digital Signal Processing functions to create large-scale, high-quality training datasets for code-focused LLMs. Converted technical source material into structured instruction-response pairs suitable for supervised fine-tuning (SFT) using ChatML formatting. Applied multi-stage quality filtering, deduplication, and prompt engineering to enhance dataset reliability and diversity. • Labeled over 3,000 C++ processBlock, AudioBuffer, and DSP functions for use in LLM fine-tuning. • Implemented multi-step review cycles utilizing LLM-assisted generation for dataset augmentation. • Standardized data into ChatML format for direct LLM compatibility. • Ensured data quality via automated and manual quality control procedures.

Extracted, curated, and labeled C++ Digital Signal Processing functions to create large-scale, high-quality training datasets for code-focused LLMs. Converted technical source material into structured instruction-response pairs suitable for supervised fine-tuning (SFT) using ChatML formatting. Applied multi-stage quality filtering, deduplication, and prompt engineering to enhance dataset reliability and diversity. • Labeled over 3,000 C++ processBlock, AudioBuffer, and DSP functions for use in LLM fine-tuning. • Implemented multi-step review cycles utilizing LLM-assisted generation for dataset augmentation. • Standardized data into ChatML format for direct LLM compatibility. • Ensured data quality via automated and manual quality control procedures.

2025 - 2025

AI Dataset Curator & Instruction Data Generator

OtherTextPrompt Response Writing SFT
I led the design and implementation of workflows for generating structured instruction-response datasets using large language models. My work included data cleaning, deduplication, and ChatML formatting to create high-quality training data. These datasets supported supervised fine-tuning of AI models in advanced natural language processing tasks. • Curated and processed diverse textual inputs for LLM alignment. • Developed and applied cleaning and deduplication pipelines. • Ensured formatting and output consistency with ChatML standards. • Collaborated closely with machine learning engineers and data scientists.

I led the design and implementation of workflows for generating structured instruction-response datasets using large language models. My work included data cleaning, deduplication, and ChatML formatting to create high-quality training data. These datasets supported supervised fine-tuning of AI models in advanced natural language processing tasks. • Curated and processed diverse textual inputs for LLM alignment. • Developed and applied cleaning and deduplication pipelines. • Ensured formatting and output consistency with ChatML standards. • Collaborated closely with machine learning engineers and data scientists.

2025 - 2025

Education

J

Jomo Kenyatta University of Agriculture and Technology

Bachelor of Science, Computer Science

Bachelor of Science
2023

Work History

S

Safaricom

Software/Data Associate

Nairobi
2023 - Present
F

Freelance

AI Data Labeling & Collection Specialist

nairobi
2023 - Present