Daniel Mburure - Software/Data Associate

Key Skills

Software

Scale AI

Labelbox

Data Annotation Tech

SuperAnnotate

Other

Top Subject Matter

AI Model Evaluation

Cybersecurity Domain Expertise

LLM Safety

Top Data Types

Image

Text

Document

Audio

Computer Code Programming

Top Task Types

Data Collection

Prompt Response Writing SFT

Freelancer Overview

Software/Data Associate. Brings 4+ years of professional experience across complex professional workflows, research, and quality-focused execution.

Entry LevelSwahiliEnglishChinese Mandarin

Labeling Experience

AI Model Evaluator and Data Annotator

Text

Evaluated and red teamed AI models and agents for safety and vulnerabilities, focusing on large language model (LLM) output assessment and prompt evaluation. Built and executed offline, reproducible, and auto-evaluable test cases for robust model evaluation processes. Applied data annotation techniques to classify and rate LLM responses, contributing to the continuous improvement of AI text models. • Labeled and evaluated text-based outputs from AI models. • Applied cybersecurity knowledge to identify prompt injection and LLM risks. • Utilized scripting tools and Jupyter Notebooks for annotation workflow automation. • Documented evaluation findings with precise, reproducible steps.

2023 - Present

AI Data Contributor (Freelance / Remote)

ImageData Collection

As an AI data contributor, I captured and organized real-world image datasets for AI training purposes. I performed data labeling and annotation following strict project instructions to ensure dataset compliance. I delivered completed labeled data through cloud platforms such as Google Drive. • Structured image series creation to represent everyday scenarios • Compliance with detailed project guidelines • Organization and management of large image datasets • Use of cloud-based platforms for secure task submission

2023 - Present

JUCE C++ DSP Dataset Curation & SFT Pipeline - Data Labeling Specialist

Prompt Response Writing SFT

Extracted, curated, and labeled C++ Digital Signal Processing functions to create large-scale, high-quality training datasets for code-focused LLMs. Converted technical source material into structured instruction-response pairs suitable for supervised fine-tuning (SFT) using ChatML formatting. Applied multi-stage quality filtering, deduplication, and prompt engineering to enhance dataset reliability and diversity. • Labeled over 3,000 C++ processBlock, AudioBuffer, and DSP functions for use in LLM fine-tuning. • Implemented multi-step review cycles utilizing LLM-assisted generation for dataset augmentation. • Standardized data into ChatML format for direct LLM compatibility. • Ensured data quality via automated and manual quality control procedures.

2025 - 2025

AI Dataset Curator & Instruction Data Generator

OtherTextPrompt Response Writing SFT

I led the design and implementation of workflows for generating structured instruction-response datasets using large language models. My work included data cleaning, deduplication, and ChatML formatting to create high-quality training data. These datasets supported supervised fine-tuning of AI models in advanced natural language processing tasks. • Curated and processed diverse textual inputs for LLM alignment. • Developed and applied cleaning and deduplication pipelines. • Ensured formatting and output consistency with ChatML standards. • Collaborated closely with machine learning engineers and data scientists.

2025 - 2025

Education

J

Jomo Kenyatta University of Agriculture and Technology

Bachelor of Science, Computer Science

Bachelor of Science

2023

Work History

S

Safaricom

Software/Data Associate

Nairobi

2023 - Present

F

Freelance

AI Data Labeling & Collection Specialist

nairobi

2023 - Present