Andrew Muthui - Data Engineering Intern - Data Infrastructure

Key Skills

Software

Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Text

Top Label Types

Prompt Response Writing SFT

Freelancer Overview

I am a detail-oriented data professional with hands-on experience in data cleaning, validation, and transformation through my internship and academic projects. I have worked extensively with Python, SQL, and pandas, and with data processing engines such as Spark and DuckDB, to produce high-quality, reliable datasets for analytics and machine learning applications. My background includes designing and managing ETL pipelines, orchestrating automated workflows with Airflow, and handling structured and semi-structured data formats such as JSON, CSV, and Parquet. I am passionate about maintaining data accuracy and consistency, and I am eager to apply my skills to data labeling, annotation, and AI training data tasks, where data quality and reproducibility are essential.

Intermediate English

Labeling Experience

Text Correction & Evaluation

Scale AI | Text | Prompt Response Writing SFT

Text Correction & Evaluation projects focus on improving and assessing the quality of written text used to train AI systems like chatbots and writing assistants. The scope involves refining grammar, spelling, punctuation, fluency, and clarity while strictly preserving the original meaning. Labeling tasks typically include correcting sentences, rating text quality (e.g., fluency, naturalness, coherence), identifying error types, and checking whether edits change the meaning. Project sizes range from a few thousand to millions of text segments, with workers handling sentences, paragraphs, or short dialogues. Strong quality controls are used, including gold-standard test questions, inter-annotator agreement checks, accuracy thresholds (often 80–90%+), consistency rules, and time monitoring to ensure reliable, high-quality data.

2024 - 2024

Education

Kenyatta University

Bachelor of Science, Computer Science and Information Systems

2021 - 2024

Work History

Safaricom

Data Engineering Intern

Nairobi

2024 - Present