Luis Felipe

AI Evaluation & Data Labeling Specialist, 9+ Years of Experience

LOS ANGELES, Kenya
$60.00/hr | Expert | CVAT | Doccano | Labelbox

Key Skills

Software

CVAT
Doccano
Labelbox
Prodigy
Remotasks
Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Image
Text

Top Task Types

Classification
Data Collection
Prompt Response Writing (SFT)
RLHF

Freelancer Overview

I am an AI specialist with 9+ years of experience in data labeling, dataset design, and AI model evaluation. I have created high-quality datasets for text, image, and multimodal applications using tools such as Scale AI, Labelbox, CVAT, and Prodigy. I am skilled in human-in-the-loop testing, classification, annotation, entity recognition, and preference ranking, with the goal of making AI systems accurate, reliable, and aligned with intended outcomes. I have contributed to applied AI research at DeepMind, Scale AI/OpenAI, and MIT, focusing on benchmarking, fairness auditing, and model interpretability. I am also experienced in data verification, quality control, and collaborative projects that improve training data for large language models and multimodal AI systems.

English: Expert

Labeling Experience

Scale AI

Domain-Specific Benchmark Dataset for Scientific AI Reasoning

Scale AI | Computer Code Programming | Entity NER Classification | Classification

Led the design and creation of a high-quality, domain-specific dataset to benchmark the reasoning, accuracy, and reliability of generative AI models in scientific domains (e.g., computational biology, physics).

Scope & Tasks: The project involved defining annotation schemas for complex scientific reasoning, curating source materials from academic papers, and overseeing a rigorous data labeling pipeline.

Specific tasks included:
Crafting and labeling complex, multi-step questions requiring logical and mathematical reasoning.
Annotating "chain of thought" reasoning paths with step-by-step validity checks.
Implementing a multi-stage review process in which PhD-level domain experts validated labels and annotations created by junior researchers and annotators.
Designing and applying a detailed rubric for human-in-the-loop evaluation of model outputs, scoring for factual accuracy, logical soundness, and potential hallucinations.

2023 - 2025

Education

University of Texas at Austin

Ph.D., Computer Science
2016 - 2022

Work History

Google DeepMind

AI Research Scientist

Los Angeles
2023 - Present

Scale AI

Applied AI Researcher (Contract)

San Francisco
2021 - 2023