Domain-Specific Benchmark Dataset for Scientific AI Reasoning
Led the design and creation of a high-quality, domain-specific dataset for benchmarking the reasoning, accuracy, and reliability of generative AI models in scientific domains (e.g., computational biology, physics).

Scope & Tasks: The project involved defining annotation schemas for complex scientific reasoning, curating source materials from academic papers, and overseeing a rigorous data-labeling pipeline. Specific tasks included:

- Crafting and labeling complex, multi-step questions requiring logical and mathematical reasoning (a schema sketch follows this list).
- Annotating chain-of-thought reasoning paths with step-by-step validity checks.
- Implementing a multi-stage review process in which PhD-level domain experts validated the labels and annotations produced by junior researchers and annotators (see the stage-transition sketch below).
- Designing and applying a detailed rubric for human-in-the-loop evaluation of model outputs, scoring each response for factual accuracy, logical soundness, and potential hallucinations (see the rubric sketch below).
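To make the annotation schema concrete, here is a minimal Python sketch of what a single benchmark item might look like. The record layout and every name in it (`BenchmarkItem`, `ReasoningStep`, the field names) are illustrative assumptions; the project's actual schema and storage format are not specified above.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ReasoningStep:
    """One step in an annotated chain-of-thought path."""
    text: str       # the reasoning step as written by the annotator
    is_valid: bool  # outcome of the step-by-step validity check

@dataclass
class BenchmarkItem:
    """A single multi-step reasoning question with its gold annotation."""
    item_id: str
    domain: str       # e.g. "computational biology" or "physics"
    source: str       # citation for the academic paper the item is curated from
    question: str
    reasoning_path: list[ReasoningStep] = field(default_factory=list)
    final_answer: str = ""

# Hypothetical item; the question text is invented for illustration only.
item = BenchmarkItem(
    item_id="bio-0001",
    domain="computational biology",
    source="<paper citation>",
    question="Which pathway dominates at steady state, and why?",
    reasoning_path=[
        ReasoningStep("Write the steady-state equation for each pathway.", True),
        ReasoningStep("Compare the effective rate constants.", True),
    ],
    final_answer="Pathway A",
)
print(json.dumps(asdict(item), indent=2))  # the schema serializes cleanly to JSON
```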
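The multi-stage review process can be modeled as a small state machine in which no item is accepted without passing PhD-level expert validation. The stage names and transition rules below are hypothetical; they sketch the constraint, not the project's actual tooling.

```python
from enum import Enum

class ReviewStage(str, Enum):
    DRAFTED = "drafted"              # labeled by a junior researcher or annotator
    EXPERT_REVIEW = "expert_review"  # under PhD-level domain-expert validation
    ACCEPTED = "accepted"
    REJECTED = "rejected"

# Allowed transitions: acceptance is only reachable through expert review.
ALLOWED = {
    ReviewStage.DRAFTED: {ReviewStage.EXPERT_REVIEW},
    ReviewStage.EXPERT_REVIEW: {ReviewStage.ACCEPTED, ReviewStage.REJECTED},
    ReviewStage.REJECTED: {ReviewStage.DRAFTED},  # rejected items go back for rework
    ReviewStage.ACCEPTED: set(),
}

def advance(current: ReviewStage, target: ReviewStage) -> ReviewStage:
    """Move an item to the next stage, refusing any skip past expert review."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Encoding the transitions explicitly makes the key property of the pipeline checkable in code: an item cannot reach ACCEPTED without first passing through EXPERT_REVIEW.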
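Similarly, the human-in-the-loop rubric could be captured as a per-reviewer score record. The 1-5 scales, field names, and acceptance threshold below are illustrative assumptions, not the project's actual rubric.

```python
from dataclasses import dataclass

@dataclass
class RubricScore:
    """One reviewer's rubric scores for a single model output."""
    factual_accuracy: int   # 1-5: do the claims agree with the source material?
    logical_soundness: int  # 1-5: does each step follow from the previous ones?
    hallucinated: bool      # is any unsupported claim present?

def passes(score: RubricScore, threshold: int = 4) -> bool:
    """Hypothetical acceptance rule: strong on both scales and hallucination-free."""
    return (
        score.factual_accuracy >= threshold
        and score.logical_soundness >= threshold
        and not score.hallucinated
    )

# Usage: a strong, hallucination-free output passes.
print(passes(RubricScore(factual_accuracy=5, logical_soundness=4, hallucinated=False)))
```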