LLM Code & Text Evaluation for AI Training
I contributed to multiple AI training-data projects focused on improving large language models. Tasks included evaluating model-generated code for correctness, structure, and reliability; reviewing AI responses for reasoning quality, factual accuracy, and adherence to instructions; and performing A/B model comparisons to determine preferred outputs. I also completed text classification, error identification, explanation rewriting, and prompt-response generation for supervised fine-tuning (SFT). Quality standards required strict rubric adherence, detailed error labeling, and consistent application of evaluation guidelines. Projects ranged from medium to large scale, involving hundreds to thousands of annotations across coding, reasoning, and natural-language tasks. This work contributed to improving model performance for real-world software development and general-purpose AI capabilities.
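To illustrate the kind of structured output a rubric-based A/B evaluation produces, here is a minimal Python sketch; the field names, rubric dimensions, and example data are hypothetical, not taken from any specific project:

```python
from dataclasses import dataclass, field

# Hypothetical rubric dimensions; real projects define their own guidelines.
RUBRIC = ("correctness", "reasoning_quality", "instruction_adherence")

@dataclass
class ABComparison:
    """One A/B annotation: score both responses on each rubric dimension,
    label concrete errors, and record which response is preferred."""
    prompt: str
    scores_a: dict = field(default_factory=dict)  # dimension -> 1..5 score
    scores_b: dict = field(default_factory=dict)
    error_labels: list = field(default_factory=list)  # e.g. ["B: off-by-one in loop"]
    preferred: str = ""  # "A", "B", or "tie"

    def decide(self) -> str:
        """Prefer the response with the higher rubric total; tie otherwise."""
        total_a = sum(self.scores_a.get(d, 0) for d in RUBRIC)
        total_b = sum(self.scores_b.get(d, 0) for d in RUBRIC)
        self.preferred = "A" if total_a > total_b else "B" if total_b > total_a else "tie"
        return self.preferred

# Example annotation (illustrative values only).
item = ABComparison(
    prompt="Write a function that reverses a linked list.",
    scores_a={"correctness": 5, "reasoning_quality": 4, "instruction_adherence": 5},
    scores_b={"correctness": 3, "reasoning_quality": 4, "instruction_adherence": 4},
    error_labels=["B: fails on single-node list"],
)
print(item.decide())  # -> "A"
```

In practice, preference decisions also weigh qualitative factors a numeric rubric cannot fully capture, which is why detailed error labels accompany the scores.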