Oscar Alberto Gracia Serrano

AI data labeling & LLM evaluation for code quality

Mexicali, Mexico
$29.00/hr · Expert · Remotasks · Scale AI

Key Skills

Software

Remotasks
Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Text

Top Task Types

Computer Programming Coding
Evaluation Rating
Red Teaming
RLHF

Freelancer Overview

I have extensive experience in AI training data, specializing in evaluating, annotating, and improving LLM outputs for leading AI companies through my work with Scale AI/Outlier. My focus includes code evaluation, text analysis, model comparison, reasoning assessment, and rubric-based quality reviews. I routinely analyze model-generated code for accuracy, structure, and reliability, ensuring high-quality training data that directly improves the performance of generative AI systems. With a strong background in software engineering and applied data work, I bring deep technical understanding to labeling tasks involving programming, natural-language reasoning, and complex problem solving. I have also built data pipelines, automation systems, and analytics tools, giving me practical expertise in interpreting structured, unstructured, and time-series data. This combination of hands-on engineering experience and advanced AI evaluation equips me to produce high-quality, detail-oriented training data across a range of domains.

Expert: English, Spanish

Labeling Experience

Scale AI

LLM Code & Text Evaluation for AI Training

Scale AI · Computer Code Programming · RLHF · Evaluation Rating
I contributed to multiple AI training-data projects focused on improving large language models. Tasks included evaluating model-generated code for correctness, structure, and reliability; reviewing AI responses for reasoning quality, factual accuracy, and adherence to instructions; and performing A/B model comparisons to determine preferred outputs. I also completed text classification, error identification, explanation rewriting, and prompt-response generation (SFT). Quality standards required strict rubric adherence, detailed error labeling, and consistent application of evaluation guidelines. Projects were medium to large scale, involving hundreds to thousands of annotations across coding, reasoning, and natural-language tasks. This work directly improved model performance for real-world software development and general-purpose AI capabilities.

2023 - 2025

Education


Stanford University

Specialization, Machine Learning

2015 - 2020

University of Valley of Mexico

Bachelor's, Computer Science

2015 - 2020

Work History


Outlier (Powered by Scale AI)

Software Engineer, LLM Evaluation

Remote
2023 - Present

Nova Automation Technologies

Lead Backend Engineer

N/A
2020 - Present