For employers

Hire this AI Trainer

Sign in or create an account to invite AI Trainers to your job.

Invite to Job
Juan Manuel Hurtado Isaza

Juan Manuel Hurtado Isaza

Labeling data for debugging large codebases, worked on SWE-Bench Pro bench.

Colombia flagPereira, Colombia
$50.00/hrIntermediateMercorScale AIRemotasks

Key Skills

Software

MercorMercor
Scale AIScale AI
RemotasksRemotasks

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code ProgrammingComputer Code Programming
TextText
VideoVideo

Top Task Types

Computer Programming Coding
Data Collection
Fine Tuning
Prompt Response Writing SFT
Text Generation

Freelancer Overview

I have experience in programming-focused data labeling and AI training tasks, with an emphasis on evaluating and annotating complex LLM outputs. My work includes reviewing model responses to high-difficulty prompts, identifying reasoning and domain-level errors, and validating outputs against strict, non-leaky specifications. I am comfortable working with technical content involving Python, data processing, parsing, and structured datasets such as CSVs and PDF-extracted tables. I have also contributed to the design and refinement of atomic, objective evaluation rubrics used in LLM-graded pipelines. This includes stress-testing models with challenging prompts, detecting overfitting and narrow-evaluation behaviors, and producing audit-ready explanations suitable for expert review. My background in programming and data analysis allows me to contribute reliably to AI training tasks that require precision, consistency, and strong technical judgment.

IntermediateEnglishSpanish

Labeling Experience

Scale AI

flaky busters – GitHub Issue Analysis and Test-Based Evaluation

Scale AIComputer Code ProgrammingRLHFComputer Programming Coding
Worked on flaky busters, an AI training project focused on improving agent reliability using real GitHub issues. The task involved analyzing an issue, distilling it into a clear, non-leaky problem statement that accurately described the underlying bug or behavior. Authored unit tests designed to objectively grade the agent’s solution, ensuring tests reflected the true issue and avoided overfitting to a single implementation. The tests were used to evaluate correctness, robustness, and regression handling in automated code fixes.

Worked on flaky busters, an AI training project focused on improving agent reliability using real GitHub issues. The task involved analyzing an issue, distilling it into a clear, non-leaky problem statement that accurately described the underlying bug or behavior. Authored unit tests designed to objectively grade the agent’s solution, ensuring tests reflected the true issue and avoided overfitting to a single implementation. The tests were used to evaluate correctness, robustness, and regression handling in automated code fixes.

2025 - 2025
Scale AI

ballerina_capuccina – GitHub Issue–Based Problem and Rubric Generation

Scale AIComputer Code ProgrammingDiagnosisRLHF
Worked on ballerina_capuccina, an AI training project that used real GitHub issues as source material to generate high-quality problem statements and evaluation rubrics. The task involved abstracting issues into clear, non-leaky problem statements and defining precise rubric criteria to grade model responses objectively. Focused on aligning rubrics with observable outputs, avoiding solution leakage, and ensuring fair evaluation of the model’s reasoning and code changes. The resulting artifacts were used to train and assess programming agents on realistic software debugging and maintenance tasks.

Worked on ballerina_capuccina, an AI training project that used real GitHub issues as source material to generate high-quality problem statements and evaluation rubrics. The task involved abstracting issues into clear, non-leaky problem statements and defining precise rubric criteria to grade model responses objectively. Focused on aligning rubrics with observable outputs, avoiding solution leakage, and ensuring fair evaluation of the model’s reasoning and code changes. The resulting artifacts were used to train and assess programming agents on realistic software debugging and maintenance tasks.

2025 - 2025
Scale AI

hyperion_augmentation – Reasoning and Solution Path Augmentation Project

Scale AIComputer Code ProgrammingText GenerationText Summarization
Worked on hyperion augmentation, an AI training project focused on generating high-quality programming problems for an automated coding agent. The core task was to create clear problem statements, precise requirements, and well-defined interface documentation that the agent could use to implement correct solutions. Emphasized non-leaky specifications, test-aligned requirements, and unambiguous interfaces to ensure fair evaluation of the agent’s reasoning and implementation skills. The generated artifacts were used to train and evaluate the agent’s ability to interpret specifications and produce correct, maintainable code.

Worked on hyperion augmentation, an AI training project focused on generating high-quality programming problems for an automated coding agent. The core task was to create clear problem statements, precise requirements, and well-defined interface documentation that the agent could use to implement correct solutions. Emphasized non-leaky specifications, test-aligned requirements, and unambiguous interfaces to ensure fair evaluation of the agent’s reasoning and implementation skills. The generated artifacts were used to train and evaluate the agent’s ability to interpret specifications and produce correct, maintainable code.

2025 - 2025
Scale AI

map_explorer – Programming-Focused Agent Evaluation Project

Scale AIComputer Code ProgrammingQuestion AnsweringRLHF
Worked on map_explorer, an AI training and evaluation project focused on a coding agent that debugs GitHub issues step by step. The work involved reviewing the agent’s reasoning traces, correcting flawed intermediate thoughts, and guiding it toward accurate fixes and final solutions. Annotated where the agent misinterpreted requirements, overfit to tests, or produced incorrect code changes. Produced high-quality corrective feedback and improved step-by-step solution trajectories to strengthen the agent’s debugging reliability, specification compliance, and end-to-end patch quality.

Worked on map_explorer, an AI training and evaluation project focused on a coding agent that debugs GitHub issues step by step. The work involved reviewing the agent’s reasoning traces, correcting flawed intermediate thoughts, and guiding it toward accurate fixes and final solutions. Annotated where the agent misinterpreted requirements, overfit to tests, or produced incorrect code changes. Produced high-quality corrective feedback and improved step-by-step solution trajectories to strengthen the agent’s debugging reliability, specification compliance, and end-to-end patch quality.

2024 - 2025

Education

T

Talencto Tech

Bootcamp, Data Analysis

Bootcamp
2025 - 2025
U

Universidad Tecnológica de Pereira

Bootcamp, Web Development

Bootcamp
2024 - 2024

Work History

E

Entre Trámites

Full Stack Developer

Pereira
2023 - 2025