Juan Manuel Hurtado Isaza - Labeling data for debugging large codebases, worked on SWE-Bench Pro bench.

Key Skills

Software

Mercor

Scale AI

Remotasks

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming

Text

Video

Top Task Types

Computer Programming Coding

Data Collection

Fine Tuning

Prompt Response Writing SFT

Text Generation

Freelancer Overview

I have experience in programming-focused data labeling and AI training tasks, with an emphasis on evaluating and annotating complex LLM outputs. My work includes reviewing model responses to high-difficulty prompts, identifying reasoning and domain-level errors, and validating outputs against strict, non-leaky specifications. I am comfortable working with technical content involving Python, data processing, parsing, and structured datasets such as CSVs and PDF-extracted tables. I have also contributed to the design and refinement of atomic, objective evaluation rubrics used in LLM-graded pipelines. This includes stress-testing models with challenging prompts, detecting overfitting and narrow-evaluation behaviors, and producing audit-ready explanations suitable for expert review. My background in programming and data analysis allows me to contribute reliably to AI training tasks that require precision, consistency, and strong technical judgment.

IntermediateEnglishSpanish

Labeling Experience

flaky busters – GitHub Issue Analysis and Test-Based Evaluation

Scale AIComputer Code ProgrammingRLHFComputer Programming Coding

Worked on flaky busters, an AI training project focused on improving agent reliability using real GitHub issues. The task involved analyzing an issue, distilling it into a clear, non-leaky problem statement that accurately described the underlying bug or behavior. Authored unit tests designed to objectively grade the agent’s solution, ensuring tests reflected the true issue and avoided overfitting to a single implementation. The tests were used to evaluate correctness, robustness, and regression handling in automated code fixes.

2025 - 2025

ballerina_capuccina – GitHub Issue–Based Problem and Rubric Generation

Scale AIComputer Code ProgrammingDiagnosisRLHF

Worked on ballerina_capuccina, an AI training project that used real GitHub issues as source material to generate high-quality problem statements and evaluation rubrics. The task involved abstracting issues into clear, non-leaky problem statements and defining precise rubric criteria to grade model responses objectively. Focused on aligning rubrics with observable outputs, avoiding solution leakage, and ensuring fair evaluation of the model’s reasoning and code changes. The resulting artifacts were used to train and assess programming agents on realistic software debugging and maintenance tasks.

2025 - 2025

hyperion_augmentation – Reasoning and Solution Path Augmentation Project

Scale AIComputer Code ProgrammingText GenerationText Summarization

Worked on hyperion augmentation, an AI training project focused on generating high-quality programming problems for an automated coding agent. The core task was to create clear problem statements, precise requirements, and well-defined interface documentation that the agent could use to implement correct solutions. Emphasized non-leaky specifications, test-aligned requirements, and unambiguous interfaces to ensure fair evaluation of the agent’s reasoning and implementation skills. The generated artifacts were used to train and evaluate the agent’s ability to interpret specifications and produce correct, maintainable code.

2025 - 2025

map_explorer – Programming-Focused Agent Evaluation Project

Scale AIComputer Code ProgrammingQuestion AnsweringRLHF

Worked on map_explorer, an AI training and evaluation project focused on a coding agent that debugs GitHub issues step by step. The work involved reviewing the agent’s reasoning traces, correcting flawed intermediate thoughts, and guiding it toward accurate fixes and final solutions. Annotated where the agent misinterpreted requirements, overfit to tests, or produced incorrect code changes. Produced high-quality corrective feedback and improved step-by-step solution trajectories to strengthen the agent’s debugging reliability, specification compliance, and end-to-end patch quality.

2024 - 2025

Education

T

Talencto Tech

Bootcamp, Data Analysis

Bootcamp

2025 - 2025

U

Universidad Tecnológica de Pereira

Bootcamp, Web Development

Bootcamp

2024 - 2024

Work History

E

Entre Trámites

Full Stack Developer

Pereira

2023 - 2025