Pablo Vazquez


Chemistry, Applied Mathematics & Data Analysis Specialist in English

Buenos Aires, Argentina
$20.00/hr, Expert, Scale AI, Remotasks

Key Skills

Software

Scale AI
Remotasks

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Image
Text

Top Task Types

Action Recognition
Audio Recording
Evaluation Rating
Prompt Response Writing SFT
Translation Localization

Freelancer Overview

I design adversarial math prompts that reliably expose LLM reasoning failures, then produce corrective solutions and rubrics to improve model performance. I construct problems across algebra, calculus, probability, and word problems to trigger specific failure modes: multi-step chain-of-thought breaks, symbol/variable confusion, units and dimensional-analysis slips, edge-case handling, and distractor-sensitive reasoning. Prompts are crafted as minimally altered “gotchas” (ambiguous phrasing, near-duplicate quantities, nested conditions) that surface systematic weaknesses rather than random mistakes.

For each prompt, I (1) specify the intended solution path and invariants, (2) run the model to capture errors, (3) localize the failure to precise steps (e.g., incorrect substitution, misapplied theorem, rounding/units drift), and (4) author a corrected derivation with step annotations and brief rationales. I then convert these into evaluation items with gold answers, error taxonomies, and graded hints, enabling both automated scoring and fine-grained feedback. This loop yields high-quality training data that strengthens mathematical consistency, reduces spurious shortcuts, and improves model reliability on real-world problem variants.
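As a rough illustration of what one such evaluation item could look like, here is a minimal Python sketch. The `EvalItem` schema, its field names, the `score` helper, and the sample problem are all hypothetical and chosen for illustration; they are not the format used on any specific project.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class EvalItem:
    """One adversarial evaluation item (hypothetical schema)."""
    prompt: str
    gold_answer: Fraction          # exact gold answer, stored without rounding
    domain: str
    error_categories: list = field(default_factory=list)  # expected failure modes
    hints: list = field(default_factory=list)              # ordered vague -> explicit


def score(item: EvalItem, model_answer: str) -> bool:
    """Automated scoring: exact match on the parsed numeric answer."""
    try:
        return Fraction(model_answer.strip()) == item.gold_answer
    except (ValueError, ZeroDivisionError):
        # Unparseable answers (e.g. words instead of a number) count as wrong.
        return False


# Illustrative item: the tempting shortcut averages the two speeds
# (60 and 120 km/h) instead of dividing total distance by total time.
item = EvalItem(
    prompt=("A car drives 60 km in 1 hour, then 60 km in 30 minutes. "
            "What is its average speed in km/h?"),
    gold_answer=Fraction(120) / Fraction(3, 2),  # 120 km / 1.5 h
    domain="word problems",
    error_categories=["units drift", "averaging rates"],
    hints=["Check the time units.", "Use total distance / total time."],
)
```

Storing the gold answer as a `Fraction` is one possible design choice: it lets answers such as `80` and `80.0` compare equal without floating-point tolerance, while the naive answer `90` is rejected.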

Expert, English, Spanish, Portuguese

Labeling Experience

Scale AI

Adversarial Math Prompting and Reasoning Correction

Scale AI, Text, Prompt Response Writing SFT

Scope
- Designed adversarial math prompts to surface LLM failures in algebra, calculus, probability, and word problems.
- Delivered gold-standard solutions, rationales, rubrics, and EN/ES localized versions.

Tasks
- Crafted problem variants to trigger errors (ambiguity, multi-step slips, unit/symbol drift).
- Ran model tests; annotated failure steps; authored corrected derivations and final answers.
- Tagged items with domain, difficulty, skills, error category, and hint tiers; performed EN↔ES localization.

Project Size
- 250–400 items; 2–3 adversarial variants each (≈600–900 prompts); 1–2 revision cycles per item.

Quality Measures
- Double-annotation and rubric checks (target κ ≥ 0.8).
- Peer-reviewed gold solutions; unit/notation verification.
- Bilingual QA with glossary-enforced terminology; versioned changes vali


2025 - 2025
Scale AI

Chemistry Adversarial Prompting & Correction

Scale AI, Text, Evaluation Rating, Prompt Response Writing SFT
Designed adversarial chemistry prompts to expose LLM failure modes across stoichiometry, reaction equations, thermochemistry, equilibrium, acid–base, kinetics, atomic/molecular structure, and laboratory scenarios. Delivered gold-standard solutions, balanced equations, units, rationales, rubrics, and EN/ES localized versions. Crafted problem variants to trigger errors (unit/conversion slips, limiting-reagent traps, significant-figures drift, incorrect equilibrium setup, mechanism/molecular-structure confusion, unsafe lab steps). Ran model tests; annotated where reasoning failed (e.g., wrong mole ratios, incomplete balancing, ICE-table mistakes, rate-law misinterpretation); authored corrected derivations and final answers. Double-annotation and rubric checks (target κ≥0.8). Peer-reviewed solutions with dimensional analysis, significant-figures compliance, and balancing verification. 250–400 chemistry items; 2–3 adversarial variants each (≈600–900 prompts); 1–2 revision cycles per item.
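The agreement target above can be checked with a short calculation. The sketch below assumes that κ refers to Cohen's kappa for two annotators assigning one categorical label per item; the profile does not name the statistic, so treat that as an assumption. The function name and the sample labels are illustrative only.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Illustrative data: ten items, one disagreement on the fifth item.
annotator_1 = ["units"] * 5 + ["mole ratio"] * 5
annotator_2 = ["units"] * 4 + ["mole ratio"] * 6
kappa = cohens_kappa(annotator_1, annotator_2)
```

With these sample labels the observed agreement is 0.9 and the chance agreement is 0.5, so kappa comes out at roughly 0.8, right at the stated target.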


2024 - 2025
Scale AI

Language Localization & Correction

Scale AI, Text, Classification, Translation Localization
Bilingual (EN/ES) localization and quality assurance for LLM outputs across math and chemistry tasks, instructions, and rationales. Focus on clarity, cultural/regional neutrality, and domain-accurate terminology, ensuring consistency in notation, units, and academic register. Reviewed model answers for grammar, style, tone, and technical accuracy; fixed false friends, calques, and regionalisms. Produced aligned EN↔ES pairs with mirrored structure, consistent variable names/symbols, and audience-appropriate phrasing. Tagged each item with language, register, region-neutrality, domain, difficulty, and error categories (terminology, style, grammar, notation). ~250–400 localized items with EN↔ES pairs; 1–2 revision cycles per item; glossary expanded iteratively based on encountered edge cases. Dual-pass review (source fidelity + target fluency). Inter-annotator agreement checks on sampled items (target κ≥0.8


2023 - 2024

Education


University / UNSAM

Bachelor’s Degree in Biotechnology
2021 - 2025

Technical school EEST Nº 2

Associate Degree in Chemical Technology, Chemistry
2011 - 2018

Work History


Outlier

AI Quality Assurance & Prompt Engineering

Fordinbridge
2023 - Present

ANDIS

IT manager

Buenos Aires
2020 - 2025