Pablo Alvarez-García

HITL evaluator responsible for assessing, rating, and improving LLM

Santa Cruz, Bolivia
$25.00/hr | Entry Level | Remotasks | Scale AI | Toloka

Key Skills

Software

Remotasks
Scale AI
Toloka

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Document
Text

Top Task Types

Computer Programming Coding
Evaluation Rating
Fine Tuning
Prompt Response Writing SFT
Question Answering

Freelancer Overview

As a bilingual AI training data specialist with hands-on experience in LLM prompt evaluation, I have contributed to projects focused on improving AI assistant performance through structured scoring, multi-turn dialogue analysis, and error tagging. At Outlier, I evaluate complex user-assistant interactions in both English and Spanish, identifying issues such as hallucinations, tone mismatches, incomplete reasoning, and tool misuse. My role also includes writing high-quality feedback and critique comments that guide model improvements and strengthen natural language alignment. With a strong foundation in education, ICT, and natural language understanding, I bring a distinct perspective to annotation and model assessment. My academic training in AI and prompt engineering, combined with experience in curriculum design and user-focused communication, allows me to consistently deliver high-quality, human-like evaluations that help shape the next generation of AI systems.

Entry Level | German | English | Spanish

Labeling Experience

Scale AI

LLM Prompt Evaluation and Multi-turn AI Assistant Training

Scale AI | Text | Text Generation | Text Summarization
Worked on improving large language model (LLM) performance through structured prompt-response evaluations. Responsibilities included reviewing multi-turn assistant dialogues in English and Spanish, scoring them on clarity, helpfulness, tone, and factual accuracy, and tagging issues such as hallucinations, incomplete reasoning, tool misuse, and user intent mismatch. Produced high-quality feedback and critique comments to guide model improvements. Evaluated prompts across various domains, including productivity, education, and general user queries. Maintained high-quality ratings across thousands of evaluations while meeting rigorous deadlines and task quotas.

2025

Education

Escuela de Negocios Europea de Barcelona

Diploma, Coaching and Neuro-Linguistic Programming

2024

IBM

Professional Certificate, Artificial Intelligence Developer

2024

Work History

Outlier.ai

Code Expert QA

Santa Cruz
2025 - Present

Cambridge College

ICT-Computer Science Teacher

Santa Cruz
2025 - Present