Guilherme Aguiar

PMP Project Manager and Mechanical Engineer | Multilingual speaker

Sorocaba, Brazil
$25.00/hr | Intermediate | Data Annotation Tech, Labelbox, Remotasks

Key Skills

Software

Data Annotation Tech
Labelbox
Remotasks
Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Text
Video

Top Task Types

Classification
Evaluation Rating

Freelancer Overview

I have experience across multiple AI training and data labeling platforms, performing a wide range of annotation tasks with precision and consistency. I pay close attention to project guidelines and understand that each dataset has its own purpose, workflow, and quality expectations. Because I adapt quickly to new instructions while maintaining high accuracy, I deliver reliable training data for machine learning models. I am detail-oriented, disciplined, and familiar with a variety of labeling formats and evaluation processes, and I consistently align my work with the specific goals and quality standards of each project.

Intermediate | English, Spanish, Portuguese

Labeling Experience

Scale AI

Chatbot Evaluation and Conversation Quality Rater

Scale AI | Text | Prompt Response Writing SFT
Worked on chatbot evaluation and conversational data annotation across multiple AI training platforms. Responsibilities included rating chatbot responses for clarity, correctness, safety, tone, and helpfulness; identifying hallucinations or policy violations; classifying user intent; and evaluating multi-turn dialogue coherence. Performed side-by-side model comparisons, tested edge cases in customer service scenarios, and created high-quality human responses to improve chatbot training datasets. Ensured strict adherence to project guidelines, maintained consistent quality scores, and contributed feedback for improving conversational behavior and natural language understanding in AI systems. Handled large volumes of dialogue samples in ongoing QA cycles, supporting the fine-tuning and safety alignment of conversational LLMs.

2024 - 2025
Scale AI

LLM Evaluation and Quality Rater

Scale AI | Text | Text Generation | Evaluation Rating
Worked as an LLM evaluator and quality rater across multiple AI training platforms. Tasks included evaluating model responses for accuracy, relevance, safety, tone, and guideline compliance. Performed comparative assessments between model generations, rated conversation quality, identified hallucinations, refined prompts, and created high-quality human-written completions for SFT datasets. Activities also included labeling harmful content, disallowed intent detection, multilingual evaluation (EN/ES/PT), and edge-case testing for robustness. Maintained high quality scores and followed detailed annotation guidelines, contributing to the improvement and fine-tuning of large-scale language models. Project sizes ranged from thousands to tens of thousands of samples, with strict QA checks, calibration tests, and ongoing feedback cycles.

2024 - 2025

Education

FIA Business School

Postgraduate Studies, Business and Project Management

2023 - 2024
FACENS University

Bachelor of Science, Mechanical Engineering

2017 - 2021

Work History

Metso

Project Manager

Sorocaba
2022 - Present
ZF

Process Engineering Intern

Sorocaba
2019 - 2021