
Joao Da Silva

AI Data Annotator

Vitória, Brazil
$10.00/hr · Intermediate · Mercor · Micro1

Key Skills

Software

Mercor
Micro1

Top Subject Matter

LLM Evaluation
Linguistics Domain Expertise
Discourse Analysis

Top Data Types

Text
Video
Document

Top Task Types

Red Teaming
Prompt Response Writing SFT
Question Answering
Text Summarization
RLHF
Fine Tuning
Transcription
Text Generation
Data Collection

Freelancer Overview

LLM Side-by-Side Evaluation Annotator – Business Analyst (Google agent evals project, Turing). Brings 14+ years of professional experience across complex workflows, research, and quality-focused execution. Core strengths include internal and proprietary tooling and the Mercor platform. Education includes a Bachelor of Arts from the Federal University of Espírito Santo (UFES, 2021) and a Bachelor of Arts from UNIFAHE (2021). AI-training focus includes data types such as text and video, and labeling workflows including evaluation, rating, and red teaming.

Intermediate · English · Portuguese

Labeling Experience

Mercor

Prompt Engineer / Golden Response Writer (Mercor)

Mercor · Text · Prompt Response Writing SFT
Designed and validated Golden Prompts with explicit instructions to maximize LLM performance. Produced Golden Responses as reference-quality model outputs, ensuring fidelity to user intent, accuracy, and stylistic compliance. Defined task constraints, formatting expectations, and detailed success criteria for supervised fine-tuning.
• Authored high-quality prompt-response pairs for SFT datasets
• Applied explicit evaluation criteria to reference answers
• Tuned prompts to elicit deterministic, instruction-following behavior
• Ensured SFT content adhered to complex guidance and style rules

2025 - Present
Mercor

Multimodal AI Evaluator (Mercor)

Mercor · Video
Performed comparative evaluation of AI-generated videos for visual fidelity, narrative integrity, and prompt adherence. Applied multimodal annotation guidelines to ensure scoring quality and reproducibility. Identified critical failure patterns including inconsistencies and hallucinations in generative outputs.
• Rated video responses according to strict evaluation rubrics
• Maintained high standards for annotation consistency and clarity
• Documented error patterns for model improvement feedback
• Collaborated in multimodal benchmarking and scenario design

2025 - Present
Mercor

AI Red Teaming Specialist (LLM Safety & Evaluation, Mercor)

Mercor · Text · Red Teaming
Executed adversarial prompt injection, jailbreaking, and multi-turn safety evaluations on LLMs to test robustness and surface vulnerabilities. Designed red teaming scenarios targeting harmful content, misinformation, and policy breach risks. Identified model weaknesses and alignment failures to inform improved safety guardrails.
• Conducted structured attack simulations on AI models
• Evaluated efficacy of model safety protocols and content filters
• Profiled risk types and alignment gaps for mitigation planning
• Provided feedback to improve LLM trustworthiness and policy compliance

2025 - Present
Mercor

Evaluation Dataset Designer – NotebookLM Evaluation Specialist (Mercor)

Mercor · Text
Designed and curated evaluation datasets consisting of queries, responses, and conversations for NotebookLM model testing. Created user queries grounded in real-world scenarios to evaluate AI models in education, health, and planning. Integrated multimodal document sources to enable comprehensive AI performance assessment.
• Built datasets to support Gemini and AI Mode LLM comparisons
• Wrote realistic prompts and responses simulating user interactions
• Ensured test cases targeted diverse and complex use cases
• Leveraged PDF and web content to test contextually grounded model behavior

2025 - Present

Adversarial Dataset Engineer (Humanities, Turing)

Text
Developed novel adversarial prompts and test sets for benchmarking LLM instruction-following robustness. Defined structured reference specifications for objective evaluation and model-as-a-judge tasks. Authored scenarios to test models under conflicting and unusual prompt constraints.
• Created comprehensive specification criteria for correct LLM responses
• Designed adversarial evaluation materials targeting model weaknesses
• Worked with academic partners to advance instruction-following measurement
• Ensured all benchmarks adhered to linguistic and semantic rigor

2025 - Present

Education


UNIFAHE

Bachelor of Arts, Mathematics

2021 - 2026

Federal University of Espírito Santo (UFES)

Bachelor of Arts, Linguistics and Literature

2017 - 2021

Work History


Freelance

Academic and Technical Translator

Vitória
2016 - Present

VT Assessoria Linguística

Financial and Administrative Coordinator

Espírito Santo
2023 - 2025