Flora Jude

Senior AI Content Specialist (STEM/Reasoning), Anthropic (Contract via Scale AI)

New York, USA
$15.00/hr | Expert | Scale AI | Argilla

Key Skills

Software

Scale AI
Argilla

Top Subject Matter

Mathematical Reasoning
STEM Domain Expertise
Model Evaluation

Top Data Types

Text
Computer Code Programming
Document

Top Task Types

Prompt + Response Writing (SFT)
Red Teaming

Freelancer Overview

Senior AI Content Specialist (STEM/Reasoning), Anthropic (Contract via Scale AI), with 5+ years of professional experience across legal operations, contract review, compliance, and structured analysis. Core strengths include Scale AI, Argilla, and OpenTrain AI Expert Network. Education includes a Doctor of Philosophy (Stanford University, 2023) and a Master of Science (Stanford University, 2020). AI-training focus includes data types such as Text and Computer Code Programming, and labeling workflows including Prompt + Response Writing (SFT) and Red Teaming.

English: Expert

Labeling Experience

Scale AI

Senior AI Content Specialist (STEM/Reasoning), Anthropic (Contract via Scale AI)

Scale AI | Text | Prompt + Response Writing (SFT)
Led the "Human-in-the-Loop" data labeling initiative for mathematical and STEM reasoning used in Claude 3.5 Sonnet. Developed and applied rubric scoring systems to evaluate output truthfulness and reduce hallucinations in model reasoning. Oversaw the creation of over 5,000 high-quality SFT trajectories within a high-priority labeling sprint.
• Spearheaded mathematical proof verification and technical alignment
• Implemented systematic evaluation methodologies for model truthfulness
• Scaled collaborative label creation with expert contributors
• Ensured high accuracy in complex reasoning tasks

2024 - Present
Argilla

AI Research Associate, Stanford Artificial Intelligence Laboratory (SAIL)

Argilla | Text | Red Teaming
Designed protocols for LLM red teaming and the creation of challenge datasets targeting policy bypass and prompt injection scenarios. Directed data annotation and evaluation for benchmark datasets used across academic and industry collaborations. Successfully built datasets enabling systematic stress-testing and policy evaluation of large language models.
• Built partnerships with leading AI research organizations
• Defined security criteria for adversarial data labeling
• Ensured process integrity for agentic workflow testing
• Delivered scalable benchmarking tools for AI safety

2019 - 2023

Technical Lead, Process Reward Model (PRM) Optimization for STEM

Prompt + Response Writing (SFT)
Curated and annotated over 10,000 Chain-of-Thought trajectories with expert-verified step-level correctness for STEM benchmarks. Organized large-scale expert annotation campaigns to improve model reasoning on the MATH-500 dataset. Implemented systematic validity checks to ensure high fidelity of training data for process reward modeling.
• Improved MATH-500 accuracy by 18% via targeted data labeling
• Applied peer review processes for annotation quality control
• Coordinated annotation efforts across distributed expert teams
• Used LaTeX and PyTorch for technical data encoding

2021

Education

Stanford University

Doctor of Philosophy, Computer Science

2020 - 2023
Stanford University

Master of Science, Computer Science

2018 - 2020

Work History

Stanford Artificial Intelligence Laboratory

AI Research Associate

Stanford
2019 - 2023