Large Language Model (LLM) Response Evaluation & RLHF Annotation
Contributed to large-scale LLM training and evaluation initiatives focused on supervised fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). Annotated and evaluated thousands of prompt-response pairs for quality, safety, reasoning accuracy, factual correctness, and policy compliance.

Key responsibilities included:
- Ranking multiple model outputs by coherence, helpfulness, and alignment with guidelines
- Labeling reasoning errors, hallucinations, and logical inconsistencies
- Performing intent classification and sentiment tagging
- Conducting red teaming to identify safety vulnerabilities and edge cases
- Writing high-quality prompt-response examples for supervised fine-tuning
- Validating structured outputs (JSON/YAML) for schema adherence and completeness

Worked with detailed annotation rubrics to maintain high inter-annotator agreement (IAA) and consistency. Participated in calibration sessions and secondary review workflows to ensure label quality.
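As an illustration of the inter-annotator agreement measurement mentioned above, here is a minimal sketch of Cohen's kappa between two annotators, implemented with only the Python standard library (the label values are hypothetical examples, not actual project data):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators
    who labeled the same set of items."""
    assert len(labels_a) == len(labels_b), "annotators must label the same items"
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings from two annotators on six prompt-response pairs.
annotator_1 = ["good", "good", "bad", "good", "bad", "bad"]
annotator_2 = ["good", "bad", "bad", "good", "bad", "good"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa of 1.0 means perfect agreement; 0.0 means agreement no better than chance, which is why calibration sessions target raising it.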
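The JSON schema-adherence checks described above can be sketched as a small validator; the field names and types here are hypothetical stand-ins for a project rubric, and a real pipeline might use a library such as `jsonschema` instead:

```python
import json

# Hypothetical schema for an annotation record: field name -> expected type.
REQUIRED_FIELDS = {"rating": int, "rationale": str, "safety_flags": list}

def validate_response(raw: str) -> list[str]:
    """Return a list of schema violations for a raw model output.

    An empty list means the output parsed and matched the expected schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return errors

ok = validate_response('{"rating": 4, "rationale": "clear", "safety_flags": []}')
bad = validate_response('{"rating": "four"}')
```

Returning a list of violations rather than a boolean makes it easy to surface every problem to the annotator in one review pass.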