Cianna Washington

AI Response Rater & Evaluation Specialist

Mountain View, CA, USA
$45.00/hr · Expert · Surge AI · Scale AI · Label Studio

Key Skills

Software

Surge AI
Scale AI
Label Studio
Appen

Top Subject Matter

LLM response evaluation
AI safety
policy compliance

Top Data Types

Text
Audio
Document

Top Task Types

Entity (NER) Classification
Classification
RLHF

Freelancer Overview

AI Response Rater & Evaluation Specialist with 6+ years of professional experience across complex annotation workflows, research, and quality-focused execution. Core strengths include work with Anthropic, Surge AI, and internal annotation platforms. Education includes a Doctor of Philosophy from Carnegie Mellon University (2024) and a Master of Science from Georgia Institute of Technology (2017). AI-training focus includes data types such as Text and labeling workflows including Evaluation, Rating, and Entity (NER) Classification.

Languages

English (Expert)

Labeling Experience

Anthropic

AI Response Rater & Evaluation Specialist

Text
As an AI Response Rater & Evaluation Specialist at Anthropic, I rated LLM-generated responses along accuracy, safety, and helpfulness dimensions using structured rubrics. My work included comparative preference ranking, detailed labeling for safety datasets, and ongoing contribution to annotation guideline refinement. I regularly provided written rationales and participated in calibration to ensure inter-annotator agreement.

• Labeled sensitive, adversarial, and policy-compliance content for model safety datasets
• Conducted 500–700 pairwise preference ratings per week with >95% audit pass rate
• Submitted 80+ feedback reports leading to improved labeling guidelines
• Participated in team calibration sessions; maintained Cohen's Kappa >0.82

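The calibration work above tracked agreement with Cohen's Kappa. For reference, a minimal Python sketch of how that statistic is computed between two raters follows; the label categories and sample batch are hypothetical, not data from these projects.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label distribution
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum((counts_a[label] / n) * (counts_b[label] / n)
                   for label in set(counts_a) | set(counts_b))
    return (observed - expected) / (1 - expected)

# Hypothetical calibration batch: two raters labeling the same six responses
a = ["safe", "unsafe", "safe", "safe", "borderline", "safe"]
b = ["safe", "unsafe", "safe", "borderline", "borderline", "safe"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # ~0.71 for this toy batch
```

Values above roughly 0.8, such as the >0.82 figure cited, are generally read as strong inter-annotator agreement.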

2023 - Present
Scale AI

Data Labeling & Annotation Specialist

Scale AI · Text · Entity (NER) Classification
At Scale AI as a Data Labeling & Annotation Specialist, I performed high-volume text annotation tasks for various NLP pipelines. My responsibilities involved NER, coreference resolution, sentiment classification, intent labeling, and instruction-following evaluation. I also contributed to moderation labeling and mentored new annotators on best practices.

• Labeled 10,000+ question-answer pairs for comprehension and factual accuracy
• Performed content moderation for hate speech and misinformation categories
• Achieved 98.2% accuracy rate over 3 consecutive quarters
• Mentored 5 new annotators in guideline interpretation

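To make the NER task above concrete, here is an illustrative sketch of a span-level annotation record; the text, offsets, entity labels, and field names are invented for illustration rather than drawn from any Scale AI project schema.

```python
# Hypothetical span-level NER record: character offsets plus entity types,
# similar in spirit to what annotation tools such as Label Studio export.
record = {
    "text": "Cianna joined Scale AI in San Francisco in 2022.",
    "entities": [
        {"start": 0,  "end": 6,  "label": "PERSON"},
        {"start": 14, "end": 22, "label": "ORG"},
        {"start": 26, "end": 39, "label": "LOC"},
        {"start": 43, "end": 47, "label": "DATE"},
    ],
}

# Quality check: every annotated span should slice cleanly out of the text
for ent in record["entities"]:
    span = record["text"][ent["start"]:ent["end"]]
    print(f"{span!r} -> {ent['label']}")
```

Offset-based records like this keep the source text untouched, which makes later review and adjudication straightforward.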

2022 - 2023

Microsoft Research

NLP Data Annotator

Text · Classification
In my role as an NLP Data Annotator at Microsoft Research, I annotated large volumes of dialogue, summarization, and question-answering datasets for language modeling. The work included labeling for semantic similarity, entailment, and paraphrase, as well as assisting in guideline development. I consistently delivered high annotation throughput and quality assurance.

• Labeled semantic similarity, textual entailment, and paraphrase pairs for NLU benchmarks
• Maintained >96% accuracy per QA reviews
• Provided feedback during pilot annotation rounds to improve final guidelines
• Contributed to red-teaming efforts before model release

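As a rough illustration of the pair-labeling tasks mentioned above, the sketch below shows what records for textual entailment and paraphrase annotation can look like; the sentence pairs and labels are made up and do not come from the Microsoft Research datasets described here.

```python
from collections import Counter

# Hypothetical NLU annotation records: sentence pairs labeled for
# three-way textual entailment and binary paraphrase status.
pairs = [
    {
        "premise": "The annotator reviewed two hundred dialogues yesterday.",
        "hypothesis": "Some dialogues were reviewed by the annotator.",
        "entailment": "entailment",
        "is_paraphrase": False,
    },
    {
        "premise": "The model summary omits the final paragraph.",
        "hypothesis": "The summary covers the entire document.",
        "entailment": "contradiction",
        "is_paraphrase": False,
    },
]

# Label-distribution check of the kind run during pilot annotation rounds
print(Counter(p["entailment"] for p in pairs))
```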

2019 - 2021

Carnegie Mellon University

Research Annotator, AI Lab

Text · RLHF
As a Research Annotator at Carnegie Mellon University, I collected and labeled human preference data for RLHF research projects. My tasks involved annotating LLM response pairs for preference, factual accuracy, coherence, and fluency. I also contributed to crowdsourcing interface design and annotation instruction writing.

• Labeled 50,000+ LLM response pairs for RLHF studies
• Evaluated factual accuracy and fluency of generative model outputs
• Assisted in designing annotation interfaces and contributor guides
• Supported crowdsourcing projects with quality annotation contributions

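For context on the preference data referenced above, RLHF collection pipelines typically store pairwise comparisons along the lines of the record sketched below; the field names and example content are assumptions for illustration, not the lab's actual schema.

```python
# Hypothetical RLHF preference record: one prompt, two candidate responses,
# the annotator's pairwise choice, and supporting quality ratings.
preference_record = {
    "prompt": "Explain photosynthesis to a ten-year-old.",
    "response_a": "Plants use sunlight, water, and air to make their own food.",
    "response_b": "Photosynthesis converts light energy into chemical energy "
                  "stored in sugars within chlorophyll-containing cells.",
    "preferred": "a",  # pairwise preference judgement
    "rationale": "Response A matches the requested reading level.",
    "ratings": {"accuracy": 5, "helpfulness": 5, "fluency": 5},
}

# Batches of such records feed reward-model training; a simple quality check
# is the share of comparisons in which each side was preferred.
batch = [preference_record]
share_a = sum(r["preferred"] == "a" for r in batch) / len(batch)
print(f"Response A preferred in {share_a:.0%} of comparisons")
```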

2018 - 2019

Education


Carnegie Mellon University

Doctor of Philosophy, Computer Science

2018 - 2024

Georgia Institute of Technology

Master of Science, Computer Science

2015 - 2017

Work History


Google DeepMind

Machine Learning Data Scientist

Mountain View, CA
2020 - 2023

Carnegie Mellon University

Graduate Research Assistant

Pittsburgh, PA
2018 - 2020