
Victoria Leung

LLM Prompt Quality Evaluation Framework — Dataset Evaluator

Spartanburg, USA
$20.00/hr · Expert · Labelbox · Prodigy · Other

Key Skills

Software

Labelbox
Prodigy
Other

Top Subject Matter

LLM prompt evaluation and human feedback alignment
Instruction-following NLP datasets across domains
Customer support chatbot response quality

Top Data Types

Text
Document

Top Task Types

Classification

Freelancer Overview

LLM Prompt Quality Evaluation Framework — Dataset Evaluator. Brings 2+ years of professional experience across legal operations, contract review, compliance, and structured analysis. Core strengths include Airtable, Labelbox, and Prodigy. Holds a Bachelor of Science from the University of California, Berkeley (2023). AI-training focus covers Text data and labeling workflows including Evaluation, Rating, and Classification.

English (Expert)

Labeling Experience

Freelance AI Data Trainer & Evaluator

Other · Text
As a freelance AI Data Trainer and Evaluator, I continuously assess and rank AI-generated outputs for safety, accuracy, and instruction-following quality. My responsibilities include adversarial prompt crafting, error detection, and structured feedback delivery for ongoing model improvement. I maintain annotation quality metrics and manage evaluations across thousands of model outputs.
• Performed >5,000 evaluations using preference ranking and severity tagging
• Designed adversarial prompts to uncover model weaknesses
• Delivered feedback to guide model iteration and reduce error rates
• Sustained an error rate below 2% across all evaluation cycles

2023 - Present

LLM Prompt Quality Evaluation Framework — Dataset Evaluator

Text
I developed and implemented a five-dimensional rubric to assess over 1,200 prompt-response pairs for large language model evaluation. My work focused on quality scoring, error identification, and annotation standardization to ensure reliable dataset evaluation for RLHF pipelines. I collaborated with a team to reach 91% inter-rater agreement and documented failures to enhance LLM prompt quality assessment.
• Designed a 5-dimension rubric spanning accuracy, coherence, helpfulness, safety, and instruction-following
• Identified and tagged 14 failure modes, such as hallucination and reasoning gaps
• Established annotation guidelines yielding a Cohen’s κ of 0.88
• Supplied a ranked dataset for downstream human feedback simulation

2024
Prodigy

Customer Support Chatbot Response Improvement Data Trainer

Prodigy · Text
I analyzed 8,000 customer support chatbot interactions and constructed a classification pipeline to detect and correct failed model responses. My work included labeling and compiling 600 improved response pairs for model retraining using preference feedback and performance metrics. This project contributed to measurable improvements in escalation rates and chatbot reliability.
• Built and applied a classification rubric to chatbot output errors
• Labeled response pairs for preference-based fine-tuning
• Used Prodigy and Hugging Face tools for review and annotation
• Simulated model improvement with documented F1 score gains

2023
Labelbox

High-Quality NLP Instruction Dataset Annotation Specialist

Labelbox · Text · Classification
I annotated and reviewed over 3,400 instruction-following text samples using multiple labeling platforms focused on NLP datasets. My responsibilities included taxonomy development, audit cycle implementation, and enhancing annotation consistency to raise dataset accuracy rates. I helped maintain high quality through rigorous benchmarking and error resolution protocols.
• Developed a taxonomy with 40+ intent categories to streamline annotation
• Conducted multiple audit cycles, increasing accuracy from 83% to 96%
• Used Labelbox and Label Studio to accelerate the annotation workflow
• Maintained top-tier dataset quality through systematic consistency checks

2023

Research Assistant – Cognitive AI Lab, UC Berkeley

Other · Text · Classification
At UC Berkeley’s Cognitive AI Lab, I annotated over 2,100 human response transcripts to study ambiguity and inference in human-AI interactions. I assisted in designing annotation pipelines and contributed insights into reducing annotation bias and improving consistency. My efforts were central to research initiatives analyzing cognitive features of human-AI transcript data.
• Built a Python pipeline to accelerate transcript review and annotation
• Coded nuanced transcript features for ambiguity and inference categorization
• Collaborated with research teams on annotation bias mitigation
• Enabled 35% faster annotation processing for large-scale studies

2022 - 2023

Education

University of California, Berkeley

Bachelor of Science, Cognitive Science (Artificial Intelligence Track)

2019 - 2023

Work History

University of California, Berkeley

Research Assistant

Berkeley
2022 - 2023