
George Glen

Human-in-the-Loop LLM Evaluation & QA

USA
$20.00/hr · Expert

Key Skills

Software

No software listed

Top Subject Matter

LLM evaluation
AI safety
bias mitigation

Top Data Types

Text

Top Task Types

Red Teaming

Freelancer Overview

Human-in-the-Loop LLM Evaluation & QA. Brings 9+ years of professional experience across complex workflows, research, and quality-focused execution. Core strengths include internal and proprietary tooling. Education includes Bachelor of Science, Massachusetts Institute of Technology (2016) and Master of Science, Stanford University (2018). AI-training focus includes data types such as Text and labeling workflows including Evaluation, Rating, and Red Teaming.

English (Expert)

Labeling Experience

Adversarial Data Labeling & Red-Teaming for LLM Safety

Text · Red Teaming
Led red-teaming initiatives to identify vulnerabilities in language models through adversarial evaluation. Designed and deployed custom adversarial test sets targeting hallucination and fairness weaknesses. Results directly improved model safety protocols and guided mitigation development.
• Structured evaluation of unsafe and biased LLM outputs.
• Implemented mitigation experiments improving fairness metrics.
• Integrated adversarial results into retraining and feedback loops.
• Reported findings to research and product teams.

2022 - Present

Human-in-the-Loop LLM Evaluation & QA

Text
Designed and implemented structured human review frameworks for large language models. Led red-teaming, bias assessment, and safety metric integration into evaluation workflows. Integrated human-in-the-loop processes into model benchmarking, covering tasks like summarization, reasoning, and QA.
• Developed annotation workflows and quality scoring rubrics for LLMs.
• Built structured adversarial test sets to evaluate bias and unsafe outputs.
• Evaluated fine-tuned and RLHF-based models with automated and human feedback.
• Produced analysis reports influencing model alignment and reliability.

2021 - Present

Education

Imperial College London

Artificial Intelligence and Machine Learning

2019
University of Cambridge

Artificial Intelligence and Machine Learning

2019

Work History

Tech Company

Senior Software Engineer

2021 - Present
Technology Firm

Software Engineer

2018 - 2020