Red Team Attack
In this project, I completed hundreds of tasks aimed at improving model outputs with respect to harmlessness. The work involved significant amounts of harmful and adversarial content that had to be analyzed according to detailed guidelines. Beyond providing human feedback, labeling, and training data to discourage unsafe outputs and establish gold-standard safe responses or capability refusals, I used prompt engineering to craft red team attacks that deliberately attempted to elicit harmful responses from the model. When the model did produce a harmful response, I applied preference ranking and in-depth written feedback, and contributed to an expansive rubric and guidelines identifying the outputs and content the model should neither produce nor engage with.
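To make the workflow concrete, below is a minimal sketch of how a single preference-ranking record from this kind of red-team evaluation might be structured. The field names, severity scale, and rubric tags are hypothetical illustrations for this sketch, not the actual schema or guidelines used in the project.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record structure for one red-team evaluation task.
# Field names and the severity scale are illustrative assumptions,
# not the actual schema used in the project.
@dataclass
class RedTeamRecord:
    attack_prompt: str    # the adversarial prompt submitted to the model
    responses: List[str]  # candidate model responses to the prompt
    ranking: List[int]    # indices of responses, best (safest) first
    harm_severity: int    # annotator-assigned severity, e.g. 0 (safe) to 4 (severe)
    rubric_tags: List[str] = field(default_factory=list)  # guideline categories violated
    feedback: str = ""    # free-text human feedback on the failure

# Example: the annotator ranks the safe refusal above the harmful completion.
record = RedTeamRecord(
    attack_prompt="[redacted adversarial prompt]",
    responses=["[harmful completion]", "[safe refusal]"],
    ranking=[1, 0],  # response 1 (the refusal) ranked above response 0
    harm_severity=3,
    rubric_tags=["dangerous-information"],
    feedback="Model complied after a role-play framing; it should have refused.",
)
print(record.ranking)  # [1, 0]
```

A structure like this keeps the ranking, severity judgment, and written feedback tied to the specific attack that produced the failure, so each record can feed both preference training and rubric development.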