Bella Baxter

LLM evaluation, NER, and data labeling

Whippany, NJ, USA
$30.00/hr, Entry Level, Data Annotation Tech

Key Skills

Software

Data Annotation Tech

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Image
Text

Top Task Types

Audio Recording
Classification
Entity NER Classification
Evaluation Rating
Fine Tuning

Freelancer Overview

My work with Data Annotation Tech gave me experience in LLM evaluation, prompt creation, and labeling of image, audio, and text data. I especially excelled at tasks that involved finding images and writing prompts that would stump models and cause failures in visual perception and reasoning. I have extensive experience rating two model responses along several axes and comparing them with each other. I rated outputs on instruction following, groundedness in the material given in the prompt, truthfulness of factual claims, writing quality, and verbosity. I also worked on many tasks where I created prompts that required multimodal sources: the model needed to watch a video and conduct research to answer. Additionally, I was often given text or an image and had to classify all entities present. After Data Annotation Tech, I gained some experience transcribing legal court hearings and became familiar with annotating audio.

Entry Level, English, Spanish

Labeling Experience

Data Annotation Tech

Entity classification (text)

Data Annotation Tech, Text, Entity NER Classification
I was given a paragraph of text and highlighted entities according to the provided categories, which ranged from foods to vehicles to names of people. I worked through the text and classified each term that fell under a category.

2025 - 2025
Data Annotation Tech

Entity classification (image)

Data Annotation Tech, Image, Entity NER Classification
I was given an image and tasked with classifying all entities within given constraints. For example, if the task covered all living entities in an image containing animals, humans, cars, and bicycles, I would classify the animals and humans into categories by species. I have also worked on counting tasks, and on tasks where an LLM produces a classification that I revise and correct as needed based on the image.

2025 - 2025
Data Annotation Tech

Output evaluation

Data Annotation Tech, Text, Evaluation Rating, Prompt Response Writing SFT
I worked on several projects where I either created a prompt that would produce a failure in at least one of the models tested, or was given a prompt and the models' outputs. I rated each model's response on instruction following, groundedness, truthfulness of factual claims, writing quality, and verbosity. I needed to pinpoint exactly where the failure occurred, and if no failure occurred I revised my prompt until one was produced. I then compared the models' responses with one another and justified why one was preferable.

2025 - 2025

Education

University of Delaware

None, Pre Veterinary Medicine

None
2022

Work History

Winbak Farm

Veterinary Assistant

Chesapeake City, MD
2024 - Present

Red Lion Veterinary Hospital

Veterinary Assistant

Red Lion, DE
2023 - 2025