Bella Baxter - LLM evaluation, NER, and data labeling

Key Skills

Software

Data Annotation Tech

Top Subject Matter

No subject matter listed

Top Data Types

Audio

Image

Text

Top Task Types

Audio Recording

Classification

Entity NER Classification

Evaluation Rating

Fine Tuning

Freelancer Overview

My work with Data Annotation Tech gave me experience in LLM evaluation, prompt creation, and labeling of image, audio, and text data. I especially excelled at tasks that involved finding images and writing prompts that would stump models and cause failures in visual perception and reasoning. I have extensive experience rating two model responses along several axes and comparing them to each other: instruction following, groundedness in what the prompt provides, truthfulness of factual claims, writing quality, and verbosity. I also worked on many tasks where I created prompts requiring multimodal sources, in which the model needed to watch a video and conduct research to answer. Additionally, I was often given text or an image and had to classify all entities present. After Data Annotation Tech, I gained some experience transcribing legal court hearings and became familiar with annotating audio.

Entry Level, English, Spanish

Labeling Experience

Entity classification (text)

Data Annotation Tech, Text, Entity NER Classification

I was given a paragraph of text and highlighted entities according to provided categories, which ranged from foods to vehicles to names of people. I worked through the text and classified each term that fell under a category.

2025 - 2025

Entity classification (image)

Data Annotation Tech, Image, Entity NER Classification

I was given an image and tasked with classifying all entities within given constraints. For example, if the constraint was living entities and the image contained animals, humans, cars, and bicycles, I would classify the animals and humans by species. I have also worked on counting tasks, and on tasks where an LLM produced a classification that I revised and corrected as needed based on the image.

2025 - 2025

Output evaluation

Data Annotation Tech, Text, Evaluation Rating, Prompt Response Writing SFT

I worked on several projects where I either created a prompt designed to produce a failure in at least one of the models tested, or was given a prompt along with the models' outputs. I rated each model's response on instruction following, groundedness, truthfulness of factual claims, writing quality, and verbosity. I had to pinpoint exactly where each failure occurred, and if my prompt did not produce a failure, I revised it until one did. I then compared the models' responses to one another and justified why one was preferable.

2025 - 2025

Education

University of Delaware

None, Pre Veterinary Medicine

None

2022

Work History

Winbak Farm

Veterinary Assistant

Chesapeake City, MD

2024 - Present

Red Lion Veterinary Hospital

Veterinary Assistant

Red Lion, DE

2023 - 2025