Charlotte Somerville

LLM Evaluation & Prompt Engineering Specialist for complex tasks

King's Lynn, United Kingdom
$22.00/hr · Entry Level · Data Annotation Tech · Remotasks · Scale AI

Key Skills

Software

Data Annotation Tech
Remotasks
Scale AI
Telus

Top Subject Matter

No subject matter listed

Top Data Types

Image
Text
Video

Top Task Types

Emotion Recognition
Evaluation Rating
Fine Tuning
Prompt Response Writing SFT
Question Answering

Freelancer Overview

I have extensive experience in data labeling and AI training across text, image, and video, with a strong focus on evaluation and prompt engineering for large language models. My work spans factuality assessment, instruction following, long-context reasoning, and nuanced dialogue evaluation. I have contributed to projects involving rubric design, multi-turn conversation testing, behavioural instruction development, and image-inclusive tasks that require cross-modal reasoning. My expertise also includes emotion recognition, fact-checking, and educational/tutoring assistant evaluations, as well as creative writing and narrative generation tasks. This combination of analytical and creative skills allows me to assess both technical precision and natural language fluency. I bring a keen eye for subtle errors, strong writing skills, and the ability to provide constructive feedback that improves model outputs. My versatility across evaluation, prompt & response writing (SFT), fine-tuning support, and question answering makes me well-suited to a wide range of generalist AI training projects.

Entry Level · French · Hebrew · English

Labeling Experience

Data Annotation Tech

Cross-Model Comparative LLM Evaluation

Data Annotation Tech · Text · Question Answering · Evaluation Rating
Compared outputs from multiple frontier LLMs on identical prompts. Rated responses for correctness, clarity, style, safety, and grounding. Highlighted strengths, weaknesses, and qualitative differences between models. Provided detailed feedback to support model benchmarking.

2025
Data Annotation Tech

Cross-Domain Evaluation – Creative & Educational Tasks

Data Annotation Tech · Text · Question Answering · Evaluation Rating
Evaluated model outputs across diverse domains including creative writing, educational tutoring, and structured reasoning tasks. Assessed for accuracy, creativity, clarity, and age-appropriate tone. Applied rubrics to measure instruction adherence and engagement.

2025
Data Annotation Tech

Multi-Turn Conversation Evaluation – Complex Reasoning

Data Annotation Tech · Text · Question Answering · Emotion Recognition
Evaluated multi-turn conversations for depth of reasoning, factuality, and emotional appropriateness. Rated outputs for instruction-following, tone alignment, and clarity. Provided structured feedback on conversation coherence and engagement quality.

2025
Data Annotation Tech

Behavioural Instruction & System Prompt Evaluation

Data Annotation Tech · Text · Fine Tuning · Evaluation Rating
Created and evaluated behavioural instruction sets for system prompts. Tested model adherence to constraints, tone, and safety rules across multi-turn conversations. Compared outputs against expected behaviours to refine instruction effectiveness.

2025
Data Annotation Tech

Factuality Evaluation – LLM Claim Verification

Data Annotation Tech · Text · Question Answering · Evaluation Rating
Conducted detailed factuality assessments of model outputs against reference documents. Identified inaccuracies, graded severity, and provided explanations of truthfulness. Focused on grounding responses in given sources and measuring distance from factual truth.

2025

Education

University of the Arts, London

Bachelor of Arts (Honours), Photojournalism & Documentary Photography

2013 - 2016
Truro School

A Levels in Philosophy & Ethics, Biology, and Chemistry

2010 - 2012

Work History

DataAnnotation.Tech

Data Annotator

King's Lynn
2025 - Present
Costa Coffee

Senior Barista Maestro

King's Lynn
2022 - 2025