Alexander Walters

Highly experienced data annotator specialising in LLM response evaluation; MEng

Manchester, United Kingdom
$25.00/hr · Entry Level · Data Annotation Tech · Internal Proprietary Tooling

Key Skills

Software

Data Annotation Tech
Internal/Proprietary Tooling

Top Subject Matter

No subject matter listed

Top Data Types

Document
Image
Text

Top Task Types

Audio Recording
Classification
Data Collection
Evaluation Rating
Prompt/Response Writing (SFT)

Freelancer Overview

I’m a data annotator and AI training specialist experienced in RLHF-style evaluation, search grounding and citation QA, multimodal labelling and conversation-quality review. I create fine-grained rubrics, write prompt-and-response exemplars and deliver consistent, in-depth, justified scoring. What sets me apart is a detail-oriented, QA-driven approach shaped by an MEng in Civil Engineering and my graduate ITS role: clear documentation, rigorous checks and consistent decisions. I’m comfortable analysing large volumes of complex data, and I quickly absorb new guidelines and turn ambiguous instructions into precise, operational labelling rules.

Entry Level · English · Spanish

Labeling Experience

Search Grounding & Citation Quality for LLM Answers

Internal Proprietary Tooling · Text · Classification · Question Answering

Created rubrics for prompts with associated search results, defining what an ideal answer must include and how citations should be applied. Evaluated evidence sufficiency and drafted reference answers, then compared multiple model responses against the rubric with scored judgments. Wrote justifications covering helpfulness, correctness and citation use. Where permitted, performed light data collection to locate external sources. Also completed peer reviews to improve consistency, flagged ambiguous cases and suggested guideline wording that improved rubrics and responses.

2025

Image–Text Evaluation and Rubric Design (Multimodal LLM)

Internal Proprietary Tooling · Image · Classification · Question Answering

Evaluated text responses to image-based prompts, first verifying that each question was appropriate and relevant. Created detailed rubrics defining what an ideal model answer should include, then drafted reference responses and rated the current model’s outputs accordingly. Provided written feedback for each rating and flagged unsuitable or policy-violating content when necessary. Ensured consistency, accuracy and alignment between text prompts and their corresponding visual context.

2025

Conversational LLM Comparison from User Transcripts

Internal Proprietary Tooling · Text · Classification · Question Answering

Imported user-side chat transcripts and extracted prompts and context to compare two conversational LLMs. Created comparison criteria (helpfulness, faithfulness to context, instruction following, tone/safety) and drafted reference responses for judgement. Scored each model’s answers against the criteria and provided written justifications for all ratings, highlighting strengths, failure modes and opportunities for clarification. Where needed, rephrased prompts to equalise context and avoid bias, and documented edge cases.

2025

Conversational Audio Prompts & Ideal Response Scripting

Internal Proprietary Tooling · Audio · Classification · Text Generation

Created category-specific audio prompts and matching transcripts, then wrote “ideal” scripted responses for model training. Scenarios included emotion cues (tone/affect embedded in the prompt), clarification requests for interrupted or intentionally vague audio, and controlled lexical modification, volume variance, and background noise to stress-test robustness. Conducted peer review to classify conversation quality (cohesion, instruction following, safety) and provided detailed rationales. Tracked inter-annotator agreement and flagged ambiguous cases, contributing wording updates that improved reviewer alignment and QA.

2025

Education

Newcastle University

MEng (Hons), Civil Engineering

2021 - 2025
Ulverston Victoria Sixth Form

A Levels, Mathematics, Physics, Product Design

2019 - 2021

Work History

Data Annotation Tech

Analyst

Remote
2025 - Present
Jacobs

Graduate Intelligent Transport Systems Engineer

Manchester
2024 - Present