Lucian-Alin Balint

AI Model Evaluation Specialist – Outlier AI

London, United Kingdom
$25.00/hr | Intermediate | Micro1, Other, Appen

Key Skills

Software

Micro1
Other
Appen
Scale AI
Remotasks
Mindrift
Toloka
Labelbox
Mercor
Surge AI
Telus

Top Subject Matter

AI Model Evaluator | RLHF, Red Teaming, LLM & Multimodal AI Testing
AI Image Generation & Fashion Photography
AI Safety & Red Teaming

Top Data Types

Text
Image
Document
Audio

Top Task Types

Object Detection
RLHF
Bounding Box
Transcription
Red Teaming
Evaluation Rating
Data Collection
Question Answering
Text Generation
Text Summarization

Freelancer Overview

AI Model Evaluation Specialist with hands-on experience evaluating large language model outputs across RLHF human-in-the-loop workflows. My work focuses on side-by-side response ranking, reasoning evaluation, hallucination detection, and instruction adherence analysis to improve model quality and reliability. I have experience working with multimodal AI tasks including text, image, audio, and video evaluation. I also perform adversarial prompt design and red-teaming style testing to identify model weaknesses, safety issues, and reasoning failures. I follow structured rubric-based scoring guidelines and produce detailed rationales explaining evaluation decisions.

Intermediate | English, Romanian

Labeling Experience

AI Safety & Red Teaming - LLM Evaluation

Text | Red Teaming
Conducted red teaming and adversarial prompt testing on large language models to identify safety weaknesses and failure cases. Designed challenging prompts to evaluate instruction following, reasoning reliability, and policy compliance. Tested model responses for hallucinations, unsafe outputs, logical inconsistencies, and prompt injection vulnerabilities. Documented model weaknesses and contributed feedback to improve model robustness and alignment. Designed adversarial prompts targeting reasoning, safety, and instruction-following behavior; evaluated model outputs using structured rubric-based criteria; and performed iterative prompt refinement to surface edge-case failures.

2025 - Present

AI Model Evaluation Specialist – Outlier AI

Other | Audio | Transcription
Evaluated AI-generated audio transcriptions for accuracy and alignment with spoken recordings. Reviewed model outputs to identify transcription errors, missing words, and contextual inconsistencies. Performed quality validation by comparing AI-generated transcripts against original audio inputs and documenting discrepancies. Applied structured evaluation guidelines to ensure consistency, accuracy, and reliable labeling across transcription validation tasks.


2025 - Present

AI Model Evaluation Specialist – Outlier AI

Other | Image | Bounding Box
Performed visual verification and bounding box annotation on image datasets to assess AI object detection accuracy against ground-truth references. Identified objects and people, reviewed model-generated detections, and validated whether outputs matched the visual content correctly. Applied structured quality standards to ensure consistent object identification, accurate annotation, and reliable verification results across computer vision workflows. Maintained careful attention to detail while performing image review and quality control tasks.

2025 - Present

AI Product Prototype – Generative AI Image Fashion Imaging

Other | Image | Evaluation Rating
Developed and evaluated an AI-assisted virtual mannequin workflow for generating fashion product imagery. Utilized generative AI tools and structured prompt design to control garment appearance, pose, and visual composition. Conducted qualitative evaluation of generated outputs, assessing realism, anatomical accuracy, clothing alignment, and visual consistency. Iteratively refined prompts to improve output quality and explored automation concepts for scalable AI-driven product imaging pipelines.


2025 - Present

AI Model Evaluation Specialist – Outlier AI

Micro1 | Text | RLHF
Performed structured RLHF evaluation on AI model outputs across text, audio, image, and reasoning tasks. Compared and ranked responses using rubric-based criteria within human-in-the-loop evaluation workflows. Evaluated outputs for instruction following, factual accuracy, reasoning quality, safety compliance, and tone consistency. Executed structured reviews on math and logical reasoning chains. Conducted red teaming and adversarial prompt generation to surface model weaknesses and safety vulnerabilities.

2025 - Present

Education

University of the West of Scotland

Bachelor of Arts with Honours, International Business – Marketing Pathway

2022 - 2025

Work History

OroVun

Founder

London
2025 - Present
Cozyield

Founder & Operator

London
2024 - Present