Lucian-Alin Balint - AI Model Evaluation Specialist – Outlier AI

Key Skills

Software

Micro1

Other

Appen

Scale AI

Remotasks

Mindrift

Toloka

Labelbox

Mercor

Surge AI

Telus

Top Subject Matter

AI Model Evaluator | RLHF, Red Teaming, LLM & Multimodal AI Testing

AI Image Generation & Fashion Photography

AI Safety & Red Teaming

Top Data Types

Text

Image

Document

Audio

Top Task Types

Object Detection

RLHF

Bounding Box

Transcription

Red Teaming

Evaluation Rating

Data Collection

Question Answering

Text Generation

Text Summarization

Freelancer Overview

AI Model Evaluation Specialist with hands-on experience evaluating large language model outputs across RLHF human-in-the-loop workflows. My work focuses on side-by-side response ranking, reasoning evaluation, hallucination detection, and instruction adherence analysis to improve model quality and reliability. I have experience working with multimodal AI tasks including text, image, audio, and video evaluation. I also perform adversarial prompt design and red-teaming style testing to identify model weaknesses, safety issues, and reasoning failures. I follow structured rubric-based scoring guidelines and produce detailed rationales explaining evaluation decisions.

IntermediateEnglishRomanian

Labeling Experience

AI Safety & Red Teaming - LLM Evaluation

TextRed Teaming

Conducted red teaming and adversarial prompt testing on large language models to identify safety weaknesses and failure cases. Designed challenging prompts to evaluate instruction following, reasoning reliability, and policy compliance. Tested model responses for hallucinations, unsafe outputs, logical inconsistencies, and prompt injection vulnerabilities. Documented model weaknesses and contributed feedback to improve model robustness and alignment. Designed adversarial prompts targeting reasoning, safety, and instruction-following behavior Evaluated model outputs using structured rubric-based criteria Performed iterative prompt refinement to surface edge-case failures

2025 - Present

AI Model Evaluation Specialist – Outlier AI

OtherAudioTranscription

Evaluated AI-generated audio transcriptions for accuracy and alignment with spoken recordings. Reviewed model outputs to identify transcription errors, missing words, and contextual inconsistencies. Performed quality validation by comparing AI-generated transcripts against original audio inputs and documenting discrepancies. Applied structured evaluation guidelines to ensure consistency, accuracy, and reliable labeling across transcription validation tasks.

2025 - Present

AI Model Evaluation Specialist – Outlier AI

OtherImageBounding Box

Visual verification and bounding box annotation were performed on image datasets to assess AI object detection accuracy against ground-truth references. Tasks involved identifying objects and people, reviewing model-generated detections, and validating whether outputs matched the visual content correctly. Applied structured quality standards to ensure consistent object identification, accurate annotation, and reliable verification results across computer vision workflows. Maintained careful attention to detail while performing image review and quality control tasks.

2025 - Present

AI Product Prototype – Generative AI Image Fashion Imaging

OtherImageEvaluation Rating

Developed and evaluated an AI-assisted virtual mannequin workflow for generating fashion product imagery. Utilized generative AI tools and structured prompt design to control garment appearance, pose, and visual composition. Conducted qualitative evaluation of generated outputs, assessing realism, anatomical accuracy, clothing alignment, and visual consistency. Iteratively refined prompts to improve output quality and explored automation concepts for scalable AI-driven product imaging pipelines.

2025 - Present

AI Model Evaluation Specialist – Outlier AI

Micro1TextRLHF

Structured RLHF evaluation was performed on AI model outputs across text, audio, image, and reasoning tasks. Responses were compared and ranked using rubric-based criteria within human-in-the-loop evaluation workflows. Evaluated outputs for instruction following, factual accuracy, reasoning quality, safety compliance, and tone consistency. Executed structured reviews on math and logical reasoning chains. Conducted red teaming and adversarial prompt generation to surface model weaknesses and safety vulnerabilities.

2025 - Present

Education

U

University of the West of Scotland

Bachelor of Arts with Honours, International Business – Marketing Pathway

Bachelor of Arts with Honours

2022 - 2025

Work History

O

OroVun

Founder

London

2025 - Present

C

Cozyield

Founder & Operator

London

2024 - Present