Marlo Anthony

Generalist Evaluator Expert (LLM & Multimodal Models)

Texas, USA
$30.00/hr | Expert | Remotasks, Labelbox, Other

Key Skills

Software

Remotasks
Labelbox
Other
Prodigy
CloudFactory
Dataloop
Data Annotation Tech
Datatroniq
Datumbox
Deep Systems
Google Cloud Vertex AI
Figure Eight
OpenCV AI Kit (OAK)
Redbrick AI
Snorkel AI
V7 Labs
VoTT
AWS SageMaker
Anno-Mage
CVAT
Playment

Top Subject Matter

LLM Evaluation
Multimodal AI
Computer Vision

Top Data Types

Text
Image
Audio
Video

Top Task Types

Classification
Transcription
Entity (NER) Classification
Data Collection
Function Calling
Red Teaming
Fine Tuning
RLHF
Text Summarization
Text Generation
Object Detection
Polyline
Evaluation/Rating
Prompt/Response Writing (SFT)
Question Answering
Cuboid
Segmentation
Bounding Box
Polygon
Point/Key Point
Computer Programming/Coding

Freelancer Overview

Generalist Evaluator Expert (LLM & Multimodal Models) with 7+ years of professional experience across complex workflows, research, and quality-focused execution. Core strengths include Remotasks, Labelbox, and Whisper. Education: Master of Science, Stanford University (2022), and Bachelor of Science, Harvard University (2020). AI-training focus covers text, image, and audio data, with labeling workflows including evaluation, rating, and classification.

English (Expert)

Labeling Experience

Remotasks

Generalist Evaluator Expert (LLM & Multimodal Models)

Remotasks | Text
As a Generalist Evaluator Expert, I evaluated LLM and multimodal model responses for accuracy, safety, and alignment. I designed judgment tests and provided rubric-based scoring to support model selection and tuning. My work included hallucination detection and detailed qualitative feedback to refine model behavior.
• Conducted structured evaluation of model outputs
• Ran side-by-side model comparisons and preference ranking
• Detected hallucinations, factuality errors, and risk factors
• Created custom prompt instructions and scenario tests


2021 - Present

English Language Audio Model Trainer

Audio | Transcription
As an English Language Audio Model Trainer, I trained ASR and TTS models through audio segmentation, diarization, and structured transcription. My tasks included speaker identification, event-level timestamping, and metadata provision for engineering support. I regularly assessed ASR outputs for accuracy and contextual alignment across diverse scenarios.
• Performed audio transcription and diarization on large datasets
• Tagged noise events and multi-speaker overlaps
• Evaluated fluency, timing, and contextual accuracy
• Provided audio context and metadata for engineering use


2020 - Present
Labelbox

Digital Annotation Expert (Text, Image, Video)

Labelbox | Image | Classification
As a Digital Annotation Expert, I processed large-scale vision datasets involving object detection, image categorization, and video-to-text alignment. I supported generative vision model training with scene descriptions and multi-layer annotations. My role also included evaluating search, ranking, and retrieval tasks using structured rubrics.
• Annotated images and videos for topic, object, and category labels
• Evaluated prompt/response integrity and summarization QA
• Ensured guideline adherence and high inter-annotator agreement
• Produced transformation instructions for multimodal datasets


2020 - Present
Appen

AI Data Annotation Specialist

Appen | Text | Entity (NER) Classification
As an AI Data Annotation Specialist, I labeled and evaluated 500,000+ data samples spanning text, audio, and multimodal tasks. I executed high-precision labeling for NLP, sentiment analysis, intent detection, entity resolution, and LLM evaluations. My work contributed to dataset quality and AI model improvement through close, consistent adherence to guidelines.
• Labeled NER, sentiment, intent, and classification tags
• Conducted LLM output evaluations for factuality, safety, and hallucinations
• Created English audio datasets via transcription and diarization
• Provided side-by-side model comparison and objective scoring


2019 - Present
Prodigy

LLM Safety & Behavior Testing

Prodigy | Text
I performed LLM Safety & Behavior Testing to evaluate harmful content, misinformation, and ethical issues in model outputs. The process required red-teaming, bias detection, and alignment assessments for LLM safety. I rigorously documented findings and provided recommendations for model behavior improvements.
• Evaluated LLM outputs for self-harm, misinformation, and bias
• Conducted ethical alignment and red-teaming tasks
• Documented safety issues and proposed mitigation steps
• Collaborated on improving LLM safety and behavior


2021 - 2023

Education

Stanford University

Master of Science, Machine Learning and Artificial Intelligence
2020 - 2022

Harvard University

Bachelor of Science, Computer Science
2016 - 2020

Work History

Digital Annotation Expert (Text, Image, Video)

Location not specified
2020 - Present