Marlo Anthony - Generalist Evaluator Expert (LLM & Multimodal Models)

Key Skills

Software

Remotasks

Labelbox

Other

Prodigy

CloudFactory

Dataloop

Data Annotation Tech

Datatroniq

Datumbox

Deep Systems

Google Cloud Vertex AI

Figure Eight

OpenCV AI Kit (OAK)

Redbrick AI

Snorkel AI

V7 Labs

VoTT

AWS SageMaker

Anno-Mage

CVAT

Playment

Top Subject Matter

LLM Evaluation

Multimodal AI

Computer Vision

Top Data Types

Text

Image

Audio

Video

Top Task Types

Classification

Transcription

Entity (NER) Classification

Data Collection

Function Calling

Red Teaming

Fine Tuning

RLHF

Text Summarization

Text Generation

Object Detection

Polyline

Evaluation Rating

Prompt/Response Writing (SFT)

Question Answering

Cuboid

Segmentation

Bounding Box

Polygon

Point / Key Point

Computer Programming / Coding

Freelancer Overview

Generalist Evaluator Expert (LLM & Multimodal Models) with 7+ years of professional experience across complex workflows, research, and quality-focused execution. Core strengths include Remotasks, Labelbox, and Whisper. Education: Master of Science, Stanford University (2022), and Bachelor of Science, Harvard University (2020). AI-training focus covers Text, Image, and Audio data, with labeling workflows including Evaluation, Rating, and Classification.

English (Expert)

Labeling Experience

Generalist Evaluator Expert (LLM & Multimodal Models)

Remotasks | Text

As a Generalist Evaluator Expert, I evaluated LLM and multimodal model responses for accuracy, safety, and alignment. I designed judgment tests and provided rubric-based scoring to support model selection and tuning. My work included hallucination detection and detailed qualitative feedback to refine model behavior.

• Conducted structured evaluation of model outputs
• Ran side-by-side model comparisons and preference ranking
• Detected hallucinations, factuality errors, and risk factors
• Created custom prompt instructions and scenario tests

2021 - Present

English Language Audio Model Trainer

Audio | Transcription

As an English Language Audio Model Trainer, I trained ASR and TTS models through audio segmentation, diarization, and structured transcription. My tasks included speaker identification, event-level timestamping, and metadata provision for engineering support. I regularly assessed ASR outputs for accuracy and contextual alignment across diverse scenarios.

• Performed audio transcription and diarization on large datasets
• Tagged noise events and multi-speaker overlaps
• Evaluated fluency, timing, and contextual accuracy
• Provided audio context and metadata for engineering use

2020 - Present

Digital Annotation Expert (Text, Image, Video)

Labelbox | Image | Classification

As a Digital Annotation Expert, I processed large-scale vision datasets involving object detection, image categorization, and video-to-text alignment. I supported generative vision model training with scene descriptions and multi-layer annotations. My role also included evaluating search, ranking, and retrieval tasks using structured rubrics.

• Annotated images and videos for topic, object, and category labels
• Evaluated prompt/response integrity and summarization QA
• Ensured guideline adherence and high inter-annotator agreement
• Produced transformation instructions for multimodal datasets

2020 - Present

AI Data Annotation Specialist

Appen | Text | Entity (NER) Classification

As an AI Data Annotation Specialist, I labeled and evaluated 500,000+ data samples spanning text, audio, and multimodal tasks. I performed high-precision labeling for NLP, sentiment analysis, intent detection, and entity resolution, and evaluated LLM outputs. My work improved dataset quality and model performance through close adherence to guidelines and consistent labeling.

• Labeled NER, sentiment, intent, and classification tags
• Conducted LLM output evaluations for factuality, safety, and hallucinations
• Created English audio datasets via transcription and diarization
• Provided side-by-side model comparison and objective scoring

2019 - Present

LLM Safety & Behavior Testing

Prodigy | Text

I performed LLM Safety & Behavior Testing to evaluate model outputs for harmful content, misinformation, and ethical issues. The process required red-teaming, bias detection, and alignment assessments for LLM safety. I rigorously documented findings and provided recommendations for model behavior improvements.

• Evaluated LLM outputs for self-harm, misinformation, and bias
• Conducted ethical alignment and red-teaming tasks
• Documented safety issues and proposed mitigation steps
• Collaborated on improving LLM safety and behavior

2021 - 2023

Education

Stanford University

Master of Science, Machine Learning and Artificial Intelligence

2020 - 2022

Harvard University

Bachelor of Science, Computer Science

2016 - 2020

Work History

Digital Annotation Expert (Text, Image, Video)

Location not specified

2020 - Present