Beau Prater

AI Model Evaluator | Generative AI & Agent Systems

Bloomington, USA
$30.00/hr · Expert

Key Skills

Software

AWS SageMaker
Appen
Clickworker
CrowdFlower
CrowdSource
CVAT
Data Annotation Tech
Mercor
Micro1
Mindrift
OneForma
Telus
Toloka

Top Subject Matter

Large Language Models
Generative AI
Autonomous Agents

Top Data Types

Text
Video
Image

Top Task Types

Evaluation/Rating
Transcription
Object Detection
Text Generation
Point/Key Point
Segmentation
Prompt + Response Writing (SFT)

Freelancer Overview

AI Model Evaluator | Generative AI & Agent Systems. Brings 2+ years of professional experience across legal operations, contract review, compliance, and structured analysis. Core strengths include internal and proprietary tooling. Education includes a Bachelor of Science from the University of Texas at Austin (2022). AI-training focus includes data types such as text and labeling workflows such as evaluation and rating.

English: Expert

Labeling Experience

AI Model Evaluator | Generative AI & Agent Systems

Text
Evaluated outputs from large language models for reasoning, summarization, coding, and instruction following. Applied standardized rubrics and qualitative scoring frameworks to assess accuracy, coherence, safety, and instruction adherence. Identified and documented model weaknesses, including hallucinations and inconsistencies, to support model refinement.
• Conducted structured feedback documentation and calibration with other evaluators.
• Delivered actionable insights that reduced recurring model errors.
• Maintained high inter-rater reliability during peer audits.
• Participated in ambiguity reviews and edge-case identification for guideline improvement.

2020 - 2023

LLM Benchmarking & Quality Assessment Project

Text
Designed and applied scoring rubrics for evaluating reasoning depth, coherence, and factual consistency of outputs. Conducted comparative output and failure analysis across diverse LLM task categories. Identified hallucination patterns and quality drift to support model benchmarking.
• Performed in-depth, structured output review for benchmarking projects.
• Analyzed trend data to highlight model failure modes.
• Contributed insights for model performance documentation.
• Supported guideline development for future assessments.

2020

Autonomous Agent Task Evaluation Study

Text
Assessed multi-step agent actions for goal alignment, tool-usage logic, and decision-making accuracy. Reviewed reasoning traces and provided structured insights to improve workflow reliability. Evaluated agent system outputs to identify edge-case and failure-scenario patterns.
• Supported reliability enhancements for autonomous agents.
• Delivered structured reporting on agent evaluation.
• Identified workflow inconsistencies and recommended corrections.
• Helped refine agent system evaluation guidelines.

Not specified

AI Data & Quality Analyst

Text
Conducted structured annotation and quality review of AI-generated text responses using rubric-based frameworks. Assessed factual accuracy, coherence, safety compliance, and instruction following for generative models. Compiled trend reports and collaborated to refine evaluation documentation for annotation processes.
• Managed structured scoring for multiple output types.
• Supported QA processes for annotation consistency.
• Facilitated feedback loops via documentation.
• Participated in cross-functional evaluation sessions.

Not specified

Education

University of Texas at Austin

Bachelor of Science, Computer Science
2018 - 2022

Work History

N/A
QA Analyst/Digital Product Evaluator
2018 - 2019