AI Model Evaluator | Generative AI & Agent Systems
Evaluated outputs from large language models for reasoning, summarization, coding, and instruction following. Applied standardized rubrics and qualitative scoring frameworks to assess accuracy, coherence, safety, and instruction adherence. Identified and documented model weaknesses, including hallucinations and inconsistencies, to support model refinement.
• Conducted structured feedback documentation and calibration with other evaluators.
• Delivered actionable insights that reduced recurring model errors.
• Maintained high inter-rater reliability during peer audits.
• Participated in ambiguity reviews and edge-case identification to improve evaluation guidelines.