Posted: Jan 13, 2026

Evaluation Scenario Writer - AI Agent Testing Specialist

Other | Text | Pay Per Hour

Project Overview

We’re looking for an analytical scenario writer with strong QA-style thinking and excellent written English. You should be comfortable designing structured evaluation scenarios, defining expected (“gold standard”) agent behavior, and working with structured formats such as JSON and YAML. A background in software testing, QA, data analysis, or NLP annotation is strongly preferred. Basic Python and JavaScript experience is required.

What you’ll be doing: You’ll design realistic, reusable evaluation scenarios that simulate real-world tasks for LLM-based agents. You’ll define the golden path and acceptable behaviors, annotate task steps and expected outputs, and document edge cases and scoring logic. You’ll also review agent outputs, iterate on scenarios to improve their clarity and coverage, and collaborate with developers and other contributors to test and refine evaluation frameworks.

Estimated Total Earnings: $1,090.91 | Pay Per Hour | Intermediate | 6+ months | Independent AI Trainers Only

Estimated Total Earnings

$1,090.91

Pay per Hour

$24.00/hr

Time Requirement

20+ hrs/week

Duration

6+ months

Labelers Needed

Description of dataset

Agent evaluation scenarios and test cases

Software

Other

Hiring Type

Independent AI Trainers Only

Required Location

Global (any location)

Workload / Schedule

Flexible; can start immediately

Data Type

Text

Task Types

Evaluation Rating

Subject Matter / Industry

LLM agent testing and evaluation design

Language

English

Job Type

Managed Service by OpenTrain

Activity on this project

Proposals: 1415

Invites sent: 0

Unanswered invites: 0
