Believe Abiodun

AI Project Evaluator - LLM Trainer

Lagos, Nigeria
$20.00/hr, Entry Level, Internal Proprietary Tooling, Other

Key Skills

Software

Internal/Proprietary Tooling
Other

Top Subject Matter

No subject matter listed

Top Data Types

Text

Top Label Types

RLHF
Fine Tuning
Evaluation Rating
Prompt Response Writing SFT

Freelancer Overview

I am an experienced AI evaluator and data annotation specialist with a strong background in LLM evaluation, complex instruction-following assessment, and rubric-based quality assurance. My work has focused on translating real-world logistics and fulfillment workflows into high-quality test cases, identifying model failure modes such as hallucinations and constraint violations, and delivering actionable insights for model improvement. I have led evaluation pods, designed prompt suites for challenging domains, and conducted calibration sessions to ensure reviewer consistency and reliability. My skill set includes prompt design, error analysis, stakeholder reporting, and hands-on experience with tools like Google Workspace, Excel, Python, SQL, Jira, and LLM evaluation platforms. I am adept at bridging domain expertise in logistics and fulfillment with rigorous evaluation standards to drive high-quality AI training data and annotation outcomes.

Entry Level, English, Yoruba

Labeling Experience

Computer Use AI Evaluator

Other, Text, Evaluation Rating, Prompt Response Writing SFT
This project centered on evaluating large language model outputs, with a strong emphasis on accuracy, instruction-following, and safety/compliance. I applied structured rubrics to assess responses systematically against defined standards and to identify areas for improvement. The work required consistent engagement with complex instruction-following notebooks, where I provided clear, actionable feedback to refine output quality. Specific evaluation tasks included resolving ambiguous edge cases, aligning on scoring standards, and documenting recurring failure patterns that could undermine reliability.

2025

Alibaba Qwen AI Project

Internal Proprietary Tooling, Text, RLHF, Fine Tuning
This project involved leading a logistics and fulfillment evaluation initiative that tested large language models against real-world workflows such as order flow, shipping logic, exceptions, and returns. The scope covered designing and maintaining evaluation rubrics and prompt suites that mirrored operational processes, so that the models were assessed against realistic constraints. The project was sizable, with a dedicated evaluation pod under my leadership, and required coordination across multiple reviewers and complex instruction-following cases. Specific data labeling tasks included annotating errors such as hallucinations, missing constraints, and invalid assumptions, as well as reviewing outputs for adherence to logistics rules. Quality measures were central to the effort: calibration sessions aligned reviewers to reduce scoring variance, systematic quality checks enforced consistency, and error analysis summaries gave stakeholders actionable insights.

2025 - 2025

Education

University of Benin

Bachelor of Science, Economics and Statistics
2019 - 2019

Work History

Shell Nigeria Exploration & Production Company

Business Development Support Specialist

Lagos
2024 - Present

Deloitte & Touche

Client Engagement Support

Lagos
2023 - 2024