Ayoub Baitti

AI Evaluation Lead & Prompt Engineer | LLM Quality & RLHF Specialist

Algiers, Algeria
$35.00/hr · Expert · Label Studio · Labelbox · Scale AI

Key Skills

Software

Label Studio
Labelbox
Scale AI
Remotasks
Prodigy
Internal/Proprietary Tooling

Top Subject Matter

Software Engineering / Code Review
LLM Output Evaluation
Mathematics / Reasoning

Top Data Types

Text
Computer Code / Programming
Document

Top Task Types

Computer Programming / Coding
Red Teaming
Fine-Tuning
RLHF
Function Calling
Evaluation / Rating
Prompt-Response Writing (SFT)
Question Answering
Text Generation
Classification
Entity (NER) Classification
Text Summarization

Freelancer Overview

Expert AI evaluator with hands-on experience in RLHF, SFT, LLM output evaluation, and multi-agent system design. Specialized in evaluation rubric development, prompt optimization, and code review workflows. Has processed 125K+ AI-classified interactions with 95%+ accuracy.

Languages (Expert): Arabic, French, English

Labeling Experience

Freelance AI Consultant

Text · Classification

As a Freelance AI Consultant, I performed large-scale rating and evaluation work to improve AI outputs. My role involved rating outputs against rubrics, building evaluation frameworks, and managing annotation pipeline quality. I contributed significantly to improving downstream model performance through hands-on annotation and evaluation work.
• Rated 1,000+ AI outputs for factuality, coherence, and safety
• Built evaluation and annotation quality pipelines
• Improved model quality by 35% via rubric-aligned evaluations
• Delivered end-to-end AI automation solutions for clients

2023 - Present

AI Quality & Evaluation Specialist — Clark (AI Companion Product)

Audio · Question Answering

As an AI Quality & Evaluation Specialist at Clark, I created evaluation rubrics for conversational AI and structured output schemas for sample assessments. I evaluated outputs from major LLMs, including GPT-4, Claude, and open-source models, which helped achieve consistent output formatting and reduced off-brand AI responses.
• Designed 15+ evaluation rubrics for conversational AI quality
• Built structured output schemas for 1,000+ sample evaluations
• Analyzed model outputs for tone, brand fit, factuality, and safety
• Improved output consistency and brand alignment across models

2024 - Present

AI Evaluation Lead & Prompt Engineer — Beon (AI Automation Agency)

Text · Question Answering

As the AI Evaluation Lead & Prompt Engineer at Beon, I designed and implemented multidimensional evaluation rubrics for assessing LLM outputs. I refined prompts and created lead-qualification pipelines with built-in quality checkpoints. My work ensured high classification accuracy and reduced manual processing time for B2B sales workflows.
• Developed five-dimensional evaluation criteria for model deployment
• Refined 50+ prompts via blind side-by-side LLM comparisons
• Automated and quality-checked 125,000+ AI-classified interactions
• Maintained prompt and output quality through continuous rubric evolution

2024 - Present

Education

University of Science and Technology Houari Boumediene (USTHB)

Bachelor's Degree (Licence), Biological Sciences
2022 - 2025

Work History

Beon

AI Evaluation Lead & Prompt Engineer

Remote
2024 - Present