Ayoub Baitti

AI Evaluation Lead & Prompt Engineer | LLM Quality & RLHF Specialist

Algiers, Algeria
$35.00/hr · Expert · Label Studio · Labelbox · Scale AI

Key Skills

Software

Label Studio
Labelbox
Scale AI
Remotasks
Prodigy
Internal/Proprietary Tooling

Top Subject Matter

Software Engineering / Code Review
LLM Output Evaluation
Mathematics / Reasoning

Top Data Types

Text
Computer Code / Programming
Document

Top Task Types

Computer Programming / Coding
Red Teaming
Fine-Tuning
RLHF
Function Calling
Evaluation / Rating
Prompt-Response Writing (SFT)
Question Answering
Text Generation
Classification
Entity (NER) Classification
Text Summarization

Freelancer Overview

Expert AI evaluator with hands-on experience in RLHF, SFT, LLM output evaluation, and multi-agent system design. Specialized in evaluation rubric development, prompt optimization, and code review workflows. Has processed 125K+ AI-classified interactions with 95%+ accuracy.

Languages (Expert): Arabic, French, English

Labeling Experience

Freelance AI Consultant

Text · Classification

As a Freelance AI Consultant, I performed large-scale rating and evaluation work to improve AI outputs. My role involved rating outputs against rubrics, building evaluation frameworks, and managing annotation pipeline quality. I contributed significantly to improving downstream model performance through hands-on annotation and evaluation work.
• Rated 1,000+ AI outputs for factuality, coherence, and safety
• Built evaluation and annotation quality pipelines
• Improved model quality by 35% via rubric-aligned evaluations
• Delivered end-to-end AI automation solutions for clients

2023 - Present

AI Quality & Evaluation Specialist — Clark (AI Companion Product)

Audio · Question Answering

As an AI Quality & Evaluation Specialist at Clark, I created evaluation rubrics for conversational AI and structured output schemas for sample assessments. I evaluated outputs from major LLMs, including GPT-4, Claude, and open-source models, which helped achieve consistent output formatting and reduced off-brand AI responses.
• Designed 15+ evaluation rubrics for conversational AI quality
• Built structured output schemas for 1,000+ sample evaluations
• Analyzed model outputs for tone, brand fit, factuality, and safety
• Improved output consistency and brand alignment across models

2024 - Present

AI Evaluation Lead & Prompt Engineer — Beon (AI Automation Agency)

Text · Question Answering

As the AI Evaluation Lead & Prompt Engineer at Beon, I designed and implemented multidimensional evaluation rubrics for assessing LLM outputs. I refined prompts and created lead-qualification pipelines with built-in quality checkpoints. My work ensured high classification accuracy and reduced manual processing time for B2B sales workflows.
• Developed five-dimensional evaluation criteria for model deployment
• Refined 50+ prompts via blind side-by-side LLM comparisons
• Automated and quality-checked 125,000+ AI-classified interactions
• Maintained prompt and output quality through continuous rubric evolution

2024 - Present

Education

University of Science and Technology Houari Boumediene (USTHB)

Bachelor's Degree (Licence), Biological Sciences
2022 - 2025

Work History

Beon

AI Evaluation Lead & Prompt Engineer

Remote
2024 - Present