
Mohamed Hesham

AI Evaluation Specialist - AI Content Quality

Sohag, Egypt
$20.00/hr, Intermediate, Scale AI

Key Skills

Software

Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Text
Computer Code Programming
Audio
Image

Top Label Types

Evaluation Rating
RLHF
Prompt Response Writing SFT
Data Collection
Audio Recording

Freelancer Overview

I have three years of hands-on experience as an AI Evaluation Specialist, focused on improving the accuracy, clarity, and reliability of AI-generated content. I have evaluated and analyzed outputs across more than 20 AI projects, with a strong emphasis on quality review, fact-checking, and detecting inconsistencies or biases in model responses. I am highly skilled in AI evaluation, content quality assurance, and critical thinking, particularly with Arabic-language data. My background blends editorial expertise with technical understanding, helping ensure that training data is precise, relevant, and user-focused. I am comfortable working remotely and collaborating with teams to deliver structured feedback that drives continuous improvement in AI systems.

Intermediate
English
Arabic
German

Labeling Experience

Scale AI

Xylophone Calendar

Scale AI, Audio, Data Collection, Audio Recording
I recorded the requested material in my own voice in various locations, mostly outdoors.

2025 - 2025
Scale AI

hopper_code_rlhf

Scale AI, Computer Code Programming, RLHF, Evaluation Rating
I wrote complex prompts in a specific programming language, designed so that the model would make at least one error in its response. I then evaluated the two responses, selected the one closer to being correct, and fixed the errors in it.

2024 - 2025
Scale AI

languages_preference_ranking_and_rewrites

Scale AI, Text, RLHF, Evaluation Rating
I wrote complex prompts in specific categories, designed to challenge the model and elicit incorrect answers. I then evaluated the responses, identified the best ones, and corrected any errors.

2024 - 2024
Scale AI

Goggles Chromolithograph

Scale AI, Text, Prompt Response Writing SFT
I wrote complex claims for specific categories.

2024 - 2024
Scale AI

Onion dancing

Scale AI, Image, Evaluation Rating, Data Collection
I evaluated the model's image outputs. For each request, two images were shown, the original image and the image produced after the request, and I rated the result as good or bad.

2022 - 2024

Education

Assiut University

Bachelor of Social Work, Social Work

2019 - 2024

Work History

Outlier AI

AI Content Evaluator

Sohag
2022 - 2024