Mohamed Hesham - AI Evaluation Specialist - AI Content Quality

Key Skills

Software

Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Text

Computer Code Programming

Audio

Image

Top Label Types

Evaluation Rating

RLHF

Prompt Response Writing SFT

Data Collection

Audio Recording

Freelancer Overview

I have three years of hands-on experience as an AI Evaluation Specialist, where I focused on enhancing the accuracy, clarity, and reliability of AI-generated content. My work has involved evaluating and analyzing outputs across more than 20 diverse AI projects, with a strong emphasis on quality review, fact checking, and detecting inconsistencies or biases in model responses. I am highly skilled in AI evaluation, content quality assurance, and critical thinking, particularly in Arabic language data. My background allows me to blend editorial expertise with technical understanding, ensuring that training data is precise, relevant, and user-focused. I am comfortable working remotely and collaborating with teams to deliver structured feedback that drives continuous improvement in AI systems.

Intermediate, English, Arabic, German

Labeling Experience

Xylophone Calendar

Scale AI, Audio, Data Collection, Audio Recording

I recorded the requested prompts in my own voice in various locations, mostly outdoors.

2025 - 2025

hopper_code_rlhf

Scale AI, Computer Code Programming, RLHF, Evaluation Rating

I wrote complex prompts in a specific programming language, designed so that the model would make at least one error in its response. I then evaluated the two responses, preferred the one closest to correct, and fixed its remaining errors.

2024 - 2025

languages_preference_ranking_and_rewrites

Scale AI, Text, RLHF, Evaluation Rating

I wrote complex prompts in specific categories, designed to challenge the model and elicit incorrect answers. I then evaluated the responses, identified the best ones, and corrected any errors.

2024 - 2024

Goggles Chromolithograph

Scale AI, Text, Prompt Response Writing SFT

I wrote complex prompts for specific categories.

2024 - 2024

Onion dancing

Scale AI, Image, Evaluation Rating, Data Collection

I evaluated the model's image outputs. For each request, two images were displayed: the original image and the image produced after the request. I rated each result as good or bad.

2022 - 2024

Education

Assiut University

Bachelor of Social Work, Social Work

2019 - 2024

Work History

Outlier AI

AI Content Evaluator

Sohag

2022 - 2024