Abdallah Ammar

LLM Evaluation and Programming Specialist in English & Arabic.

New York, USA
$30.00/hr | Expert | Appen, CrowdFlower, CVAT

Key Skills

Software

Appen
CrowdFlower
CVAT
Data Annotation Tech
Labelbox
OneForma
OpenCV AI Kit (OAK)
Prodigy
Roboflow
Scale AI
Toloka
Telus
Surge AI
Label Studio

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Document
Text

Top Task Types

Computer Programming Coding
Evaluation Rating
RLHF
Text Generation
Translation Localization

Freelancer Overview

AI Data Specialist with a PhD in Computer Science from Stanford University and extensive experience in data labeling, AI model training, and quality evaluation. I have annotated and optimized diverse datasets (text, image, audio) for NLP, speech recognition, and search algorithms at companies such as DataTech Solutions and BrightMind AI. My expertise includes designing scalable labeling workflows, implementing rigorous QA checks, and collaborating with ML teams to improve model performance and keep data quality aligned with human values. What sets me apart is a research-backed approach that reduced labeling errors by 30% through improved guidelines, hands-on experience with end-to-end data pipelines, and machine learning certifications from Google and Microsoft. My work contributes directly to building safer, more accurate AI systems, which aligns with Surge AI's mission.

Expert: Arabic, German, English

Labeling Experience

Scale AI

Radiology Image Segmentation for Diagnostic AI

Scale AI | Image | Segmentation | Cuboid
Annotated 15,000+ CT/MRI scans to train a cancer detection model. Tasks included precise tumor segmentation with radiologist verification; classifying malignancy likelihood (Benign/Suspicious/Malignant); and measuring lesion dimensions via cuboid annotation. Achieved 98% concordance with expert radiologists on the test set.

2024 - 2024
CVAT

Street Scene Annotation for Self-Driving AI

CVAT | Image | Bounding Box | Polygon
Led a team labeling 100,000+ street view images for an autonomous vehicle startup. Tasks included drawing precise bounding boxes around vehicles and pedestrians; semantic segmentation of road surfaces and lanes; and quality control through consensus scoring (3 raters per image). Developed guidelines for edge cases (occluded objects, poor lighting).

2023 - 2023
Scale AI

Multimodal NLP Dataset Annotation for LLM Fine-Tuning

Scale AI | Audio | Entity NER Classification | Question Answering
Led annotation of 50,000+ text and audio samples (patient queries and doctor responses) to fine-tune a healthcare LLM. Tasks included labeling medical entities (symptoms, medications) with 95% inter-rater agreement; classifying intent (e.g., "diagnosis request" vs. "treatment advice") and emotional tone (urgency, concern); writing synthetic prompt-response pairs to fill data gaps; and implementing QA checks (10% random reviews by senior raters plus automated consistency validation with Python scripts).

2023 - 2023
Surge AI

Search Relevance Rating for AI-Powered Engine

Surge AI | Text | Classification | Evaluation Rating
Rated 20,000+ search query-result pairs for a major tech client, achieving 98% alignment with gold-standard benchmarks. Developed ambiguity-resolution guidelines adopted company-wide.

2022 - 2023
Label Studio

Image Annotation

Label Studio | Image
Used Label Studio for 6 months to annotate image data for OCR tasks, including drawing precise bounding boxes and polygons for text extraction.

2022 - 2022

Education

Stanford University

PhD, Computer Science

2020 - 2024

Stanford University

Bachelor, Computer Science

2016 - 2020

Work History

BrightMind AI

Data Collector & AI Model Trainer

San Francisco
2022 - 2023

DeepSearch Analytics

Search Relevance Analyst

San Francisco Bay Area
2021 - 2022