Kazem Hamza

AI Conversation Evaluator - Conversational AI

Al-Khankah, Egypt
$20.00/hr · Entry Level · Labelbox · Don't disclose

Key Skills

Software

Labelbox
Don't disclose

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Text

Top Label Types

Classification
Transcription
Relationship
Question Answering
Emotion Recognition
Evaluation Rating
Audio Recording
Text Generation
RLHF
Prompt/Response Writing (SFT)

Freelancer Overview

I have hands-on experience in AI training data creation, labeling, and human evaluation across multiple real-world projects involving speech, audio, and large language models.

At Alignerr, I worked on Project Vera – VAD Audio Annotation v2, producing millisecond-level (1 ms) word-aligned transcriptions, labeling speech types (standard speech, acknowledgments, interruptions), and annotating background noise events. I also contributed to Project Human Evals v2.3, where I conducted live, multi-turn audio conversations with AI models in noisy environments to evaluate speech recognition, responsiveness, task completion, context awareness, and emotional empathy in both customer support and companionship scenarios.

In addition, through Cypher RLHF at Outlier, I supported LLM improvement via human feedback by writing prompts and performing pairwise evaluations of AI-generated responses. I assessed outputs for truthfulness, clarity, grammatical quality, text structure, relevance, and alignment with user intent, providing consistent, high-quality annotations used for model fine-tuning.

Together, these projects demonstrate strong skills in attention to detail, critical evaluation, conversational AI understanding, RLHF workflows, and human-in-the-loop training, with a proven ability to generate high-quality data that improves AI accuracy, reliability, and real-world performance.

Entry Level · English · Arabic

Labeling Experience

AI Data Annotator & Evaluator

Don't disclose · Text · Text Generation · RLHF
Wrote prompts and performed pairwise evaluation of AI-generated responses as part of an RLHF workflow. Compared outputs based on truthfulness, clarity, grammar, text structure, relevance, and overall quality, providing human feedback used to improve large language model accuracy, readability, and alignment with user intent.

2025 - 2025
Labelbox

AI Data Annotator & Evaluator

Labelbox · Audio · Relationship · Question Answering
Conducted live multi-turn audio conversations with AI models in noisy environments to evaluate speech responsiveness, task completion, context awareness, and emotional empathy. Assessed model performance in customer support scenarios (e.g., TV and internet troubleshooting) and companionship interactions to help improve real-world reliability and human-like behavior.

2025 - 2025
Labelbox

LLM / Conversational AI Evaluator

Labelbox · Audio · Classification · Transcription
Performed millisecond-level audio transcription and annotation of user–AI conversations. Labeled each spoken word by speech type (standard speech, acknowledgments, interruptions) and annotated background noise events with precise timestamps to support improvements in speech recognition, turn-taking, and noise robustness for conversational AI models.

2025 - 2025
Alignerr

Education

Helwan University

Bachelor of Science, Electrical Engineering

2019 - 2024

Work History

Contractor

Electrical Site Engineer Trainee

Badr City, Egypt
2023 - 2023