Muhammad Daniyal Ehsan

AI Evaluator & QA Specialist | LLMs, Code Review, Data Annotation, Translator

Rawalpindi, Pakistan
$9.00/hr | Expert | Labelbox, LightTag, Scale AI

Key Skills

Software

Labelbox
LightTag
Scale AI
SuperAnnotate
Other
Appen

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Computer Code Programming
Text

Top Task Types

Computer Programming Coding
Evaluation Rating
Prompt Response Writing SFT
Question Answering
Translation Localization

Freelancer Overview

I’m an experienced AI evaluator and data annotation specialist with over 8 years of hands-on work on platforms such as Appen and Crowdgen, as well as with direct freelance clients on Fiverr and Upwork. My work has focused on training and improving large language models (LLMs) through tasks such as prompt evaluation, instruction tuning, fact-checking, data annotation, and assistant response QA. I specialize in multilingual linguistic tasks, including English ⇄ Urdu, Arabic, and Persian translation, cultural alignment, and sensitivity review. I’ve worked on domain-specific content in medicine, politics, economics, and technical fields, providing high-quality data and structured feedback to improve model outputs. I also bring expertise in structured code review and explanation, particularly in Python and JavaScript, for LLM code-understanding tasks. My strengths are detail-oriented evaluation, technical fluency, and consistent delivery of reliable data for AI system improvement across both text and code domains.

Expert: Urdu, Hindi, Arabic, Bengali, English, Punjabi, Persian (Farsi)

Labeling Experience

Appen

Multilingual Translation & QA Annotation

Appen | Text | Translation Localization
Labeled machine translations across Urdu, Arabic, Persian, and English pairs. Evaluated semantic accuracy, grammar, and cultural alignment. Flagged low-quality segments and annotated corrections in domain-specific content (e.g., health and politics).

2018
Appen

Code Instruction Annotation & QA – LLM Support

Appen | Text | Question Answering, Evaluation Rating
Worked on Appen projects focused on annotating code snippets and labeling matching instructions. Tasks included evaluating AI-generated programming answers, verifying correctness, tagging function behaviors, and assessing the clarity of natural language explanations tied to code. Helped fine-tune code-aware AI systems with structured feedback and quality tags.

2023 - 2024
Appen

Multilingual Audio Segmentation & Tagging

Appen | Audio | Fine Tuning, Evaluation Rating
Labeled and segmented audio clips in Urdu and Arabic for speech-to-text model training. Tasks included identifying speaker changes, removing background noise segments, tagging utterances, and labeling unclear or overlapping speech. Ensured time-stamped accuracy for proper training of automatic speech recognition (ASR) systems.

2022 - 2023
Appen

Prompt Evaluation & AI Response Rating

Appen | Text | Classification, Text Generation
Performed high-volume evaluations of AI-generated responses to user prompts. Assessed outputs for correctness, coherence, tone, bias, and helpfulness. Applied structured rating criteria and provided written feedback to fine-tune LLM performance.

2021 - 2023

Education

Al Noor University, United Arab Emirates

Master of Science in Computer Science
2023 - 2025

Al Noor University, United Arab Emirates

Bachelor of Science in Computer Science
2019 - 2023

Work History

Microsoft UHRS

Content Quality Rater / Web Search Evaluator

Remote
2021 - Present

Appen Inc

AI Evaluator & QA Specialist

Remote
2017 - Present