Muhammad Daniyal Ehsan - AI Evaluator & QA Specialist | LLMs, Code Review, Data Annotation, Translator

Key Skills

Software

Labelbox

LightTag

Scale AI

SuperAnnotate

Other

Appen

Top Data Types

Audio

Computer Code Programming

Text

Top Task Types

Computer Programming Coding

Evaluation Rating

Prompt Response Writing SFT

Question Answering

Translation Localization

Freelancer Overview

I’m an experienced AI evaluator and data annotation specialist with over 8 years of hands-on work on platforms like Appen, Crowdgen, and through direct freelance clients on Fiverr and Upwork. My work has focused on training and improving large language models (LLMs), covering tasks like prompt evaluation, instruction tuning, fact-checking, data annotation, and assistant response QA. I specialize in multilingual linguistic tasks, including English ⇄ Urdu, Arabic, and Persian translation, cultural alignment, and sensitivity review. I’ve worked on domain-specific content in medicine, politics, economics, and technical fields, providing high-quality data and structured feedback to optimize model outputs. In addition, I bring expertise in structured code review and explanation, particularly in Python and JavaScript, used for LLM code understanding tasks. My strengths lie in detail-oriented evaluation, technical fluency, and consistent delivery of reliable data for AI system improvement across both text and code domains.

Expert: Urdu, Hindi, Arabic, Bengali, English, Punjabi, Persian (Farsi)

Labeling Experience

Multilingual Translation & QA Annotation

Appen | Text | Translation Localization

Labeled machine translations across Urdu, Arabic, Persian, and English pairs. Evaluated semantic accuracy, grammar, and cultural alignment. Flagged low-quality segments and annotated corrections in domain-specific content (e.g., health and politics).

2018

Code Instruction Annotation & QA – LLM Support

Appen | Text | Question Answering | Evaluation Rating

Worked on Appen projects focused on annotating code snippets and labeling matching instructions. Tasks included evaluating AI-generated programming answers, verifying correctness, tagging function behaviors, and assessing the clarity of natural language explanations tied to code. Helped fine-tune code-aware AI systems with structured feedback and quality tags.

2023 - 2024

Multilingual Audio Segmentation & Tagging

Appen | Audio | Fine Tuning | Evaluation Rating

Labeled and segmented audio clips in Urdu and Arabic for speech-to-text model training. Tasks included identifying speaker changes, removing background noise segments, tagging utterances, and labeling unclear or overlapping speech. Ensured time-stamped accuracy for proper training of automatic speech recognition (ASR) systems.

2022 - 2023

Prompt Evaluation & AI Response Rating

Appen | Text | Classification | Text Generation

Performed high-volume evaluations of AI-generated responses to user prompts. Assessed output based on correctness, coherence, tone, bias, and helpfulness. Applied structured rating criteria and provided written feedback to fine-tune LLM performance.

2021 - 2023

Education

Al Noor University, United Arab Emirates

Master of Science in Computer Science, Computer Science

2023 - 2025

Al Noor University, United Arab Emirates

Bachelor of Science in Computer Science, Computer Science

2019 - 2023

Work History

Microsoft UHRS

Content Quality Rater / Web Search Evaluator

Remote

2021 - Present

Appen Inc

AI Evaluator & QA Specialist

Remote

2017 - Present