Arwaz Khan

LLM Evaluation & Annotation Expert

New Delhi, India
$45.00/hr | Intermediate | Prodigy, Label Studio, Doccano

Key Skills

Software

Prodigy
Label Studio
Doccano
LightTag
Scale AI

Top Subject Matter

Competitive intelligence monitoring and LLM response quality
Regulatory compliance monitoring
AI outputs for SMEs

Top Data Types

Text
Document

Top Task Types

Classification
Question Answering
Text Summarization

Freelancer Overview

I am an experienced AI Testing and Data Labeling Specialist with a strong background in LLM evaluation, prompt engineering, hallucination detection, bias analysis, and adversarial testing. My expertise includes designing structured annotation workflows, validating AI outputs for accuracy and consistency, and ensuring high‑quality training datasets for machine learning models. Skilled in Python, SQL, LangChain, Hugging Face, Salesforce, and Jira, I bring both technical precision and process optimization to AI training data projects. I have successfully contributed to projects such as Shadow Tracker (AI‑powered competitive intelligence monitoring) and RegRadar (AI‑driven compliance monitoring), where I performed end‑to‑end pipeline validation, error detection, and quality audits. With a proven record of reducing escalations and improving resolution times, I specialize in building reliable, ethically tested, and benchmarked datasets that strengthen AI performance. My goal is to deliver robust, trustworthy training data that empowers scalable and responsible AI systems.

Intermediate | English, Hindi

Labeling Experience

AI Testing & Data Labeling Specialist

Text | Classification
Contributed to AI training and evaluation projects including Shadow Tracker (AI‑powered competitive intelligence monitoring) and RegRadar (AI‑driven compliance monitoring). Responsibilities included designing structured annotation workflows, labeling and validating LLM outputs for factual accuracy, detecting hallucinations and biases, and conducting adversarial prompt testing. Leveraged tools such as Python, SQL, LangChain, Hugging Face, Salesforce, and Jira to ensure high‑quality datasets and reliable AI performance. Achieved measurable improvements in resolution times and reduced repeat escalations through rigorous quality audits and dataset curation.

2025 - Present

Data Labeling & Annotation Analyst

Document | Classification
Supported AI training initiatives by labeling and annotating customer support transcripts and knowledge base content to improve intent recognition and response accuracy. Conducted sentiment tagging, classification, and error detection to refine datasets for supervised learning. Collaborated with QA teams to validate outputs, reduce bias, and enhance model reliability. Leveraged tools such as Salesforce CRM, Jira, and Python scripts to streamline annotation workflows and ensure high‑quality datasets for AI model training.

2024 - 2025

Education

Ramjas College, University of Delhi

Bachelor of Arts, Political Science

2022 - 2025

Work History

Teleperformance

Technical SME / Team Lead

New Delhi
2025 - Present

Teleperformance

Technical Support Engineer

New Delhi
2024 - 2025