Arwaz Khan - LLM Evaluation & Annotation Expert

Key Skills

Software

Prodigy

Label Studio

Doccano

LightTag

Scale AI

Top Subject Matter

Competitive intelligence monitoring and LLM response quality

Regulatory compliance monitoring

AI outputs for SMEs

Top Data Types

Text

Document

Top Task Types

Classification

Question Answering

Text Summarization

Freelancer Overview

I am an experienced AI Testing and Data Labeling Specialist with a strong background in LLM evaluation, prompt engineering, hallucination detection, bias analysis, and adversarial testing. My expertise includes designing structured annotation workflows, validating AI outputs for accuracy and consistency, and ensuring high‑quality training datasets for machine learning models. Skilled in Python, SQL, LangChain, Hugging Face, Salesforce, and Jira, I bring both technical precision and process optimization to AI training data projects. I have successfully contributed to projects such as Shadow Tracker (AI‑powered competitive intelligence monitoring) and RegRadar (AI‑driven compliance monitoring), where I performed end‑to‑end pipeline validation, error detection, and quality audits. With a proven record of reducing escalations and improving resolution times, I specialize in building reliable, ethically tested, and benchmarked datasets that strengthen AI performance. My goal is to deliver robust, trustworthy training data that empowers scalable and responsible AI systems.

Intermediate

English

Hindi

Labeling Experience

AI Testing & Data Labeling Specialist

Text

Classification

Contributed to AI training and evaluation projects including Shadow Tracker (AI‑powered competitive intelligence monitoring) and RegRadar (AI‑driven compliance monitoring). Responsibilities included designing structured annotation workflows, labeling and validating LLM outputs for factual accuracy, detecting hallucinations and biases, and conducting adversarial prompt testing. Leveraged tools such as Python, SQL, LangChain, Hugging Face, Salesforce, and Jira to ensure high‑quality datasets and reliable AI performance. Achieved measurable improvements in resolution times and reduced repeat escalations through rigorous quality audits and dataset curation.

2025 - Present

Data Labeling & Annotation Analyst

Document

Classification

Supported AI training initiatives by labeling and annotating customer support transcripts and knowledge base content to improve intent recognition and response accuracy. Conducted sentiment tagging, classification, and error detection to refine datasets for supervised learning. Collaborated with QA teams to validate outputs, reduce bias, and enhance model reliability. Leveraged tools such as Salesforce CRM, Jira, and Python scripts to streamline annotation workflows and ensure high‑quality datasets for AI model training.

2024 - 2025

Education

Ramjas College, University of Delhi

Bachelor of Arts, Political Science

2022 - 2025

Work History

Teleperformance

Technical SME / Team Lead

New Delhi

2025 - Present

Teleperformance

Technical Support Engineer

New Delhi

2024 - 2025