Irfan Khan


Expert in prompt engineering & data labeling in STEM

Sydney, Australia
$60.00/hr
Intermediate
Appen
Labelbox
Mindrift

Key Skills

Software

Appen
Labelbox
Mindrift
Other

Top Subject Matter

No subject matter listed

Top Data Types

Document
Image
Text

Top Task Types

Classification
Evaluation Rating
Object Detection
RLHF
Text Generation

Freelancer Overview

Over the past two years, I’ve contributed to a wide range of high-impact AI training projects across Outlier, Alignerr (Labelbox), and Appen – Crowdgen. My experience includes writing prompts, crafting rubrics, evaluating model responses, and conducting fine-grained data annotation across domains like biology, chemistry, physics, coding, and general reasoning. Notable Outlier projects include Iris Gambit (MCQ generation and rubric writing), Phoenix (rubric evaluation across subdomains), Cracked Vault (prompt writing from research abstracts), and Thales Tales (chain-of-thought model rewrites). My work has consistently prioritized MECE-compliant criteria, formatting accuracy, and deep model understanding. In addition, I’ve contributed to TTS and function-calling evaluations in Alignerr (Labelbox) and conversational response projects like Spearmint and Coffee at Appen. My strengths include precision prompt engineering, GTFA-based reasoning analysis, and clear, guideline-aligned labeling. I’m adept at identifying edge cases and model failure modes, and ensuring consistently high data quality across complex and evolving instructions. This mix of creativity, analytical thinking, and instruction adherence makes me an effective and versatile contributor to any AI training pipeline.

Intermediate
Urdu
Hindi
French
English
Punjabi

Labeling Experience

Labelbox

Function Calling – Image Annotation & Validation (Alignerr)

Labelbox
Video
Function Calling
Annotated images for function-calling tasks by linking visual elements to textual function prompts, ensuring alignment between model outputs and visual context. Evaluated correctness of function calls generated by the model (e.g., coordinates, labels, object presence), validated API-ready responses, and corrected function arguments to match visual content. Adhered to detailed client rubrics for visual accuracy, clarity, and API syntax.


2025 - 2025
Appen

CrowdGEN – Research Q&A and Distractor Writing (Appen)

Appen
Text
Question Answering
Created real-world event-based MCQs with correct insights and plausible distractors. Topics covered included global markets, regulatory decisions, and corporate performance. Followed a structured rubric to ensure clarity, neutrality, and informative value across hundreds of items.


2025 - 2025

LLM Prompt Evaluation – Outlier (Phoenix Project)

Other
Text
Classification
Evaluated and rated over 1,000 LLM-generated responses using detailed quality rubrics across physics, engineering, and general domains. Identified factual inconsistencies, ambiguity, and formatting issues. Suggested improved prompts/responses and conducted final quality audits as part of gold-standard calibration sets.


2025 - 2025

Education


University of Engineering & Technology, Lahore

Master of Science, Polymer & Process Engineering

2020 - 2022

Université de Montpellier; VSCHT

Erasmus Mundus Master, Membrane Engineering for Sustainable World

2018 - 2019

Work History


Alignerr

LLM Data Annotation

Sydney
2025 - Present

Appen - Crowdgen

AI Content Evaluator

Sydney
2025 - Present