
Iqra Kashif

AI Trainer & LLM Evaluation Specialist | Python & Prompt Engineering Expert

Bahawalpur, Pakistan
$40.00/hr
Expert
Internal/Proprietary Tooling

Key Skills

Software

Internal/Proprietary Tooling

Top Subject Matter

Artificial Intelligence – Large Language Models (LLMs)
Software Engineering – Code Generation & Debugging
AI Safety & Model Evaluation

Top Data Types

Text
Document
Computer Code Programming

Top Task Types

RLHF
Evaluation/Rating
Prompt + Response Writing (SFT)
Function Calling
Red Teaming
Computer Programming/Coding
Text Generation
Text Summarization
Question Answering
Data Collection

Freelancer Overview

I am an experienced AI training data specialist with a strong background in data labeling, annotation, and evaluation for large language models (LLMs) across leading organizations such as Amazon, Apple, OpenAI, and MistralAI. My expertise includes curating and benchmarking high-quality datasets, designing taxonomies, crafting robust evaluation tasks, and performing advanced prompt engineering to test and improve model reasoning, accuracy, and real-world applicability. I have led teams in the end-to-end lifecycle of data generation and QA, ensuring consistency and clarity in technical domains like software engineering, NLP, and document understanding. My technical toolkit spans Python, Django, REST APIs, and a variety of data tools, and I am skilled at collaborating across remote, cross-functional teams to deliver impactful training data that drives AI model performance and reliability.

English (Expert) · Urdu

Labeling Experience

Agentic LLM Evaluation for App Generation

Computer Code Programming · Computer Programming/Coding
I evaluated LLMs in multi-step application generation tasks, where models built full-stack applications through iterative prompting. I guided the model across multiple turns, testing how well it could build and improve an app over time. I checked the generated code for correctness, structure, and real-world usability. I also created test cases to validate functionality and identified issues in logic, design, and implementation.

2025 - Present

LLM Red Teaming & Failure Mode Analysis

Text · Red Teaming
I created challenging prompts to test the limits of AI models and find where they fail. I designed edge cases and tricky scenarios, along with detailed test cases including positive, negative, edge, and boundary conditions. I also wrote ideal answers and checked whether the model could meet those expectations. If it failed, I refined the prompts further.

2025 - 2025

LLM Evaluation & RLHF for AI Systems

Text · RLHF
I evaluated AI model responses by comparing multiple outputs for the same prompt, selecting the best one, and rating each for accuracy, instruction following, reasoning quality, conciseness, and overall usefulness. I regularly identified where models made mistakes, such as incorrect logic or missing details, and provided structured ratings against clear criteria. This work directly improved model performance and overall response quality.

2024 - 2025

Function Calling & Tool-Use Annotation for LLMs

Text · Function Calling
I worked on a project where AI models had to choose and use the correct tools or APIs based on a user's request. I designed trajectories with defined tools, then evaluated whether the model selected the right function and used it properly. I covered scenarios involving single and multiple function calls, across both single-turn and multi-turn trajectories, evaluating tool selection accuracy, parameter correctness, and response quality. Throughout, I focused on ensuring the model understood intent and responded accurately using the available tools.

2024 - 2024

Prompt Engineering & SFT Data Creation for LLM Training

Text · Prompt + Response Writing (SFT)
I created high-quality prompts and wrote detailed, accurate responses to train AI models. This included solving math problems in Python with step-by-step explanations, writing technical answers, debugging code, and summarizing documents. I also explained concepts, proofread text, and generated structured outputs, always aiming for responses that were clear, correct, and aligned with real user needs.

2022 - 2024

Education

The University of Faisalabad

Bachelor of Science, Software Engineering

2017 - 2021

Work History

Revelo

AI Engineer

Remote
2026 - Present
Turing

Python Engineer

Remote
2022 - 2026