Jibran Khan

LLM Evaluation Specialist in Python & Java Programming

Oshawa, Canada
$40.00/hr
Intermediate
Scale AI

Key Skills

Software

Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Image
Text

Top Task Types

Computer Programming Coding
Evaluation Rating
Prompt Response Writing SFT
RLHF
Translation Localization

Freelancer Overview

With a year of hands-on experience at Outlier.ai, I have specialized in training reasoning models for mathematics and, more recently, large language models (LLMs) focused on coding tasks. My projects have included collaborating with TopCoder to transform complex algorithmic problems into accessible, story-driven formats, and developing assertion tests to verify output quality and accuracy. Throughout these roles, I have adapted to new project requirements by thoroughly studying onboarding materials and applying techniques such as Reinforcement Learning from Human Feedback (RLHF), Supervised Fine-Tuning (SFT), response comparison, and evaluation. I hold platform certifications for coding projects, having passed skill assessments in Java, Python, JavaScript, SQL, C++, Clojure, React, and R. My strongest technical skills lie in Python and its data science ecosystem, as well as the JVM stack through Java, Clojure, and Kotlin. Professionally, I excel at training LLMs using a variety of methodologies, quickly mastering new workflows, and delivering high-quality, reliable training data for AI systems.

Intermediate
French
Pashto
English

Labeling Experience

Scale AI

Code Generation and Evaluation for LLM Training

Scale AI
Computer Code Programming
Classification
RLHF
I trained and evaluated LLMs for programming across 7 task types (including code generation, code review, debugging, test case generation, documentation, and refactoring) and many categories (including front-end, back-end, data engineering, data visualization, database management, scripting, and algorithms). Responsibilities included: generating prompts tailored to specific task types and categories (e.g., for code generation, "make my X"; for debugging, "here's my code, I have this error, please debug"); performing domain classification to validate whether prewritten prompts aligned with target use cases; and evaluating model responses on dimensions such as correctness, clarity, efficiency, readability, and adherence to specifications, with criteria varying by category (e.g., database management tasks included query optimization).

2024

Education

University of Ontario Institute of Technology

Bachelor of Science, Computer Science

2022

Work History

Wouessi

DevOps & Front-end Developer

Remote
2025 - 2025

Wennovate Consulting

Full-Stack Developer

Remote
2025 - 2025