Gaurav Kumar

Expert in AI, data science, prompt creation, and LLM evaluation

Tirupati, India
$30.00/hr · Entry Level · Mercor · Scale AI · Internal/Proprietary Tooling

Key Skills

Software

Mercor
Scale AI
Internal/Proprietary Tooling

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code / Programming
Image
Text

Top Task Types

Computer Programming / Coding
Evaluation / Rating
Prompt Response Writing / SFT
Text Generation

Freelancer Overview

I am an experienced data annotator and computer scientist with a strong background in AI training data, having worked with leading platforms such as Scale AI, Turing, and Mercor. My work spans prompt writing for LLMs, image annotation, coding task evaluation, and rubric generation, ensuring high-quality data for advanced machine learning models. I have hands-on experience in domains like natural language processing, computer vision, healthcare AI, and e-commerce, including developing translation portals and medical diagnostic tools using federated learning. My technical proficiency includes Python, C/C++, TensorFlow, and various development environments, and I am adept at designing and managing data pipelines for large-scale AI projects. I am passionate about contributing to innovative AI solutions by delivering precise, reliable, and well-structured training data.

Entry Level · Hindi · English

Labeling Experience

Scale AI

Humanity's Last Exam Dataset

Scale AI · Text · Question Answering · Evaluation / Rating
Worked on the Humanity’s Last Exam Dataset project, generating a high-quality dataset designed to evaluate the reasoning, comprehension, and problem-solving abilities of LLMs. The project involved creating diverse and challenging prompts across multiple domains, including logic, mathematics, ethics, hypothetical scenarios, and high-stakes decision-making, intended to test LLM performance under complex, human-level cognitive conditions. Data generation included constructing question–answer pairs, multi-step reasoning tasks, edge-case scenarios, and adversarial examples to benchmark model robustness. Strict quality-control procedures were applied, including manual validation, consistency checks, and automated filtering to ensure difficulty, clarity, and originality. The resulting dataset provides a comprehensive and demanding evaluation framework aimed at pushing the boundaries of LLM understanding and reasoning capabilities.
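As an illustration, the automated-filtering stage described above could be sketched in Python. This is a minimal sketch with hypothetical function names and thresholds, not the project's actual tooling: it drops duplicate prompts (originality), empty answers (clarity), and trivially short prompts (a crude difficulty proxy).

```python
import hashlib

def normalize(text):
    """Lowercase and collapse whitespace for duplicate detection."""
    return " ".join(text.lower().split())

def filter_qa_pairs(pairs, min_prompt_words=8):
    """Automated filtering pass over (prompt, answer) pairs.

    Hypothetical sketch of the kind of checks described above:
    reject near-duplicate prompts, empty answers, and very short prompts.
    """
    seen = set()
    kept = []
    for prompt, answer in pairs:
        key = hashlib.sha256(normalize(prompt).encode()).hexdigest()
        if key in seen:                              # originality check
            continue
        if not answer.strip():                       # clarity check
            continue
        if len(prompt.split()) < min_prompt_words:   # difficulty proxy
            continue
        seen.add(key)
        kept.append((prompt, answer))
    return kept
```

In practice such a pass would sit before manual validation, so reviewers only see candidates that survive the cheap automated checks.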


2025 - 2025

LLM Orchestration and Evaluation

Internal/Proprietary Tooling · Text · Question Answering · Text Generation
Developed an AI-driven orchestration and evaluation pipeline that uses tailored prompts to trigger automated tool calls and assess LLM responses for accuracy. The system integrates with multiple external tools and services, including Amazon API for product and data retrieval, Slack API for communication and workflow automation, Reddit API for content extraction and analysis, and database tools for structured data storage, querying, and validation. The workflow supports accept/reject decisioning based on predefined quality criteria, enabling a robust, scalable solution for end-to-end LLM response evaluation and automated task execution.
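The accept/reject decisioning loop described above might look like the following. This is a hedged sketch under assumptions: the keyword router, the tool registry, and the `evaluate` callback are all hypothetical stand-ins, not the proprietary system's real interfaces.

```python
def orchestrate(prompt, tools, evaluate):
    """Minimal orchestration sketch (hypothetical, not the real pipeline):
    route the prompt to a tool by keyword, call it, then score the
    response and attach an accept/reject verdict.
    """
    for keyword, tool in tools.items():
        if keyword in prompt.lower():
            response = tool(prompt)
            verdict = "accept" if evaluate(response) else "reject"
            return {"tool": keyword, "response": response, "verdict": verdict}
    # No tool matched: nothing to evaluate, so reject by default.
    return {"tool": None, "response": None, "verdict": "reject"}

# Example registry: in the real system these would be Amazon, Slack,
# Reddit, or database integrations; here they are toy callables.
example_tools = {
    "product": lambda p: "widget listing",
    "message": lambda p: "posted to channel",
}
```

A real pipeline would replace keyword routing with the LLM's own tool-call output and `evaluate` with the predefined quality criteria, but the accept/reject control flow is the same shape.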


2025 - 2025
Scale AI

Data Analysis Prompt and Code Writing

Scale AI · Computer Code Programming · Prompt Response Writing / SFT
Developed more than 100 data-analysis prompts and a complete Python pipeline to process and interpret multiple datasets. The project involved writing complex, high-quality data-analysis prompts capable of handling different analytical scenarios, ranging from exploratory analysis to statistical modeling, data cleaning, visualization, and pattern detection. Handled diverse .csv data and automated end-to-end workflows. Throughout the project, I adhered to strict quality measures, including validation checks, schema verification, reproducible code structure, detailed rubric generation, and performance-optimized data processing to ensure accuracy, consistency, and reliability.
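A schema-verification check of the kind mentioned above could be sketched with the standard-library `csv` module. The expected column list here is hypothetical; the real pipeline's schemas and error handling are not shown in this profile.

```python
import csv
import io

# Hypothetical schema for illustration only.
EXPECTED_COLUMNS = ["id", "value", "label"]

def verify_schema(csv_text, expected=EXPECTED_COLUMNS):
    """Schema verification sketch: the header must match the expected
    columns exactly, and every data row must have the same field count.

    Returns (ok, message) so callers can log why a file was rejected.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or rows[0] != expected:
        return False, "header mismatch"
    for i, row in enumerate(rows[1:], start=2):
        if len(row) != len(expected):
            return False, f"row {i} has {len(row)} fields"
    return True, "ok"
```

Running this as the first stage of a pipeline lets malformed .csv inputs fail fast with a specific message instead of corrupting downstream analysis.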


2025 - 2025

Education


National Institute of Technology, Kurukshetra

Master of Technology, Computer Science and Engineering

Master of Technology
2015 - 2017

Gautam Buddha Technical University, Lucknow

Bachelor of Technology, Computer Science and Engineering

Bachelor of Technology
2008 - 2012

Work History


National Atmospheric Research Laboratory

Computer Scientist

N/A
2023 - Present

Sony Research

Research Intern

N/A
2023 - 2023