Navpreet Singh Devpuri

Lead LLM Engineer | Code RLHF, Agent Simulation & Evaluation Tooling for Frontier LLMs

N/A, India
$60.00/hr · Expert · Other

Key Skills

Software

Other

Top Subject Matter

LLM Training Data (RLHF & SFT)
AI Agent Evaluation & Benchmarking
Code, Programming & Function-Calling Tasks

Top Data Types

Computer Code Programming
Document

Top Task Types

RLHF
Fine-tuning
Computer Programming/Coding
Function Calling
Evaluation/Rating

Freelancer Overview

Lead LLM Engineer at Turing with nearly 2 years of focused experience in data labeling, training-data generation, and expert evaluation for frontier model development, contributing directly to Gemini and Claude across Python, JavaScript, MongoDB, SQL, and PostgreSQL. I was promoted to Reviewer within 2 weeks (top 3 of 100+) and to Lead Reviewer within 4 weeks, sustaining a 0% audit error rate against a 21.97% team average, and I lead QA for a 300+ member labeling team to ensure consistent, high-signal model evaluations on real-world tasks sourced from live open-source GitHub repositories.

Beyond labeling, I design the proprietary tooling that powers AI training at scale: 60+ simulated API services (Gmail, Slack, WhatsApp, Google Maps, etc.) for agent training and evaluation; Gemini Gym's Mutation Engine, which generates 1M+ config-driven tool variations with auto-generated function-calling schemas; and RL frameworks for on-device agents with 80+ Android (Kotlin) and 30+ iOS (Swift) tools built from the ground up. Backed by 8+ years of engineering across FDA-compliant healthcare (Biofourmis) and security/red-teaming (Aspirify), I bring deep full-stack, evaluation-infrastructure, and quality-control rigor to complex labeling workflows.

English (Expert) · Hindi · Punjabi

Labeling Experience

Lead: Agent Simulation Environments, Training Data & Benchmarks for LLM Agents

Computer Code Programming · Computer Programming/Coding
At Turing I lead a team building large-scale simulation and scenario environments that produce training and evaluation data for LLM agents, along with the benchmarks used to score agent performance. The work spans environment design, scenario authoring, scripted ground-truth generation, automated grading, and end-to-end evaluation infrastructure. Key contributions include 60+ proprietary simulated API services (Gmail, Slack, WhatsApp, Google Maps, etc.) built from scratch to give agents grounded, multi-step tool-use scenarios; Gemini Gym's Mutation Engine, a config-driven generator producing 1M+ tool variations with auto-generated function-calling schemas for diverse training coverage; on-device RL frameworks integrating 80+ Android tools (Kotlin) and 30+ iOS tools (Swift) built from the ground up; and benchmarking infrastructure with comprehensive test coverage that scores cloud AI agents on real-world tasks sourced from live open-source GitHub repositories.

2024 - Present
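To make the "config-driven generator with auto-generated function-calling schemas" idea concrete, here is a minimal, hypothetical sketch: none of the names below come from the actual Mutation Engine; it only illustrates how a base tool definition plus mutation axes can expand into many variants, each rendered as a function-calling style JSON schema.

```python
import itertools
import json

# Hypothetical base tool and mutation axes (illustrative names only).
BASE_TOOL = {
    "name": "send_message",
    "description": "Send a message to a contact.",
    "params": {"recipient": "string", "body": "string"},
}

# Each axis lists alternatives; the cross product yields the variants.
MUTATION_AXES = {
    "service": ["gmail", "slack", "whatsapp"],
    "extra_param": [None, ("priority", "string"), ("schedule_at", "string")],
}

def mutate(base, axes):
    """Yield one tool variant per combination of mutation-axis values."""
    keys = list(axes)
    for combo in itertools.product(*(axes[k] for k in keys)):
        choice = dict(zip(keys, combo))
        params = dict(base["params"])
        if choice["extra_param"]:
            pname, ptype = choice["extra_param"]
            params[pname] = ptype
        yield {
            "name": f'{choice["service"]}_{base["name"]}',
            "description": base["description"],
            "params": params,
        }

def to_schema(tool):
    """Render a variant as a function-calling style JSON schema."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "parameters": {
            "type": "object",
            "properties": {p: {"type": t} for p, t in tool["params"].items()},
            "required": list(tool["params"]),
        },
    }

variants = list(mutate(BASE_TOOL, MUTATION_AXES))
print(len(variants))  # 3 services x 3 param options = 9 variants
print(json.dumps(to_schema(variants[0]), indent=2))
```

At real scale the same pattern, driven by richer configs (parameter renames, type changes, description paraphrases), is what lets a generator multiply a modest tool catalog into millions of schema variants for training coverage.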

Lead Reviewer: RLHF Code Evaluation for Gemini and Claude (300+ Team QA)

Computer Code Programming · RLHF
As Lead LLM Engineer at Turing, I lead RLHF data generation and expert evaluation for frontier LLMs (Gemini and Claude) on real-world coding tasks across Python, JavaScript, MongoDB, SQL, PostgreSQL, Kotlin, and Swift. I author and review reward signals on tasks sourced directly from live open-source GitHub repositories, write rubric-grounded critiques and preference data, and ensure consistent rubric application across a 300+ member labeling team. Within 2 weeks I was promoted to Reviewer (top 3 of 100+) and within 4 weeks to Lead Reviewer, sustaining a 0% audit error rate against a 21.97% team average. As Lead Reviewer I own QA for the labeling pipeline: calibrating reviewers, resolving rubric ambiguities, escalating systematic failure modes back to model and rubric owners, and approving the final preference dataset that flows into training.

2024 - Present
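As a rough illustration of the kind of record this pipeline produces, here is a hypothetical sketch of a rubric-grounded preference pair; the field names are invented for this example and are not the actual Turing schema.

```python
from dataclasses import dataclass, field

@dataclass
class RubricScore:
    criterion: str      # e.g. "correctness", "style", "test coverage"
    score: int          # e.g. 1-5 per the rubric
    justification: str  # rubric-grounded critique for this criterion

@dataclass
class PreferencePair:
    prompt: str                 # coding task, e.g. drawn from a GitHub issue
    chosen: str                 # preferred model completion
    rejected: str               # dispreferred completion
    scores_chosen: list[RubricScore] = field(default_factory=list)
    scores_rejected: list[RubricScore] = field(default_factory=list)

    def margin(self) -> int:
        """Total rubric-score gap; a QA gate might require a minimum margin."""
        return (sum(s.score for s in self.scores_chosen)
                - sum(s.score for s in self.scores_rejected))

pair = PreferencePair(
    prompt="Fix the off-by-one bug in pagination()",
    chosen="...patch that fixes the bug and adds a regression test...",
    rejected="...patch that silently changes the public API...",
    scores_chosen=[RubricScore("correctness", 5, "Bug fixed; test added.")],
    scores_rejected=[RubricScore("correctness", 2, "Breaks public API.")],
)
print(pair.margin())  # 3
```

Tying every score to an explicit criterion and written justification is what makes reviewer calibration and audit checks (like the 0% error rate above) possible: disagreements reduce to a specific criterion rather than a gut-feel ranking.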

Education

Government College, Mohali

Bachelor of Science, Computer Science

2017 - 2021

Work History

Turing

Lead Software Engineer

N/A
2023 - Present

Biofourmis

Software Engineer

N/A
2022 - 2024