Siddharth Upadhyay

RLHF Expert AI Engineer

Delhi, India

$30.00/hr
Entry Level
Labelbox
Scale AI

Key Skills

Software

Labelbox
Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Image
Text

Top Task Types

Action Recognition
Computer Programming Coding
Evaluation Rating
RLHF
Segmentation

Freelancer Overview

As a Senior AI Engineer with a proven track record of developing and deploying advanced machine learning models, I have a deep, practical understanding of the critical role high-quality training data plays in the ML lifecycle. My experience spans building computer vision pipelines that process over 500K images daily, creating NLP-based document understanding systems, and deploying fraud detection models for major financial institutions. This technical foundation provides me with a unique perspective in data annotation. I have hands-on experience annotating diverse datasets for LLMs, consistently achieving over 98% accuracy. More importantly, I specialize in identifying subtle inconsistencies and providing actionable feedback to engineering teams to enhance labeling guidelines. This ability to bridge the gap between data annotation and model development ensures the creation of robust, reliable, and highly performant AI systems.

Entry Level
Hindi
French
English
Japanese

Labeling Experience

Labelbox

AI Agent Response Factuality & Grounding Analysis

Labelbox, Text, Classification, RLHF
This project focused on the critical evaluation of AI agent responses to ensure factual accuracy. For each task, I performed sentence-level analysis, labeling every sentence from the agent's output as 'supported', 'unsupported', 'contradictory', 'disputed', or 'no_rad' based on a strict source-of-truth context. The core of my work involved not only assigning a label but also writing a detailed rationale for each decision and meticulously extracting the exact text or code excerpts from the context that proved or disproved the agent's statement. This required a deep analysis of the agent's reasoning, including verifying the logic of its tool-use code (e.g., Python API calls) and the validity of tool outputs before they could be used as trusted evidence for subsequent sentences.
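The per-sentence workflow above can be sketched as a small validation helper. The five-label set matches the project description, but the function, field names, and sample data here are hypothetical illustrations, not the actual Labelbox project schema.

```python
# Hypothetical sketch of one sentence-level factuality annotation.
# The label set is from the project description; everything else
# (field names, sample data) is illustrative, not the real schema.
VALID_LABELS = {"supported", "unsupported", "contradictory", "disputed", "no_rad"}

def make_annotation(sentence, label, rationale, evidence_excerpts):
    """Bundle one sentence's factuality judgment with its justification."""
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label}")
    # A 'supported' or 'contradictory' judgment must point at the exact
    # source-of-truth excerpts that prove or disprove the claim.
    if label in {"supported", "contradictory"} and not evidence_excerpts:
        raise ValueError("evidence excerpts required for this label")
    return {
        "sentence": sentence,
        "label": label,
        "rationale": rationale,
        "evidence": evidence_excerpts,
    }

ann = make_annotation(
    sentence="The API returned three matching records.",
    label="supported",
    rationale="The verified tool output in the context lists exactly three records.",
    evidence_excerpts=['tool_output: {"results": [r1, r2, r3]}'],
)
```

The evidence check mirrors the requirement that each label be backed by meticulously extracted excerpts rather than a bare verdict.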

2025 - 2025
Labelbox

Advanced Code Generation Verification & Test Creation

Labelbox, Computer Code Programming, RLHF, Fine Tuning
The project involved creating a benchmark dataset of challenging programming problems to test and verify the capabilities of advanced code generation LLMs. My role was to generate complete data rows, each consisting of a complex prompt, a canonical solution, and a comprehensive test suite. I authored graduate-level coding prompts in subjects like Machine Learning and MLOps, ensuring they were unambiguous and required multi-step logical reasoning. For each prompt, I developed a production-quality, executable solution. A key task was creating extensive test suites (10-20+ test cases per problem) using standard frameworks to cover all requirements and edge cases. To ensure a high bar for quality, I validated the difficulty of each problem by evaluating responses from multiple advanced LLMs, categorizing them as 'Hard' or 'Expert' based on the models' failures.

2025 - 2025

Education

Guru Gobind Singh Indraprastha University

Bachelor Of Technology, Artificial Intelligence And Data Science

2022

Work History

Intellect Design Arena

Senior AI Engineer

Chennai
2023 - Present
Zoho Corporation

Machine Learning Engineer

Chennai
2022 - 2023