Sauhard Dubey

AI Trainer and LLM Evaluation Engineer

Noida, India
$20.00/hr · Intermediate · Other · Internal Proprietary Tooling

Key Skills

Software: Other (Internal/Proprietary Tooling)

Top Subject Matter

No subject matter listed

Top Data Types

Document
Text

Top Label Types

Computer Programming / Coding
Entity (NER) Classification
Evaluation / Rating
Fine-Tuning
Function Calling
Red Teaming
RLHF
Text Generation
Text Summarization

Freelancer Overview

AI Research Engineer and AI Trainer with hands-on experience in evaluating and annotating both natural language and AI-generated code outputs for LLM training. I have built and deployed agentic AI systems using Python, LangChain, LangGraph, PyTorch, and Hugging Face, and developed automated pipelines to process, clean, and validate structured data formats such as CSV, JSON, and Markdown. My work includes reviewing model-generated Python code, identifying logical and execution errors, performing structured evaluation and response rating, and improving dataset quality for training and fine-tuning large language models. I have implemented annotation workflows, prompt engineering strategies, and evaluation pipelines to improve model reasoning, code generation reliability, and output correctness.
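Below is a minimal Python sketch of the kind of validation pipeline described above; the file layout, column names, required schema, and rating range are illustrative assumptions, not taken from an actual project.

```python
# Minimal sketch of a structured-data validation step for LLM training data.
# The prompt/response/rating schema and the 1-5 rating range are assumptions.
import json
from pathlib import Path

import pandas as pd

REQUIRED_COLUMNS = {"prompt", "response", "rating"}  # hypothetical schema

def validate_csv(path: Path) -> pd.DataFrame:
    """Load a CSV, enforce the expected schema, and drop unusable rows."""
    df = pd.read_csv(path)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"{path}: missing columns {sorted(missing)}")
    df = df.dropna(subset=["prompt", "response"])  # drop empty samples
    return df[df["rating"].between(1, 5)]          # drop out-of-range ratings

def load_jsonl(path: Path) -> list[dict]:
    """Parse a JSON-lines file, skipping records that fail to decode."""
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # malformed record: skip rather than abort the run
    return records
```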

Languages: English (Intermediate), Hindi

Labeling Experience

Autonomous Multi-Agent Reporting System Project

Internal Proprietary Tooling · Text · Evaluation / Rating
Developed, evaluated, and annotated outputs from multi-agent LLMs for reporting workflows. Performed systematic validation and reasoning evaluation to improve factual correctness. Improved prompt structure and iteratively assessed AI response reliability; a minimal sketch of one such evaluation record follows this entry.
• Verified multi-agent response accuracy.
• Evaluated reasoning in AI-generated content.
• Applied iterative prompt and reliability improvements.
• Targeted multi-agent AI reporting flows.

2025
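As a concrete illustration of the evaluation record mentioned above, here is a minimal Python sketch; the field names and the 1-5 reasoning rubric are assumptions for illustration, not the project's actual schema.

```python
# Illustrative structured evaluation record for multi-agent report outputs.
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class AgentResponseEval:
    agent_name: str
    response: str
    factually_correct: bool  # verified against source material
    reasoning_score: int     # 1-5 rubric score for reasoning quality
    notes: str = ""

@dataclass
class ReportEval:
    report_id: str
    responses: list[AgentResponseEval] = field(default_factory=list)

    def accuracy(self) -> float:
        """Fraction of agent responses verified as factually correct."""
        return mean(r.factually_correct for r in self.responses)

    def mean_reasoning(self) -> float:
        """Average rubric score, used to compare prompt revisions."""
        return mean(r.reasoning_score for r in self.responses)
```

Tracking these aggregates per prompt revision is what makes iterative reliability assessment measurable rather than anecdotal.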

LLM Meeting Summarization System Project

Internal Proprietary Tooling · Text · Evaluation / Rating
Annotated, evaluated, and optimized LLM-generated meeting summaries for accuracy, completeness, and clarity. Refined prompts and analyzed model responses to improve quality. Leveraged Whisper, Transformers, and PyTorch for text data processing and evaluation; a sketch of this kind of pass follows this entry.
• Conducted quality analysis of AI-generated text summaries.
• Applied prompt refinement techniques.
• Used advanced NLP tools for evaluation.
• Focused on meeting summarization outputs.

2025
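The following is a minimal sketch of the transcription-plus-summarization pass referenced above; the Whisper model size, the facebook/bart-large-cnn summarizer, the file name, and the keyword heuristic are all illustrative assumptions.

```python
# Illustrative transcription + summarization + crude omission check.
import whisper                      # openai-whisper
from transformers import pipeline

# 1. Transcribe the meeting audio with Whisper.
asr_model = whisper.load_model("base")
transcript = asr_model.transcribe("meeting.wav")["text"]

# 2. Summarize the transcript with a seq2seq model.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer(transcript, max_length=150, min_length=40,
                     truncation=True)[0]["summary_text"]

# 3. Flag candidate omissions for manual review: a crude keyword heuristic,
#    standing in for the real completeness rubric.
action_keywords = ["deadline", "owner", "follow up"]
missing = [k for k in action_keywords
           if k in transcript.lower() and k not in summary.lower()]
print(summary)
print("Possible omissions:", missing)
```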

AI Engineer and LLM Evaluation Developer, SciNets (Remote)

Internal Proprietary Tooling · Text · RLHF · Fine-Tuning
Evaluated and annotated large language model outputs, including both natural language and AI-generated Python code, to improve factual accuracy, logical consistency, and execution correctness. Reviewed model-generated scripts, identified syntax and logic errors, and provided structured feedback to improve training data quality and model reliability; a sketch of such a first-pass check follows this entry. Processed and validated structured datasets, including CSV, JSON, and Markdown, with Python and Pandas to support LLM training pipelines, and built automated pipelines for data cleaning, transformation, and annotation. Designed structured evaluation workflows to classify responses, detect hallucinations, and improve reasoning performance, applying prompt engineering and systematic response evaluation to strengthen model performance, code generation quality, and reasoning accuracy.

2025
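Here is a minimal sketch of the kind of automated first pass such code review can use: a static syntax check with `ast`, then a sandboxed execution attempt in a separate process. The function name, verdict fields, and five-second timeout are assumptions for illustration.

```python
# Illustrative two-stage check for model-generated Python samples.
import ast
import subprocess
import sys
import tempfile

def check_generated_code(code: str, timeout: float = 5.0) -> dict:
    """Return a structured verdict used to annotate one code sample."""
    # Stage 1: static syntax check -- cheap, catches malformed output early.
    try:
        ast.parse(code)
    except SyntaxError as e:
        return {"syntax_ok": False, "ran_ok": False, "error": str(e)}

    # Stage 2: run in a separate interpreter process so a hang or crash in
    # the sample cannot take down the annotation pipeline itself.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path], capture_output=True,
                              text=True, timeout=timeout)
        return {"syntax_ok": True, "ran_ok": proc.returncode == 0,
                "error": proc.stderr.strip()}
    except subprocess.TimeoutExpired:
        return {"syntax_ok": True, "ran_ok": False, "error": "timeout"}

print(check_generated_code("print(1 + 1)"))
```

Logic errors still need human review; this stage only filters out samples that cannot run at all.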

Education

Jaypee Institute of Information Technology

Bachelor of Technology, Electronics and Communication Engineering with Computer Science Specialization

2023 - 2027

Work History

SciNets

AI System Architect (LLM Evaluation and AI Training)

Noida
2025 - Present

Reportify

Backend Developer

Noida
2023 - 2023