Sauhard Dubey

AI Trainer and LLM Evaluation Engineer

Noida, India
$20.00/hr · Intermediate · Other · Internal Proprietary Tooling

Key Skills

Software: Other (Internal/Proprietary Tooling)

Top Subject Matter

No subject matter listed

Top Data Types

Document
Text

Top Label Types

Computer Programming / Coding
Entity (NER) Classification
Evaluation / Rating
Fine-Tuning
Function Calling
Red Teaming
RLHF
Text Generation
Text Summarization

Freelancer Overview

AI Research Engineer and AI Trainer with hands-on experience in evaluating and annotating both natural language and AI-generated code outputs for LLM training. I have built and deployed agentic AI systems using Python, LangChain, LangGraph, PyTorch, and Hugging Face, and developed automated pipelines to process, clean, and validate structured data formats such as CSV, JSON, and Markdown. My work includes reviewing model-generated Python code, identifying logical and execution errors, performing structured evaluation and response rating, and improving dataset quality for training and fine-tuning large language models. I have implemented annotation workflows, prompt engineering strategies, and evaluation pipelines to improve model reasoning, code generation reliability, and output correctness.
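Below is a minimal Python sketch of the kind of validation pipeline described above; the file layout, column names, required schema, and rating range are illustrative assumptions, not taken from an actual project.

```python
# Minimal sketch of a structured-data validation step for LLM training data.
# The prompt/response/rating schema and the 1-5 rating range are assumptions.
import json
from pathlib import Path

import pandas as pd

REQUIRED_COLUMNS = {"prompt", "response", "rating"}  # hypothetical schema

def validate_csv(path: Path) -> pd.DataFrame:
    """Load a CSV, enforce the expected schema, and drop unusable rows."""
    df = pd.read_csv(path)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"{path}: missing columns {sorted(missing)}")
    df = df.dropna(subset=["prompt", "response"])  # drop empty samples
    return df[df["rating"].between(1, 5)]          # drop out-of-range ratings

def load_jsonl(path: Path) -> list[dict]:
    """Parse a JSON-lines file, skipping records that fail to decode."""
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # malformed record: skip rather than abort the run
    return records
```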

Languages: English (Intermediate), Hindi

Labeling Experience

Autonomous Multi-Agent Reporting System Project

Internal Proprietary Tooling · Text · Evaluation / Rating
Developed, evaluated, and annotated outputs from multi-agent LLMs for reporting workflows. Performed systematic validation and reasoning evaluation to improve factual correctness. Improved prompt structure and iteratively assessed AI response reliability; a minimal sketch of one such evaluation record follows this entry.
• Verified multi-agent response accuracy.
• Evaluated reasoning in AI-generated content.
• Applied iterative prompt and reliability improvements.
• Targeted multi-agent AI reporting flows.

2025
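As a concrete illustration of the evaluation record mentioned above, here is a minimal Python sketch; the field names and the 1-5 reasoning rubric are assumptions for illustration, not the project's actual schema.

```python
# Illustrative structured evaluation record for multi-agent report outputs.
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class AgentResponseEval:
    agent_name: str
    response: str
    factually_correct: bool  # verified against source material
    reasoning_score: int     # 1-5 rubric score for reasoning quality
    notes: str = ""

@dataclass
class ReportEval:
    report_id: str
    responses: list[AgentResponseEval] = field(default_factory=list)

    def accuracy(self) -> float:
        """Fraction of agent responses verified as factually correct."""
        return mean(r.factually_correct for r in self.responses)

    def mean_reasoning(self) -> float:
        """Average rubric score, used to compare prompt revisions."""
        return mean(r.reasoning_score for r in self.responses)
```

Tracking these aggregates per prompt revision is what makes iterative reliability assessment measurable rather than anecdotal.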

LLM Meeting Summarization System Project

Internal Proprietary Tooling · Text · Evaluation / Rating
Annotated, evaluated, and optimized LLM-generated meeting summaries for accuracy, completeness, and clarity. Refined prompts and analyzed model responses to improve quality. Leveraged Whisper, Transformers, and PyTorch for text data processing and evaluation; a sketch of this kind of pass follows this entry.
• Conducted quality analysis of AI-generated text summaries.
• Applied prompt refinement techniques.
• Used advanced NLP tools for evaluation.
• Focused on meeting summarization outputs.

2025
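The following is a minimal sketch of the transcription-plus-summarization pass referenced above; the Whisper model size, the facebook/bart-large-cnn summarizer, the file name, and the keyword heuristic are all illustrative assumptions.

```python
# Illustrative transcription + summarization + crude omission check.
import whisper                      # openai-whisper
from transformers import pipeline

# 1. Transcribe the meeting audio with Whisper.
asr_model = whisper.load_model("base")
transcript = asr_model.transcribe("meeting.wav")["text"]

# 2. Summarize the transcript with a seq2seq model.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer(transcript, max_length=150, min_length=40,
                     truncation=True)[0]["summary_text"]

# 3. Flag candidate omissions for manual review: a crude keyword heuristic,
#    standing in for the real completeness rubric.
action_keywords = ["deadline", "owner", "follow up"]
missing = [k for k in action_keywords
           if k in transcript.lower() and k not in summary.lower()]
print(summary)
print("Possible omissions:", missing)
```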

AI Engineer and LLM Evaluation Developer, SciNets (Remote)

Internal Proprietary Tooling · Text · RLHF · Fine-Tuning
Evaluated and annotated large language model outputs, including both natural language and AI-generated Python code, to improve factual accuracy, logical consistency, and execution correctness. Reviewed model-generated scripts, identified syntax and logic errors, and provided structured feedback to improve training data quality and model reliability; a sketch of such a first-pass check follows this entry. Processed and validated structured datasets, including CSV, JSON, and Markdown, with Python and Pandas to support LLM training pipelines, and built automated pipelines for data cleaning, transformation, and annotation. Designed structured evaluation workflows to classify responses, detect hallucinations, and improve reasoning performance, applying prompt engineering and systematic response evaluation to strengthen model performance, code generation quality, and reasoning accuracy.

2025
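Here is a minimal sketch of the kind of automated first pass such code review can use: a static syntax check with `ast`, then a sandboxed execution attempt in a separate process. The function name, verdict fields, and five-second timeout are assumptions for illustration.

```python
# Illustrative two-stage check for model-generated Python samples.
import ast
import subprocess
import sys
import tempfile

def check_generated_code(code: str, timeout: float = 5.0) -> dict:
    """Return a structured verdict used to annotate one code sample."""
    # Stage 1: static syntax check -- cheap, catches malformed output early.
    try:
        ast.parse(code)
    except SyntaxError as e:
        return {"syntax_ok": False, "ran_ok": False, "error": str(e)}

    # Stage 2: run in a separate interpreter process so a hang or crash in
    # the sample cannot take down the annotation pipeline itself.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path], capture_output=True,
                              text=True, timeout=timeout)
        return {"syntax_ok": True, "ran_ok": proc.returncode == 0,
                "error": proc.stderr.strip()}
    except subprocess.TimeoutExpired:
        return {"syntax_ok": True, "ran_ok": False, "error": "timeout"}

print(check_generated_code("print(1 + 1)"))
```

Logic errors still need human review; this stage only filters out samples that cannot run at all.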

Education

Jaypee Institute of Information Technology

Bachelor of Technology, Electronics and Communication Engineering with Computer Science Specialization

2023 - 2027

Work History

SciNets

AI System Architect (LLM Evaluation and AI Training)

Noida
2025 - Present

Reportify

Backend Developer

Noida
2023 - 2023