Sathvik Verma

AI Referral Automation Project Lead

Milpitas, USA
$20.00/hr · Intermediate · Data Annotation Tech · Deep Systems · Mercor

Key Skills

Software

Data Annotation Tech
Deep Systems
Mercor
OneForma
Surge AI
Dataturk
SuperAnnotate

Top Subject Matter

Healthcare Data & Medical Documentation
Medical Referrals
NLP Automation

Top Data Types

Text
Audio
Computer Code Programming

Top Task Types

Classification
Object Detection
Text Summarization
Question Answering
RLHF
Data Collection
Computer Programming/Coding
Prompt + Response Writing (SFT)

Freelancer Overview

I have experience with AI training data and data labeling from both academic work and my professional career. At Amruta Inc. I built a natural language processing pipeline that required extensive data annotation: I designed a classification schema to categorize the data, then ran many iterations of labeling unstructured healthcare data to train and validate the model's ability to route information correctly. I have also used Label Studio and Scale AI for structured annotation workflows. What sets me apart as a data labeler is that I treat labeling not only as a mechanical task but as an important quality control function within the life cycle of developing an AI model. I am a Virginia Tech student with a 4.0 GPA in Computational Modeling and Data Analytics, and I bring strong analytical reasoning, attention to detail, and the ability to evaluate AI model outputs for quality, tone, soundness of reasoning, and instruction adherence, skills I have developed through formal technical education and hands-on experience deploying AI in the real world. I have completed over 200 hours of annotation and evaluation on text, code, and multimodal datasets, and I am comfortable working with ambiguous guidelines while meeting exacting quality benchmarks.

Languages: English, Hindi (Intermediate)

Labeling Experience

Reinforcement Learning from Human Feedback: LLM Preference Annotation

Text · RLHF
I carried out preference-based annotation tasks supporting an RLHF pipeline built to align a large language model with human expectations. My work involved comparing two AI-generated responses and choosing or ranking the better one against helpfulness, harmlessness, and honesty criteria. I also provided detailed justifications for each choice to give signal beyond a simple binary rating, flagged cases where neither response met quality standards, and wrote reference responses when the model's outputs fell short of the task requirements. Through this work I developed a strong understanding of RLHF-related alignment failure modes, including sycophancy, over-refusal, and confident factual confabulation across varying prompt types.

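The comparison workflow described above can be sketched as a simple annotation record. This is an illustrative sketch only; the field names and structure are hypothetical, not the actual schema used on the project:

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    """One RLHF comparison: two model responses to the same prompt."""
    prompt: str
    response_a: str
    response_b: str
    # Annotator's choice: "a", "b", or "neither" when both fail quality standards.
    preferred: str
    # A written justification gives richer signal than the binary label alone.
    justification: str
    # Reference response written by the annotator when neither output meets the bar.
    reference: str = ""

    def is_usable(self) -> bool:
        """A pair is directly usable for preference training if a clear winner exists."""
        return self.preferred in ("a", "b")

pair = PreferencePair(
    prompt="Explain overfitting in one paragraph.",
    response_a="Overfitting is when a model memorizes noise in the training data...",
    response_b="Overfitting means the model is too big.",
    preferred="a",
    justification="Response A is accurate and complete; B is vague and partly wrong.",
)
```

Flagging "neither" plus a reference response, as the description above mentions, lets unusable pairs still contribute supervised examples.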

2025 - Present

AI Response Quality Evaluation: Instruction Following & Reasoning Assessment

Text · Evaluation Rating
Systematically evaluated and rated AI-generated outputs across a broad range of prompt types, including analytical reasoning, technical explanation, creative writing, and factual response. Used structured rubrics to score each response on accuracy, coherence, instruction adherence, tone appropriateness, and completeness. Identified hallucinations, logical inconsistencies, and factual inaccuracies in model output, and provided detailed written justifications for each rating to support downstream model improvement. Maintained high inter-rater reliability against strict annotation guidelines across hundreds of evaluations of varying difficulty and subject domain.

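A structured rubric of the kind described can be sketched as a weighted scorer. The dimension names mirror those listed above, but the weights and the 1-5 scale are hypothetical:

```python
# Hypothetical weights; a real rubric would define these in the task guidelines.
RUBRIC = {
    "accuracy": 0.30,
    "coherence": 0.20,
    "instruction_adherence": 0.25,
    "tone": 0.10,
    "completeness": 0.15,
}

def overall_score(ratings: dict) -> float:
    """Weighted average of 1-5 ratings; refuses partial ratings to keep
    evaluations comparable across annotators."""
    missing = set(RUBRIC) - set(ratings)
    if missing:
        raise ValueError(f"unrated dimensions: {sorted(missing)}")
    return round(sum(RUBRIC[d] * ratings[d] for d in RUBRIC), 2)

score = overall_score({
    "accuracy": 5,
    "coherence": 4,
    "instruction_adherence": 5,
    "tone": 4,
    "completeness": 3,
})
```

Requiring every dimension before scoring is one simple way to support the inter-rater reliability mentioned above.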

2025 - Present

AI Response Evaluation & Text Annotation — LLM Training Data

Text · Evaluation Rating
As part of an independent initiative to evaluate and annotate AI-generated text for LLM training data collections, I systematically categorize, rank, and assess model responses for accuracy, relevance, and quality of instruction-following; flag hallucinations and factual inaccuracies; and write high-quality reference outputs for comparison. I use structured rubrics to assess tone, clarity of reasoning, and completeness across prompt types spanning programming, math problem solving, creative writing, and general knowledge, producing consistently high-quality annotations across hundreds of evaluations while adhering to the guidelines and benchmarks established for evaluation consistency. This work draws on the analytical and writing skills I developed in my Computational Modeling and Data Analytics coursework at Virginia Tech, where I hold a 4.0 GPA.


2024 - Present

Python & Java Code Generation and Review — AI Coding Assistant Training

Text · Computer Programming/Coding
I produced training data for an AI coding assistant by reviewing and creating Python and Java code samples, evaluating them for correctness, efficiency, and adherence to best practices. My deliverables included clean, well-documented algorithm and data-structure solutions, bug reports for AI-generated code, and quality and readability rankings of competing code completions. I followed sound software engineering practices throughout: modular design, attention to how time complexity affects performance, and effective use of data structures. I produced high-quality labeled coding examples for beginner and intermediate tasks across data manipulation, object-oriented programming, and algorithm design, holding all of them to the same standards.

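Ranking competing completions typically begins with a correctness check before judging style or readability. The sketch below is illustrative: the two "completions" and the test cases are hypothetical stand-ins for AI-generated candidates:

```python
# Two hypothetical AI-generated completions for "return the maximum of a list".
def completion_a(nums):
    # Sort-based: correct but O(n log n).
    return sorted(nums)[-1]

def completion_b(nums):
    # Single pass: correct and O(n).
    best = nums[0]
    for n in nums[1:]:
        if n > best:
            best = n
    return best

# Shared test cases every candidate must pass before it can be ranked.
TEST_CASES = [([3, 1, 2], 3), ([-5, -2, -9], -2), ([7], 7)]

def passes_all(fn) -> bool:
    return all(fn(list(inp)) == expected for inp, expected in TEST_CASES)

results = {"a": passes_all(completion_a), "b": passes_all(completion_b)}
```

Once both pass, the ranking can turn to the criteria named above: efficiency (here, the O(n) pass beats the sort) and readability.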

2024 - 2026

AI Referral Automation Project Lead

Text · Classification
My main responsibility was designing and deploying a data pipeline that used NLP to parse and automatically classify unstructured healthcare referral documents. I built an automated system to identify relevant information and route documents for further action, replacing previously manual, error-prone workflows and improving accuracy and efficiency in handling sensitive medical text records.
• Developed multi-stage NLP pipelines for automated text classification.
• Applied entity recognition and document routing to healthcare referrals.
• Used Python and NLP libraries for production deployment.
• Improved workflow efficiency and data accuracy for stakeholders.

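As a simplified illustration of the routing step, a keyword-rule baseline might look like the following. This is a sketch only: the production pipeline used trained NLP models, and the department names and patterns here are hypothetical:

```python
import re

# Hypothetical routing rules; a trained classifier replaced rules like these
# in the deployed system.
ROUTING_RULES = {
    "cardiology": re.compile(r"\b(cardiac|heart|ecg|arrhythmia)\b", re.I),
    "radiology": re.compile(r"\b(x-ray|mri|ct scan|imaging)\b", re.I),
    "orthopedics": re.compile(r"\b(fracture|joint|knee|hip)\b", re.I),
}

def route_referral(text: str) -> str:
    """Return the first matching department, or escalate to a human reviewer."""
    for department, pattern in ROUTING_RULES.items():
        if pattern.search(text):
            return department
    return "manual_review"

dept = route_referral("Patient reports chest pain; ECG shows arrhythmia.")
```

The `manual_review` fallback reflects the point made above: automation replaced the error-prone manual workflow only where the classification was confident.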

2024 - 2024

Education


Virginia Tech

Bachelor of Science, Financial Technology and Big Data Analytics

2024

Work History


Amruta Inc.

Student Mentor – Programming Fundamentals

Reston
2024 - 2024

Amruta Inc.

Project Lead – AI Referral Automation

Reston
2024 - 2024