Aryan Dadwal

LLM Evaluation Expert in English, Hindi, Punjabi and French

Jalandhar, India
$5.00/hr | Expert | AWS SageMaker, Appen, CrowdFlower

Key Skills

Software

AWS SageMaker
Appen
CrowdFlower
CrowdSource
Labelbox
Remotasks
Scale AI
Surge AI
Internal/Proprietary Tooling
OneForma

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Document
Medical DICOM

Top Task Types

Computer Programming/Coding
Evaluation/Rating
Fine-Tuning
Prompt/Response Writing (SFT)
Translation/Localization

Freelancer Overview

With hands-on experience in both Reinforcement Learning from Human Feedback (RLHF) and large-scale search quality evaluation, I bring a solid track record in AI training data operations. With Outlier AI, I contributed to the alignment of Hindi-language LLMs through prompt evaluation and model fine-tuning, helping refine generated outputs for safety, coherence, and factual correctness. At RWS TrainAI, I worked on geographically aware search quality assessments, rating search results for intent match, user satisfaction, and contextual relevance. My labeling experience spans intent classification, prompt ranking, query understanding, sentiment tagging, and named entity recognition. I’m comfortable working with both structured guidelines and open-ended feedback tasks, and I maintain high-quality output even on subjective or nuanced evaluations. With strong attention to detail, multilingual capabilities (Hindi, English, French), and a passion for improving AI systems, I’m well prepared to support projects across NLP, search, and LLM training.

Expert | Hindi, French, English, Punjabi

Labeling Experience

Labelbox

E-commerce Image Tagging for Visual AI

Labelbox | Image | Bounding Box | Segmentation
Labeled 10,000+ product images for an AI-based visual recommendation engine. Tasks involved drawing bounding boxes around clothing items (shirts, shoes, accessories), labeling style categories, and segmenting background elements. This work helped improve the visual search pipeline and product discovery in mobile apps. Maintained a quality score above 95% through consensus-based QA checks.

2024 - 2024
Scale AI

Hindi LLM Prompt Evaluation (RLHF) – Outlier AI

Scale AI | Text | Translation/Localization | RLHF
Evaluated Hindi-language LLM completions, rating responses on factual accuracy, coherence, safety, and alignment with human intent. Helped train the model through reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). Tasks included side-by-side comparisons, instruction tuning, and adversarial response testing. Maintained high annotation quality through strict adherence to token-level accuracy and subjective alignment metrics.

2024 - 2024

Search Quality Evaluation – RWS TrainAI

Internal/Proprietary Tooling | Text | Classification | Geocoding
Assessed the quality and relevance of search engine results based on user intent, query complexity, and location context. Conducted multiple ratings per query, including "Needs Met," content quality, and geo-specific accuracy. Handled hundreds of queries daily with consistent calibration scores above 90%.

2023 - 2024

Invisible AI

Internal/Proprietary Tooling | Computer Code Programming | Computer Programming/Coding
Trained a coding model

2021 - 2024
OneForma

Audio to Text Conversion

OneForma | Audio | Text Generation | Translation/Localization
Transcribed 500 Hindi and English audio files into text.

2022 - 2023

Education

Indian Institute of Technology Jodhpur

Bachelor of Science, Applied Artificial Intelligence and Data Science

2020 - 2024

Work History

RWS – Project Callisto

Google Search Quality Rater

Remote
2025 - 2025

Jyovis Healthcare Solutions Pvt. Ltd.

AI Intern

On-site
2025 - 2025