Shaurya Sharthak

Data Engineer for Instruct Dataset Curation.

Chhattisgarh, India
$7.00/hr | Intermediate | Data Annotation Tech | Scale AI | Mercor

Key Skills

Software

Data Annotation Tech
Scale AI
Mercor

Top Subject Matter

NLP Domain Expertise
Red Teaming
AI Research Based

Top Data Types

Text
Document
Computer Code Programming

Top Task Types

Data Collection
Function Calling
Prompt + Response Writing (SFT)
Red Teaming
Fine-tuning
RLHF
Transcription
Question Answering
Text Summarization
Text Generation
Object Detection
Classification
Bounding Box

Freelancer Overview

Data Engineer for Instruct Dataset Curation on the High-Quality Hindi & Hinglish Instruct Datasets Project, with 2+ years of professional experience across research and quality-focused execution of complex workflows. Core strengths include internal and proprietary tooling. Education: Bachelor of Technology with Honours, Chhattisgarh Swami Vivekananda Technical University (2026), and Senior Secondary Certificate, Jawahar Navodaya Vidyalaya, Kurukshetra (2021). AI-training focus covers text data and labeling workflows such as data collection.

Intermediate | English | Hindi | Bhojpuri

Labeling Experience

Data Engineer for Instruct Dataset Curation (High-Quality Hindi & Hinglish Instruct Datasets Project)

Text | Data Collection
Curated and engineered high-quality instructional datasets in Hindi and Hinglish for language model training. The work consisted of collecting, organizing, and vetting text data to ensure suitability for LLM fine-tuning and benchmarking. Contributed directly to the improvement of multilingual and low-resource language AI models.
• Assembled a 30,000-entry Hindi instruct dataset curated for LLM tasks.
• Developed HINGLISH-LIMA to address code-mixed language use cases.
• Ensured instruction-following data was properly formatted and diversified.
• Collaborated with engineering to validate data prior to integration.

2024 - 2025

Education

Chhattisgarh Swami Vivekananda Technical University

Bachelor of Technology with Honours, Computer Science and Engineering (Artificial Intelligence and Machine Learning)

2022 - 2026

Jawahar Navodaya Vidyalaya, Kurukshetra

Senior Secondary Certificate, Science

2020 - 2021

Work History

ResoluteAI Software

AI Engineer Intern

Bangalore
2024 - 2025