Suraj Sawane - AI Expert in Model Training

Key Skills

Software

Remotasks

Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Document

Image

Text

Top Label Types

Classification

Computer Programming Coding

Evaluation Rating

Freelancer Overview

With nearly nine years of industry experience—including 4.8 years as a Data Scientist and extensive work with LLMs, automation, and data engineering—I’ve contributed deeply to the creation, curation, and optimization of high-quality AI training data. I’ve worked hands-on with structured, semi-structured, and unstructured datasets, performing data mining, preprocessing, annotation planning, metadata extraction, and model-ready formatting across banking, healthcare, and automotive domains. My background includes evaluating prompt variations, embeddings, and fine-tuning strategies for LLMs, and conducting A/B experiments that improved response accuracy by 20%, giving me a strong understanding of how high-quality labeled data directly influences model behavior. I’ve built multiple AI-powered solutions—including chatbots, RAG systems, and automated documentation tools—that required precise data preparation, schema understanding, and consistent quality control. My experience with vector databases, embedding models, prompt engineering, and cloud platforms (AWS, IBM Watsonx) allows me to design and validate training datasets that enhance model comprehension and reliability. With a solid foundation in machine learning, data analysis, and LLM fine-tuning, I bring both technical depth and practical experience in developing accurate, scalable, and domain-aware training data pipelines.

Expert: English, Marathi

Labeling Experience

Data Scientist

Scale AI, Computer Code Programming, RLHF

Worked part-time on an hourly freelance contract with Remotasks from January 2024 to June 2024 as a Data Scientist / Code Labeling Contributor. Contributed 50–70 hours per week to Python-focused code labeling, annotation, and peer review tasks, supporting the development and quality assurance of large-scale AI training datasets. Ensured accuracy, consistency, and adherence to project-specific coding and annotation guidelines.

2024 - 2024

Data Scientist

Scale AI, Text, RLHF

I have hands-on experience with Remotasks, where I contributed to training LLMs through high-quality data labeling and evaluation. My work involved analyzing prompts, refining model outputs, and ensuring the accuracy and consistency of the training data used for model improvement.

2024 - 2024

Education

N/A

Bachelor of Engineering, Engineering

Bachelor of Engineering

2015 - 2015

Work History

Voltup

Data Scientist

Pune

2025 - Present

SlashCurate Technologies

Data Scientist

Pune

2024 - 2025