Vidushi Raval

Data Science Fellow—RAG Dataset Preparation & EDA

Remote, USA
$40.00/hr
Intermediate

Key Skills

Software

No software listed

Top Subject Matter

Machine Learning
Natural Language Processing
Scientific Metadata

Top Data Types

Text
Computer Code Programming

Top Task Types

Data Collection
RLHF
Evaluation/Rating
Red Teaming
Computer Programming/Coding
Prompt + Response Writing (SFT)

Freelancer Overview

Data Science Fellow focused on RAG dataset preparation and EDA, with 13+ years of professional experience across complex workflows, research, and quality-focused execution. Core strengths include internal and proprietary tooling. Education: Master of Science, Southern New Hampshire University (2023), and Bachelor of Engineering, Atmiya University (2019). AI-training focus covers text data and data collection workflows.

Intermediate
English
Hindi

Labeling Experience

Data Science Fellow—RAG Dataset Preparation & EDA

Text
Data Collection

As a Data Science Fellow, performed EDA and data cleaning on arXiv metadata to prepare a high-quality corpus for an NLP retrieval-augmented generation (RAG) pipeline. Engineered NLP features and created a documented data schema, supporting scalable chunking and embedding for LLM training. Executed ETL workflows and feature engineering pipelines aimed at enhancing the accuracy and utility of AI models.

• Prepared and curated text data for LLM-focused RAG datasets
• Implemented NLP feature engineering for downstream AI tasks
• Created data schemas to ensure scalability and ingestibility
• Conducted rigorous quality checks to validate annotation and preparation steps

2024 - Present

Education

Southern New Hampshire University

Master of Science, Information Technology

2023

Atmiya University

Bachelor of Engineering, Information Technology

2019

Work History

Springboard

Data Science Fellow

Remote
2024 - Present

Ellen Michaels Presents

Web Developer

San Francisco
2016 - 2017