Abhigyan

Research Intern – Multimodal Stateful Dataset Annotation

Kolkata, India
$80.00/hr
Intermediate

Key Skills

Software

No software listed

Top Subject Matter

Crime data
Multimodal datasets
Text annotation

Top Data Types

Text
Image

Top Task Types

Data Collection
Classification
RLHF
Fine Tuning
Entity (NER) Classification
Text Generation
Question Answering
Text Summarization
Computer Programming / Coding
Prompt-Response Writing (SFT)

Freelancer Overview

Research intern in multimodal stateful dataset annotation, with 4+ years of professional experience spanning complex workflows, research, and quality-focused execution. Core strengths include internal and proprietary tooling. Education includes a Doctor of Philosophy from the Indian Institute of Technology Roorkee (2023) and a Bachelor of Technology from Dr. Ambedkar Institute of Technology for Handicapped, Kanpur, Uttar Pradesh (affiliated with Dr. APJ Abdul Kalam Technical University) (2019). AI-training work focuses on text data and on labeling workflows such as data collection.

English (Intermediate)

Labeling Experience

PhD Research Scholar – Dataset Curation and Annotation

Text, Data Collection
As a PhD Research Scholar at IIT Roorkee, I built and curated datasets for AI model training using the Stack Exchange API and web scraping. These datasets were used for classification and complexity-measurement tasks with transformer language models such as BERT, Longformer, GPT-2, and Llama. I was responsible for the end-to-end process of dataset collection and annotation in support of high-quality machine learning research.
• Designed datasets from raw scraped data to target specific research objectives.
• Annotated and validated text datasets for fine-tuning and evaluation tasks.
• Implemented and tested complexity measures on labeled data.
• Managed large-scale experimental workflows on the PARAM Ganga supercomputer.

2023 - Present

Research Intern – Multimodal Stateful Dataset Annotation

Text, Data Collection
At TCS Research, Bengaluru, I helped create a multimodal stateful dataset focused on crime-related data. My work involved building custom data-scraping tools and implementing a labeling pipeline with several AI tools. I contributed specifically to modeling sources and attributes for the structured annotation of text data.
• Developed and ran a Firecrawl-based scraper locally for data collection.
• Used a pipeline integrating CrewAI, MCP (Model Context Protocol), CursorAI, and Firecrawl for annotation.
• Focused on text and attribute annotation of crime-related data.
• Contributed to foundational research on multimodal dataset labeling.

2025

Education

Dr. Ambedkar Institute of Technology for Handicapped, Kanpur, Uttar Pradesh (affiliated with Dr. APJ Abdul Kalam Technical University)

Bachelor of Technology, Information Technology

2015 - 2019

Kendriya Vidyalaya

High School Diploma, General Education

2013 - 2015

Work History

TCS

Systems Engineer

Kolkata
2021 - 2023

HAL Transport Aircraft Division

Graduate Apprentice

Kanpur
2020 - 2021