Romit Dubey

Data Processing and Text Classification for Digitization (IIT Bombay Internship)

Mumbai, India
$20.00/hr
Intermediate
LabelImg, Roboflow, SuperAnnotate

Key Skills

Software

LabelImg
Roboflow
SuperAnnotate

Top Subject Matter

Historical Manuscript Digitization and Information Retrieval

Top Data Types

Document
Text
Image

Top Task Types

Classification
Bounding Box
Object Detection
Computer Programming Coding

Freelancer Overview

Completed a data processing and text classification project for manuscript digitization during an internship at IIT Bombay. Brings 2+ years of professional experience spanning research and quality-focused execution. Core strengths include internal and proprietary tooling. Holds a Master of Computer Application from Banaras Hindu University (2024). AI-training work centers on document data and classification labeling workflows.

English: Intermediate

Labeling Experience

Data Processing and Text Classification for Digitization (IIT Bombay Internship)

Document, Classification
Built and automated OCR workflows to extract, clean, and structure text from large volumes of scanned historical manuscripts. Applied data preprocessing and normalization to create datasets suitable for downstream AI applications. Validated and processed extracted content to ensure high accuracy for research purposes.

• Created machine-readable datasets from noisy document sources
• Automated pipeline for text digitization using Python-based tools
• Performed quality checks on labeled data to meet research standards
• Enabled search and retrieval on digitized content

2022 - 2023

Education

Banaras Hindu University

Master of Computer Application
2024

Work History

Indian Institute of Technology Bombay

Software Development Intern

Mumbai
2022 - 2023