Dheeraj Shenoy - AI Training & Data Labeling Expert | LLM Evaluator | NLP & Vision Tasks

Key Skills

Software

Clickworker

CloudFactory

CrowdFlower

CrowdSource

CVAT

Data Annotation Tech

Dataloop

Datumbox

Datasaur

Datature

LabelImg

Label Studio

Prodigy

Redbrick AI

Remotasks

V7 Labs

Internal/Proprietary Tooling

Scale AI

Appen

Top Subject Matter

No subject matter listed

Top Data Types

Audio

Geospatial Tiled Imagery

Video

Top Task Types

Action Recognition

Audio Recording

Bounding Box

Computer Programming Coding

Evaluation Rating

Freelancer Overview

I am a detail-oriented AI Training and Data Labeling Specialist with hands-on experience working on natural language processing (NLP), computer vision, and LLM evaluation tasks. My background includes contributing to AI projects involving text classification, sentiment analysis, image annotation, and prompt evaluation for large language models. I've worked with platforms like Scale AI, Remotasks, and UHRS to help train, refine, and align cutting-edge AI systems with human judgment and real-world needs. What sets me apart is my ability to consistently deliver high-accuracy results while understanding the context and ethical implications of AI outputs. I'm comfortable working across domains including e-commerce, healthcare, and generative AI, and I bring a strong sense of responsibility, quality control, and adaptability to every task. Whether the goal is better chatbot performance, cleaner data, or more reliable AI outputs, I ensure my contributions lead to smarter and safer models.

Expert: Urdu, Hindi, Tamil, English, Marathi, Spanish

Labeling Experience

Tamil Video & Audio Transcription Quality Reviewer for AI Dataset

Scale AI | Video | Classification | Text Generation

Reviewed and annotated Tamil video and audio content for transcription accuracy and quality control. The task involved analyzing AI-generated transcripts for Tamil speech, flagging errors with precise timestamps, and categorizing issues as major, minor, or no error. I also provided written feedback in Tamil and English to help improve speech-to-text models used in virtual assistants and subtitling engines. My role included spotting nuances such as regional accents, slang, contextual errors, and missing words, all while ensuring grammar and meaning were preserved. This helped train high-quality voice recognition and video understanding systems for Tamil-speaking users. I consistently received high ratings for quality and attention to detail.

2023 - 2024

Freelance Linguistic QA Annotator – Marathi

CrowdFlower | Audio | Entity NER Classification | Classification

Reviewed and corrected transcripts of Marathi audio recordings used for training speech recognition models. Tasks included listening to native Marathi speech clips, evaluating the accuracy of transcriptions, and marking errors as major, minor, or no error using detailed criteria. I added timestamps for each error, classified the nature of the issue (e.g., misheard words, grammar, punctuation), and ensured formatting matched the style guide. This helped improve AI models for Marathi voice recognition and customer support automation. Worked on transcription and QA tasks in Marathi, Hindi, and Tamil for platforms like Appen and UHRS. Evaluated audio quality, provided timestamped error reports, and followed strict formatting rules. Helped enhance AI models for regional voice assistants and speech-to-text systems in multiple Indian languages.

2022 - 2024

Hindi Audio Transcription & Cleanup for Speech Recognition Training

Appen | Audio | Classification | Text Generation

Worked on transcription of short Hindi audio clips as part of a dataset used to train speech recognition models. The task involved listening to native Hindi speech (across various dialects) and converting it into clean, standardized text. I removed disfluencies (like fillers and stutters), corrected basic grammatical errors while preserving speaker intent, and applied formatting guidelines for clarity and consistency. Also tagged audio with timestamps and evaluated the quality of AI-generated transcriptions for error classification and feedback. This helped improve the model’s accuracy in real-world use cases like voice assistants and call center automation.

2022 - 2024

E-commerce Product Data Classification and Annotation for NLP Model Training

CVAT | Text | Polygon | Polyline

In this project, I was responsible for classifying thousands of product listings into predefined categories and subcategories based on titles, descriptions, and user reviews. The work involved cleaning noisy text data, tagging named entities like brand, size, and color, and summarizing product descriptions to help train a recommendation engine and search relevance model. I also evaluated and rated AI-generated responses (e.g., suggested tags and descriptions) to improve prompt accuracy and model understanding. The work required high attention to detail, strict adherence to quality guidelines, and strong contextual understanding of e-commerce language. I consistently maintained top-tier accuracy scores and helped improve the model's ability to understand buyer intent and product features.

2022 - 2024

Education

Yuva Shakthi Education Society And Millennium Software Solutions

Certificate, Hardware & Networking

2014 - 2018

Work History

iMerit

Pod Lead in Data Annotation

Bangalore

2022 - Present