LLM Alignment & Response Quality Annotation – Alignerr
Contributed to alignment and evaluation tasks aimed at improving large language model safety, reasoning, and instruction following. Evaluated model-generated responses for accuracy, coherence, and policy compliance; performed pairwise ranking of responses; corrected flawed outputs; and authored improved prompt-response pairs used for supervised fine-tuning. Participated in adversarial testing and red-teaming to surface model weaknesses and safety issues.

Additional Information
Worked on complex prompts requiring analytical reasoning, structured explanations, and domain-specific knowledge in technical and analytical subjects.