Dhruv Kimothi - AI Training Data Specialist | Text Annotation, NLP & ML Projects

Key Skills

Software

Data Annotation Tech

Top Subject Matter

Cybersecurity (Phishing & Malware Detection)

Data Analytics / Business Intelligence

Artificial Intelligence & Machine Learning

Top Data Types

Text

Top Task Types

Classification

Text Generation

Fine Tuning

Computer Programming / Coding

Freelancer Overview

I have practical experience with AI training data workflows, gained through hands-on data preprocessing, annotation, and model evaluation across multiple machine learning and NLP projects. In projects such as phishing detection using DistilBERT and malware classification using N-gram analysis, I worked closely with structured and unstructured datasets: cleaning data, labeling patterns, and ensuring data consistency to improve model accuracy. My internship experience also involved processing and refining over 50,000 data points using SQL and Excel, which strengthened my ability to handle large-scale datasets with precision and attention to detail.

What sets me apart is a strong combination of technical and analytical skills, including Python (Pandas, NumPy, Scikit-learn), NLP model fine-tuning, and performance evaluation using metrics and error analysis. I understand how high-quality labeled data directly affects model performance, and I am comfortable working with guidelines to ensure consistency and accuracy. Additionally, my experience building real-world AI systems (such as an AI advisory system using RAG and an explainable malware detection API) helps me understand both the data and model sides of AI training, which makes me effective in data labeling and validation tasks.

Intermediate English

Labeling Experience

Malware Classification & Opcode Data Annotation

Text Classification

Worked on annotating and structuring opcode-based datasets for malware classification. Assigned labels to different malware families and validated dataset consistency to ensure accurate model training. Performed preprocessing on raw opcode sequences and removed noisy or redundant patterns. Applied N-gram feature extraction techniques and trained machine learning models for classification. Used Explainable AI (SHAP) to analyze predictions and identify potential labeling errors or inconsistencies. This project strengthened my ability to handle complex structured datasets and ensure high-quality annotations for multi-class problems.

2026 - 2026

AI Data Annotation & NLP Model Training (Phishing Detection)

Text Classification

Worked on building and improving a phishing detection system using labeled text data (phishing vs. legitimate messages). Performed data cleaning, preprocessing, and annotation validation to ensure high-quality training data. Focused on maintaining label consistency, handling class imbalance, and identifying noisy or ambiguous samples. Fine-tuned a DistilBERT model on the annotated dataset and conducted error analysis to refine labeling accuracy. Evaluated model performance using classification metrics and improved prediction reliability by iteratively enhancing data quality. Achieved ~98% model accuracy, demonstrating the impact of precise and well-structured data annotation.

2025 - 2025

Education

Delhi Public School, Jodhpur

High School Diploma, Science

2021 - 2022

Delhi Public School, Jodhpur

Secondary School Certificate, General Studies

2019 - 2020

Work History

Finlatics

Business Analytics Intern

Mumbai

2024 - Present

Defence Research and Development Organisation

Summer Intern

Hyderabad

2024 - 2024