Arjun Mishra

AI Red Team Engineer | AI RLHF Specialist | Hindi Bilingual

Kanpur, India
$10.00/hr | Intermediate | Appen | Data Annotation Tech | Labelbox

Key Skills

Software

Appen
Data Annotation Tech
Labelbox
Mercor
Mindrift
Internal/Proprietary Tooling

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Text
Video

Top Task Types

Entity (NER) Classification
RLHF
Evaluation/Rating
Red Teaming
Prompt + Response Writing (SFT)
Question Answering
Diagnosis
Text Generation
Translation/Localization
Object Detection

Freelancer Overview

I am an AI specialist with hands-on experience in data annotation, red teaming, and reinforcement learning from human feedback (RLHF) for leading AI platforms. My work focuses on generating, evaluating, and refining high-quality training data for large language models, with a strong emphasis on adversarial testing, prompt injection, and safety guardrail analysis. I am skilled in multi-layered rubric creation, logic-based evaluation, and policy alignment, and have developed methodologies for low-resource and code-switched (Hindi/Hinglish) adversarial attacks to expose model vulnerabilities. Proficient in Python, JSON, and industry-standard frameworks like Garak and PyRIT, I ensure data quality and model reliability across diverse NLP tasks. My background also includes user intent analysis, linguistic quality assurance, and structured documentation, making me adept at delivering precise, nuanced, and secure data labeling for complex AI systems.

Intermediate | Hindi | English

Labeling Experience

Hindi Culture Expert & Bilingual Safety Expert

Internal/Proprietary Tooling | Text | RLHF | Fine Tuning
This is a side-by-side (SBS) project in which I evaluate two responses on instruction following, clarity, tone, and linguistic issues. The project targets Indian culture, and responses are judged overall by their usefulness in an Indian context.

2025

Mercor

Hindi Writer

Mercor | Text | Prompt + Response Writing (SFT)
Wrote Hindi prompts and the corresponding golden responses.

2025

Data Annotation Tech

Native Hindi Localization & Creative SFT

Data Annotation Tech | Text | Text Generation | Translation/Localization
Scope: Led creative writing and cultural localization projects to improve Generative AI fluency for the Indian market (en_IN and hi_IN). Focused on training models to handle cultural nuance, idioms, and code-switching (Hinglish).
Specific Tasks: Authored high-constraint, culturally localized prompts and narratives in Hindi to stress-test model capabilities.
Quality Measures: Ensured zero "Anglicized Hindi" hallucinations and maintained strict adherence to cultural sensitivity guidelines.

2025

Data Annotation Tech

Medical Domain Safety & Accuracy Evaluation

Data Annotation Tech | Text | Question Answering | Diagnosis
Scope: Specialized RLHF and safety evaluation for health-related queries, focusing on minimizing dangerous medical hallucinations and ensuring strict adherence to "Non-Prescriptive" safety guidelines.
Specific Tasks: Evaluated model responses to distinguish between "General Health Information" (Safe) and "Specific Medical Advice" (Unsafe/Restricted). Verified factual claims against established medical consensus to prevent misinformation. Enforced strict "Refusal" protocols for high-risk queries (e.g., self-diagnosis or emergency scenarios).
Quality Measures: 100% adherence to Safety Guidelines regarding harm reduction and mandatory medical disclaimers.

2025

Data Annotation Tech

Adversarial Red Teaming & RLHF Evaluation

Data Annotation Tech | Text | Entity (NER) Classification | RLHF
Scope: High-volume adversarial testing (Red Teaming) for a major Generative AI model to improve safety guardrails against hallucinations, bias, and jailbreak attempts.
Specific Tasks: Designed complex, multi-turn adversarial prompts to bypass safety filters. Authored detailed "Ground Truth" rationales (200+ words) justifying rankings based on Truthfulness, Helpfulness, and Harmlessness (HHH). Annotated reasoning traces to fix logical fallacies in math/coding outputs.
Quality Measures: Strictly adhered to complex safety guidelines regarding PII and hate speech.

2025

Education

The Chintels School

Senior Secondary School Certificate (High School), General Studies
2019 - 2019

Work History

Insignia Consultancy Solutions

User Experience & Documentation Specialist

Remote
2024 - 2024

Self-Directed Sabbatical

Independent Researcher: Logic & Cognitive Systems

Remote
2023 - 2024