Arjun Mishra - AI Red Team Engineer | AI RLHF Specialist | Hindi Bilingual

Key Skills

Software

Appen

Data Annotation Tech

Labelbox

Mercor

Mindrift

Internal/Proprietary Tooling

Top Data Types

Audio

Text

Video

Top Task Types

Entity (NER) Classification

RLHF

Evaluation/Rating

Red Teaming

Prompt + Response Writing (SFT)

Question Answering

Diagnosis

Text Generation

Translation/Localization

Object Detection

Freelancer Overview

I am an AI specialist with hands-on experience in data annotation, red teaming, and reinforcement learning from human feedback (RLHF) for leading AI platforms. My work focuses on generating, evaluating, and refining high-quality training data for large language models, with a strong emphasis on adversarial testing, prompt injection, and safety guardrail analysis. I am skilled in multi-layered rubric creation, logic-based evaluation, and policy alignment, and have developed methodologies for low-resource and code-switched (Hindi/Hinglish) adversarial attacks to expose model vulnerabilities. Proficient in Python, JSON, and industry-standard frameworks like Garak and PyRIT, I ensure data quality and model reliability across diverse NLP tasks. My background also includes user intent analysis, linguistic quality assurance, and structured documentation, making me adept at delivering precise, nuanced, and secure data labeling for complex AI systems.

Intermediate | Hindi | English

Labeling Experience

Hindi Culture Expert & Bilingual Safety Expert

Internal/Proprietary Tooling | Text | RLHF | Fine Tuning

This is a side-by-side (SBS) project in which I evaluate two responses on instruction following, clarity, tone, and linguistic issues. The project targets Indian culture, and responses are judged overall by their usefulness in an Indian context.

2025

Hindi Writer

Mercor | Text | Prompt + Response Writing (SFT)

Writing Hindi prompts and their golden responses.

2025

Native Hindi Localization & Creative SFT

Data Annotation Tech | Text | Text Generation | Translation/Localization

Scope: Led creative writing and cultural localization projects to improve Generative AI fluency for the Indian market (en_IN and hi_IN). Focused on training models to handle cultural nuance, idioms, and code-switching (Hinglish).

Specific Tasks: Authored high-constraint, culturally localized prompts and narratives in Hindi to stress-test model capabilities.

Quality Measures: Ensured zero "Anglicized Hindi" hallucinations and maintained strict adherence to cultural sensitivity guidelines.

2025

Medical Domain Safety & Accuracy Evaluation

Data Annotation Tech | Text | Question Answering | Diagnosis

Scope: Specialized RLHF and safety evaluation for health-related queries, focusing on minimizing dangerous medical hallucinations and ensuring strict adherence to "Non-Prescriptive" safety guidelines.

Specific Tasks: Evaluated model responses to distinguish between "General Health Information" (Safe) and "Specific Medical Advice" (Unsafe/Restricted). Verified factual claims against established medical consensus to prevent misinformation. Enforced strict "Refusal" protocols for high-risk queries (e.g., self-diagnosis or emergency scenarios).

Quality Measures: 100% adherence to Safety Guidelines regarding harm reduction and mandatory medical disclaimers.

2025

Adversarial Red Teaming & RLHF Evaluation

Data Annotation Tech | Text | Entity (NER) Classification | RLHF

Scope: High-volume adversarial testing (Red Teaming) for a major Generative AI model to improve safety guardrails against hallucinations, bias, and jailbreak attempts.

Specific Tasks: Designed complex, multi-turn adversarial prompts to bypass safety filters. Authored detailed "Ground Truth" rationales (200+ words) justifying rankings based on Truthfulness, Helpfulness, and Harmlessness (HHH). Annotated reasoning traces to fix logical fallacies in math/coding outputs.

Quality Measures: Strictly adhered to complex safety guidelines regarding PII and hate speech.

2025

Education

The Chintels School

Senior Secondary School Certificate (High School), General Studies

2019 - 2019

Work History

Insignia Consultancy Solutions

User Experience & Documentation Specialist

Remote

2024 - 2024

Self-Directed Sabbatical

Independent Researcher: Logic & Cognitive Systems

Remote

2023 - 2024