For employers

Hire this AI Trainer

Sign in or create an account to invite AI Trainers to your job.

Invite to Job
Ayush Mothiya

Ayush Mothiya

AI/ML Engineer

INDIA flag
Remote, India
$20.00/hrIntermediateOtherAws SagemakerGoogle Cloud Vertex AI

Key Skills

Software

Other
AWS SageMakerAWS SageMaker
Google Cloud Vertex AIGoogle Cloud Vertex AI
MindriftMindrift
MercorMercor
Internal/Proprietary Tooling

Top Subject Matter

AI Tutoring and Software Development
Finance
Science

Top Data Types

TextText
ImageImage
DocumentDocument

Top Task Types

RLHF
Bounding Box
Segmentation
Classification
Entity Ner Classification
Object Detection
Text Generation
Question Answering
Text Summarization
Fine Tuning
Red Teaming
Transcription
Evaluation Rating
Computer Programming Coding

Freelancer Overview

In my experience at Abundant (YC) and Sirius AI, I have moved beyond simple data annotation to the high-level curation and validation of complex datasets for agentic systems. At Abundant, I specifically focused on adversarial data engineering, where I curated novel dataset environments to measure "sycophancy for bureaucracy" and decision-making stability in SOTA models like GPT-4 and Claude 3.5. This involved not just labeling, but designing intricate "compliance traps" and indirect prompt injections to evaluate how models handle conflicting safety constraints. My work ensured that the training and evaluation data accurately represented the edge cases that lead to model failure in real-world deployments. What sets me apart is my ability to bridge the gap between data quality and model performance. During my research at UIT The Arctic University of Norway, I managed the end-to-end data pipeline for a GAN-based Super-Resolution model using the DIV2K benchmark, which required precise preprocessing and feature preparation of high-resolution image data. My background in Engineering Physics from IIT Guwahati, combined with my status as a Kaggle Bronze Medalist (RSNA 2024), has given me a rigorous, statistically-grounded approach to data validation. I am highly proficient in using Python, SQL, and PyTorch to automate data processing and ensure the reproducibility and correctness of AI training sets.

IntermediateHindiFrenchEnglish

Labeling Experience

Independent Consultant

Computer Code ProgrammingComputer Programming Coding
At Abundant (YC), the project scope centered on engineering a high-fidelity adversarial evaluation framework for State-of-the-Art (SOTA) LLM agents, specifically targeting GPT-4 and Claude 3.5. The objective was to move beyond standard benchmarks by creating complex, multi-step environments that tested the limits of agentic control and safety alignment in professional settings. Specific Data Labeling & Curation Tasks: Rather than basic annotation, my tasks involved the creation and labeling of "adversarial scenarios." This included: Compliance Trap Engineering: Designing and labeling 50+ unique prompt-based environments where a model was forced to choose between a direct user instruction and a conflicting, documented safety or bureaucratic constraint. Vulnerability Mapping: Identifying and categorizing instances of "sycophancy for bureaucracy," where models prioritized following rigid documentation over logical safety, and labeling these failure modes to train detection systems. Indirect Prompt Injection: Constructing datasets where malicious instructions were "hidden" within legitimate-looking data (e.g., a PDF or a simulated database) to evaluate the model's ability to filter unsafe context during RAG-style operations. Project Size: The project involved the curation and validation of novel dataset environments specifically for agentic control research. While the primary focus was on "quality over quantity" to ensure high-difficulty edge cases, the work supported the evaluation of multiple LLM generations across various safety-critical configurations. Quality Measures Adhered To: To ensure the robustness of the benchmark data, I adhered to several rigorous quality protocols: Zero-Inference Consistency: Every adversarial label was verified against a ground-truth logic tree to ensure that the "trap" was objectively solvable for an aligned model. Adversarial Stability: I conducted iterative "red-teaming" on my own datasets to ensure they maintained their difficulty across different model architectures and weren't bypassed by simple prompt hacks. Documentation-Alignment Check: Each scenario was cross-referenced with official model safety guidelines to ensure the "conflicting constraints" accurately reflected real-world deployment risks.

At Abundant (YC), the project scope centered on engineering a high-fidelity adversarial evaluation framework for State-of-the-Art (SOTA) LLM agents, specifically targeting GPT-4 and Claude 3.5. The objective was to move beyond standard benchmarks by creating complex, multi-step environments that tested the limits of agentic control and safety alignment in professional settings. Specific Data Labeling & Curation Tasks: Rather than basic annotation, my tasks involved the creation and labeling of "adversarial scenarios." This included: Compliance Trap Engineering: Designing and labeling 50+ unique prompt-based environments where a model was forced to choose between a direct user instruction and a conflicting, documented safety or bureaucratic constraint. Vulnerability Mapping: Identifying and categorizing instances of "sycophancy for bureaucracy," where models prioritized following rigid documentation over logical safety, and labeling these failure modes to train detection systems. Indirect Prompt Injection: Constructing datasets where malicious instructions were "hidden" within legitimate-looking data (e.g., a PDF or a simulated database) to evaluate the model's ability to filter unsafe context during RAG-style operations. Project Size: The project involved the curation and validation of novel dataset environments specifically for agentic control research. While the primary focus was on "quality over quantity" to ensure high-difficulty edge cases, the work supported the evaluation of multiple LLM generations across various safety-critical configurations. Quality Measures Adhered To: To ensure the robustness of the benchmark data, I adhered to several rigorous quality protocols: Zero-Inference Consistency: Every adversarial label was verified against a ground-truth logic tree to ensure that the "trap" was objectively solvable for an aligned model. Adversarial Stability: I conducted iterative "red-teaming" on my own datasets to ensure they maintained their difficulty across different model architectures and weren't bypassed by simple prompt hacks. Documentation-Alignment Check: Each scenario was cross-referenced with official model safety guidelines to ensure the "conflicting constraints" accurately reflected real-world deployment risks.

2025 - 2026

Data Science Intern

OtherTextRLHF
As a Data Science Intern at PhysicsWallah, I improved ChatGPT’s quantitative solving by enhancing prompt engineering and implementing bias mitigation strategies. I focused on increasing AI model performance using advanced prompt designs, including Tree-of-Thought (ToT) and Chain-of-Thought (CoT). The work also included fine-tuning models and deploying query classifiers using quantized Llama3.1 and Gemma2b to optimize operational efficiency. • Improved ChatGPT’s mathematical response quality by 15% through prompt calibration. • Applied ToT/CoT methods to instruct LLM behaviors for better quantitative accuracy. • Fine-tuned AI models and deployed enhanced query classifiers for robust decision-making. • Contributed to bias reduction and RLHF workflows in an educational technology context.

As a Data Science Intern at PhysicsWallah, I improved ChatGPT’s quantitative solving by enhancing prompt engineering and implementing bias mitigation strategies. I focused on increasing AI model performance using advanced prompt designs, including Tree-of-Thought (ToT) and Chain-of-Thought (CoT). The work also included fine-tuning models and deploying query classifiers using quantized Llama3.1 and Gemma2b to optimize operational efficiency. • Improved ChatGPT’s mathematical response quality by 15% through prompt calibration. • Applied ToT/CoT methods to instruct LLM behaviors for better quantitative accuracy. • Fine-tuned AI models and deployed enhanced query classifiers for robust decision-making. • Contributed to bias reduction and RLHF workflows in an educational technology context.

2024 - 2024

Education

I

Indian Institute of Technology Guwahati

Bachelor of Technology, Engineering Physics

Bachelor of Technology
2021 - 2025

Work History

S

SiriusAI

Analyst

Gurgaon
2025 - 2026
A

Abundant

Consultant (Freelance)

Remote
2025 - 2026