Rhiddha Acharjee

LLM Failure Analyst

Barrackpore, India
$8.00/hr, Intermediate, Other, Anno-Mage, iMerit

Key Skills

Software

Other
Anno-Mage
iMerit
Internal/Proprietary Tooling

Top Subject Matter

Computational Physics
LLM Evaluation
Condensed Matter Physics

Top Data Types

Text
Image
Document

Top Task Types

Classification
Bounding Box
Object Detection
Text Generation
Evaluation/Rating

Freelancer Overview

LLM Failure Analyst with 2+ years of professional experience in research and quality-focused evaluation work. Has worked with DataForce by TransPerfect, Innodata Inc., and iMerit. Education includes a Doctor of Philosophy, University of Calcutta (2025), and a Master of Science, West Bengal State University (2024). AI-training focus covers text data and labeling workflows including evaluation, rating, and classification.

Intermediate, English, Bengali, Hindi

Labeling Experience

LLM Failure Analyst

Text
I evaluated outputs of Qwen-class language models using adversarially designed doctoral-level physics problems. I designed benchmarks to assess language model reasoning in complex scientific contexts. I developed and improved evaluation pipelines focused on model robustness and accuracy.
• Crafted and administered evaluation sets for LLMs.
• Identified failure points in LLM physics reasoning.
• Implemented standard evaluation rubrics for consistency.
• Benchmarked LLM outputs to identify areas for improvement.

2026 - Present

Rubrics-based AI Benchmarking and A/B Testing (Outlier AI - Project Ather)

Other, Text
I conducted rubrics-based AI benchmarking and A/B testing to assess model responses. My tasks included evaluating model outputs against criteria for scientific accuracy and rationale. I systematically documented findings for feedback to model development teams.
• Ran controlled A/B evaluations of model behavior.
• Assessed consistency in scientific reasoning outputs.
• Maintained detailed benchmarking reports.
• Provided actionable insights for model improvement.

2025 - 2026

Subject Matter Expert

Text
I crafted over 250 adversarial condensed matter physics prompts to challenge AI model correctness. I developed structured evaluation rubrics for systematic assessment of model reasoning. I identified and documented systematic weaknesses in model physics understanding.
• Created diverse prompt sets targeting model limitations.
• Contributed to research in model benchmarking.
• Maintained high annotation quality through rubrics.
• Provided detailed feedback on condensed matter prompts.

2025 - 2026
iMerit

Scientific Data Annotator and Data Labeller

iMerit, Text, Classification
I annotated and labeled scientific content and AI-generated images for AI training purposes. My work involved applying structured criteria to ensure accuracy in complex scientific contexts. I collaborated with a team to maintain high annotation standards and consistency.
• Labeled diverse scientific data types and images.
• Ensured high precision in annotation tasks.
• Used team-based workflows for efficient turnaround.
• Performed quality checks on labeled datasets.

2024 - 2025

Education

West Bengal State University

Master of Science, Theoretical Condensed Matter Physics

2022 - 2024

Barasat Government College

Bachelor of Science, Physics

2019 - 2022

Work History

Innodata

Subject Matter Expert

N/A
2025 - 2026