Rithwik Nyalam

AI Data Labeler & LLM Evaluation Specialist

Karimnagar, India
$12.00/hr · Intermediate · Label Studio · Scale AI · Other

Key Skills

Software

Label Studio
Scale AI
Other

Top Subject Matter

Large Language Models
Conversational AI
Code Generation

Top Data Types

Text
Document
Audio

Top Task Types

Classification
Data Collection
Evaluation/Rating

Freelancer Overview

AI Integration Engineer with 2+ years of hands-on experience building, evaluating, and deploying production LLM systems. Built 8 production AI systems, including RAG pipelines, MCP servers, and multi-agent architectures. Deep expertise in prompt engineering, LLM output evaluation, RLHF feedback collection, red teaming, and AI quality assurance across the code generation, reasoning, and conversational AI domains. Deployed live AI products on the Anthropic Claude and OpenAI APIs, serving real users. Skilled in evaluating AI responses for accuracy, safety, coherence, and instruction following. This builder's perspective helps me spot subtle model-failure patterns that standard annotators miss.

Intermediate · English · Hindi · Telugu

Labeling Experience

RAG System Data Quality and Retrieval Evaluation

Other · Document · Classification
I conducted data annotation and quality evaluation for RAG (Retrieval-Augmented Generation) systems. My responsibilities included document chunk labeling, semantic similarity annotation, and relevance scoring to optimize vector database retrieval. I curated and benchmarked training datasets for factual accuracy, context alignment, and hallucination reduction.
• Labeled document chunks for semantic similarity, topic relevance, and retrieval quality
• Evaluated answers for groundedness and improved RAG system performance benchmarks
• Used ChromaDB, Pinecone, and Weaviate for data handling and evaluation tasks
• Achieved measurable improvements in vector database retrieval and relevance

2025 - 2025
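The relevance-scoring work described above can be illustrated with a minimal sketch (plain Python for illustration only, not code from the projects themselves): ranking document chunks against a query by cosine similarity of their embedding vectors, the metric that vector databases such as ChromaDB and Pinecone commonly expose.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunk_vecs):
    """Rank document chunks by similarity to the query, highest first.

    Returns a list of (chunk_index, score) tuples.
    """
    scored = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(chunk_vecs)]
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

A human annotator's relevance labels can then be compared against this ranking to benchmark retrieval quality.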

Code and Technical Output Evaluation

Other
As part of technical output evaluation, I assessed AI-generated code for correctness, efficiency, security, and adherence to best practices. I annotated code with bug severity ratings and provided scores and improvement suggestions for model training. I also evaluated documentation and explanation quality for clarity and completeness.
• Evaluated Python, JavaScript, and Next.js code from AI systems for technical quality
• Annotated output with bug ratings, correctness scores, and improvement notes
• Reviewed technical documentation for clarity and accuracy
• Contributed to improved model coding performance through labeled training data

2024 - 2025

AI Output Quality Assurance and Safety Testing

Other · Text
I performed quality assurance and safety testing for AI-generated content, focusing on the fintech and career-guidance domains. I annotated conversations for tone, factual correctness, and alignment with user intent in production deployments, and tested multi-agent systems for consistency and context retention, documenting failure patterns to improve model safety.
• Annotated AI outputs for safety, tone, and factual accuracy in domain-specific contexts
• Conducted moderation and appropriateness assessments for live models
• Performed multi-turn conversation evaluation and model consistency checks
• Identified over-refusal, hallucination, and instruction drift in AI outputs

2024 - 2025

LLM Prompt Engineering and Output Evaluation

Other · Text
As an AI Data Labeler and LLM Evaluator, I performed rigorous evaluation and annotation of large language model outputs. I used systematic techniques such as prompt testing, response ranking, and RLHF feedback to generate high-quality training datasets, and applied red-teaming strategies to identify safety gaps and improve model robustness.
• Evaluated AI-generated text for accuracy, helpfulness, coherence, safety, and instruction following across domains
• Conducted preference-based comparative ranking and response rating for RLHF data collection
• Designed evaluation rubrics and performed multi-turn conversational AI assessments
• Identified hallucinations and reasoning errors, and annotated model outputs for instructional quality

2024 - 2025
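The preference-ranking step mentioned above can be sketched as follows (an assumed data shape for illustration, not any project's actual schema): annotator ratings over several responses to one prompt are converted into chosen/rejected pairs of the kind used in RLHF preference datasets.

```python
def to_preference_pairs(prompt, responses, ratings):
    """Convert rated responses into (chosen, rejected) preference pairs.

    responses: list of model outputs for the same prompt
    ratings:   annotator scores aligned with responses, higher = better
    """
    pairs = []
    for i in range(len(responses)):
        for j in range(len(responses)):
            if ratings[i] > ratings[j]:
                pairs.append({"prompt": prompt,
                              "chosen": responses[i],
                              "rejected": responses[j]})
    return pairs
```

Each dict is one training example; ties produce no pair, since they carry no preference signal.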

Education

KIMS College

Bachelor of Business Administration, Business Administration

2023

Work History

Colvio

AI Integration Engineer

Karimnagar
2024 - Present

Outskill

AI Systems Developer

Karimnagar
2025 - 2025