Daffa Ashika

LLM Response Quality Auditor (Self-Initiated Project)

Aceh, Indonesia
$17.00/hr
Intermediate

Key Skills

Software

No software listed

Top Subject Matter

LLM Evaluation
E-commerce/NLP Training Data
Bias Detection

Top Data Types

Text

Top Task Types

Classification

Freelancer Overview

LLM Response Quality Auditor (Self-Initiated Project). Brings 4+ years of professional experience across complex workflows, research, and quality-focused execution. Core strengths include Excel. Education includes Self-Directed (Online Platforms), Independent Study — Artificial Intelligence & Data (2023), and Self-Certified, Project-Based, Advanced Excel & Data Analysis (2023). AI-training focus includes data types such as Text and labeling workflows including Evaluation, Rating, and Classification.

English (Intermediate)

Labeling Experience

Prompt-to-Output Bias Detection Study (Self-Initiated Project)

Text
Conducted prompt-to-output bias detection experiments across three large language model (LLM) tools. Systematically annotated and categorized outputs, identifying biases, hallucinations, and policy-sensitive cases in generated responses. Demonstrated robust quality-assurance discipline aligned with real-world RLHF evaluation workflows.
• Tested 60+ prompt variations and categorized 180 LLM outputs
• Flagged 23 factual errors and 11 edge cases for bias
• Structured annotations documented in Excel
• Enhanced detection of demographic and policy-sensitive issues

2025 - Present

E-commerce Dataset Annotation Simulation (Self-Initiated Project)

Text
Classification
Performed manual labeling on over 500 e-commerce product descriptions using a multi-label taxonomy for category, sentiment, and intent. Implemented rigorous inter-rater consistency checks and iterative self-review cycles to maintain high annotation accuracy. Automated batch tracking and reporting processes in Excel to enhance project efficiency.
• Labeled 500+ product descriptions for NLP dataset simulation
• Applied multi-label taxonomy (category, sentiment, intent)
• Achieved and maintained >96% annotation accuracy
• Reduced audit and review cycle by approximately 40% with automation

2024

LLM Response Quality Auditor (Self-Initiated Project)

Text
Designed and executed a framework to evaluate responses from large language models (LLMs), focusing on factual accuracy, coherence, and instruction alignment. Used structured Excel tracking with personalized rubrics to review and rate over 200 AI-generated outputs across multiple prompt categories. Documented comparative findings and identified consistent patterns in output quality and common failure modes.
• Evaluated outputs from ChatGPT and Claude on 200+ prompts
• Created and used custom evaluation rubrics in Excel
• Assessed labeling accuracy and model failure across 8 task types
• Generated a structured comparative quality report

2024

Education

Advanced Excel & Data Analysis

Self-Certified, Project-Based

2023

Independent Study — Artificial Intelligence & Data

Self-Directed (Online Platforms)

2023

Work History


Self-Directed (Online Platforms)

Independent Study — Artificial Intelligence & Data

Location not specified
2023 - Present