LLM Evaluation | Appen
I evaluated large language model (LLM) outputs for accuracy, relevance, and adherence to guidelines, conducting comparative assessments and providing structured feedback used to fine-tune models and improve their reliability.
• Conducted LLM response evaluations across a range of prompt types
• Provided rationales and rankings for competing model outputs
• Collaborated with global evaluation teams
• Helped improve response accuracy by 20%