AI Training & Evaluation Specialist
As an AI Training & Evaluation Specialist at Appen, I focused on evaluating and improving large language models through structured review and adversarial testing. My work involved prompt engineering, data annotation, and designing gold-standard benchmarks to enhance model performance. I developed thousands of high-quality training samples and provided data-driven feedback for model fine-tuning.
• Evaluated LLM outputs for reasoning accuracy, instruction adherence, and hallucinations.
• Designed adversarial prompts and contributed to red-teaming efforts.
• Created benchmark datasets and gold-standard responses.
• Identified and documented recurring quality issues for model improvement.