AI Training & LLM Evaluation Specialist
Evaluated large language model outputs weekly for accuracy, coherence, safety, and instruction adherence. Provided structured feedback to support reinforcement learning cycles and model fine-tuning, and identified edge cases and failure modes across a diverse set of prompt scenarios.
• Assessed 500+ generative AI text outputs each week
• Maintained over 98% accuracy against productivity benchmarks
• Applied comprehensive evaluation frameworks and domain guidelines
• Contributed to reinforcement learning from human feedback (RLHF) workflows