Remote AI Trainer / LLM Evaluator
As a Remote AI Trainer and LLM Evaluator, I evaluated over 15,000 large language model (LLM) responses across reasoning, coding, and factual-verification tasks. I delivered RLHF feedback that contributed to notable reductions in model hallucination rates, and regularly designed adversarial and edge-case prompts to stress-test model capabilities.
• Conducted output ranking and benchmarking to improve evaluation consistency
• Identified recurring reasoning failures to improve model alignment
• Strengthened training signals through systematic, rubric-driven evaluation
• Emphasized model behavior analysis and reliable output generation