AI Trainer - Model Optimization & RLHF
As an AI Trainer at Outlier AI, I conducted advanced RLHF tasks to refine the accuracy, reasoning, and safety of frontier LLMs. I developed challenging prompts across STEM, Humanities, and Logic domains to evaluate and improve model performance. My work involved rank-ordering LLM outputs using strict rubrics and composing detailed rationales for model decisions.
• Executed high-fidelity evaluation of model outputs, emphasizing truthfulness and helpfulness.
• Fact-checked model-generated claims, addressing hallucinations and inconsistencies.
• Authored technical documentation supporting rationale-based learning.
• Maintained consistent quality while adapting to evolving project guidelines.