Senior Machine Learning Engineer — RLHF Data Collection & AI Training
Designed and implemented RLHF pipelines for fine-tuning production-grade large language models in an enterprise context. Led end-to-end human preference data collection, reward model training, and PPO-based policy optimization cycles. Focused on AI agent behavior improvement and reduction of factual errors in domain-specific finance applications.
• Developed custom reward models and preference datasets for RLHF experiments
• Collected and curated human-labeled preference pairs for optimization
• Trained reward models to reflect enterprise requirements and user expectations
• Contributed to measurable improvements in LLM factuality and reduction in hallucination rates