Principal AI Data Specialist - RLHF Dataset Design & LLM Alignment
As Principal AI Data Specialist at Anthropic, I designed and implemented reinforcement learning from human feedback (RLHF) reward models to enhance LLM performance. I engineered and curated high-dimensional datasets for model fine-tuning, bias mitigation, and red teaming, relying on proprietary internal tools and advanced prompt engineering techniques throughout.

• Led mathematical modeling for RLHF to improve model truthfulness and logical reasoning
• Designed statistically grounded red-teaming datasets to address bias in multi-modal AI systems
• Developed Python libraries for processing, curating, and reviewing unstructured data for model training
• Collaborated on prompt engineering workflows for semantic and logical analysis of model outputs