Senior AI Training Specialist: RLHF & Instruction Tuning
Orchestrated the end-to-end data-labeling strategy for a suite of enterprise-grade Large Language Models (LLMs) at Dell Technologies, managing a pipeline of 50,000+ instruction-response pairs focused on IT infrastructure troubleshooting and hardware configuration.

Scope: Led the curation and annotation of high-fidelity datasets designed for Reinforcement Learning from Human Feedback (RLHF). This included designing detailed rubrics for labelers to evaluate factual accuracy, harmlessness, and stylistic nuance against proprietary technical documentation.

Tasks: Specialized in ranking-style annotation, in which annotators compare multiple model outputs and order them by preference. Developed Python scripts (using PyTorch) to perform automated quality checks (consensus analysis) on labeled data, identifying annotator drift and correcting labeling inconsistencies in near real time.

Quality Measures: Implemented a "golden set" validation protocol: by injecting known-correct answers into the labeling queue, we maintained an inter-annotator agreement (IAA) score above 92%. This project directly resulted in a 22% reduction in hallucination rates for internal customer-support AI agents.
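The golden-set and consensus checks described above can be sketched in plain Python. This is a minimal illustration, not the production pipeline: the annotator names, item IDs, golden labels, and the 0.92 drift threshold are all hypothetical stand-ins, and the agreement metric shown is simple pairwise percent agreement rather than any specific IAA statistic.

```python
# Hypothetical sketch of a golden-set QC pass over ranking labels.
# All names, labels, and thresholds below are illustrative.

GOLDEN = {"item-1": "A", "item-2": "B", "item-3": "A"}  # known-correct answers injected into the queue


def golden_accuracy(labels):
    """labels: {annotator: {item_id: label}} -> per-annotator accuracy on golden items."""
    scores = {}
    for annotator, answers in labels.items():
        golden_items = [i for i in answers if i in GOLDEN]
        if not golden_items:
            continue  # annotator never saw a golden item; nothing to score
        correct = sum(answers[i] == GOLDEN[i] for i in golden_items)
        scores[annotator] = correct / len(golden_items)
    return scores


def flag_drift(scores, threshold=0.92):
    """Flag annotators whose golden-set accuracy falls below the target."""
    return sorted(a for a, s in scores.items() if s < threshold)


def pairwise_agreement(labels):
    """Mean fraction of shared items on which each annotator pair agrees."""
    annotators = list(labels)
    rates = []
    for i, a in enumerate(annotators):
        for b in annotators[i + 1:]:
            shared = set(labels[a]) & set(labels[b])
            if shared:
                rates.append(
                    sum(labels[a][x] == labels[b][x] for x in shared) / len(shared)
                )
    return sum(rates) / len(rates) if rates else None


labels = {
    "ann-1": {"item-1": "A", "item-2": "B", "item-3": "A"},  # matches the golden set
    "ann-2": {"item-1": "A", "item-2": "A", "item-3": "B"},  # drifting annotator
}
scores = golden_accuracy(labels)
print(scores)                      # ann-1 scores 1.0, ann-2 scores 1/3
print(flag_drift(scores))          # only ann-2 falls below the threshold
print(pairwise_agreement(labels))  # the pair agrees on 1 of 3 shared items
```

In practice a check like this would run continuously over the labeling queue, so an annotator drifting below the agreement target can be retrained or their recent batches re-queued before inconsistencies accumulate.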