AI Technical Content & Code Dataset Labeling
This project involves the technical evaluation and optimization of frontier Large Language Models (LLMs) to improve their reasoning and coding capabilities. My role focuses on the "Gold Standard" training phase, where I evaluate and rank AI-generated code in Python, SQL, and JavaScript for functional correctness and logical efficiency. Key tasks include producing the human preference rankings used in Reinforcement Learning from Human Feedback (RLHF) and flagging model hallucinations and logic bottlenecks. I also author high-quality, ground-truth code solutions and refine prompt-engineering strategies so that models can handle complex, multi-step constraints and edge cases. Quality is maintained through strict adherence to rigorous style guidelines and a multi-step auditing process that ensures 100% data integrity for model fine-tuning.
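As a minimal sketch of the ranking idea described above, AI-generated candidates can be scored against a shared set of test cases and ordered by pass count. All names here (`rank_candidates`, `model_a`, `model_b`) are illustrative, not part of any real pipeline:

```python
# Hypothetical sketch: rank candidate implementations for functional
# correctness by running each against shared test cases.
from typing import Callable, List, Tuple

def rank_candidates(
    candidates: List[Tuple[str, Callable[[int], int]]],
    test_cases: List[Tuple[int, int]],
) -> List[Tuple[str, int]]:
    """Return (name, passed_count) pairs, best candidate first."""
    scores = []
    for name, fn in candidates:
        passed = 0
        for arg, expected in test_cases:
            try:
                if fn(arg) == expected:
                    passed += 1
            except Exception:
                pass  # a crash counts as a failed test case
        scores.append((name, passed))
    # Higher pass count first; stable sort preserves submission order on ties
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Example: two model outputs for "absolute value", one with an edge-case bug
cases = [(3, 3), (-3, 3), (0, 0)]
ranked = rank_candidates(
    [
        ("model_a", lambda x: x if x > 0 else -x),  # handles all cases
        ("model_b", lambda x: x if x >= 0 else x),  # fails on negatives
    ],
    cases,
)
print(ranked)  # model_a passes all 3 cases; model_b passes 2
```

In practice the edge cases (here, negative input and zero) are exactly where model-generated code tends to fail, which is why the ground-truth solutions and test suites target them explicitly.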