Multi-Modal LLM Training & Code Optimization (RLHF)
Acted as a Subject Matter Expert (SME) to train and refine Large Language Models (LLMs) for complex programming tasks. Wrote high-quality Python and JavaScript code snippets to serve as 'Golden Answers' for Supervised Fine-Tuning (SFT). Performed rigorous RLHF (Reinforcement Learning from Human Feedback) evaluations, ranking model-generated responses on logical correctness, security, and algorithmic efficiency (Big O complexity). Specialized in identifying 'hallucinations' in generated code and supplying corrected logic for complex algorithms, ensuring the model adhered to modern software engineering best practices.