Multimodal LLM Technical Training & RLHF
I evaluate and refine Large Language Model (LLM) outputs for software engineering tasks. This involves reviewing model-generated code snippets in Python, JavaScript, and SQL for logical correctness, security, and adherence to best practices. As part of Reinforcement Learning from Human Feedback (RLHF) workflows, I rank model responses to produce preference data and write high-quality "Golden Answers" for supervised fine-tuning.