RLHF and SFT for Large Language Models
Participate in Reinforcement Learning from Human Feedback (RLHF) by ranking model-generated responses for honesty, harmlessness, and helpfulness; also draft "Golden Responses" that serve as training targets for Supervised Fine-Tuning (SFT).