RLHF & Factuality Evaluation for Large Language Models
Performed Reinforcement Learning from Human Feedback (RLHF) annotation and evaluation to align large language models with human intent. Tasks included ranking multiple AI-generated responses for helpfulness, honesty, and safety; conducting deep-dive fact-checking to verify that model outputs adhered to logical constraints and were free of hallucinations; and authoring complex multi-turn prompts to test the model's reasoning capabilities in specialized domains.
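For context on the ranking task: a full ranking of N candidate responses is commonly expanded into N*(N-1)/2 pairwise comparisons, the data format that typically feeds a reward model in RLHF. Below is a minimal sketch of that expansion, assuming rankings are stored best-first; the PreferencePair type and ranking_to_pairs helper are hypothetical names for illustration, not a specific tool used in this work.

from dataclasses import dataclass
from itertools import combinations

@dataclass
class PreferencePair:
    """One pairwise comparison derived from a human ranking.

    `chosen` was ranked above `rejected` by the annotator.
    """
    prompt: str
    chosen: str
    rejected: str

def ranking_to_pairs(prompt: str, ranked_responses: list[str]) -> list[PreferencePair]:
    """Expand a best-first ranking of N responses into N*(N-1)/2 preference pairs.

    `ranked_responses` is ordered best-first, as produced by an annotator
    ranking candidates for helpfulness, honesty, and safety.
    """
    # combinations() preserves list order, so in each pair the first
    # element is the higher-ranked (preferred) response.
    return [
        PreferencePair(prompt=prompt, chosen=better, rejected=worse)
        for better, worse in combinations(ranked_responses, 2)
    ]

# Example: three candidates ranked best-first yield three preference pairs.
pairs = ranking_to_pairs(
    prompt="Explain photosynthesis to a 10-year-old.",
    ranked_responses=["response A", "response B", "response C"],
)
for p in pairs:
    print(f"chosen={p.chosen!r}  rejected={p.rejected!r}")

Expanding full rankings into pairs rather than collecting pairs directly is a common design choice because one annotation pass over N responses yields many training comparisons at once.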