RLHF & Model Safety (Best for Generalist/Writing Roles)
Provided human-feedback data for Reinforcement Learning from Human Feedback (RLHF) to align LLMs with human intent. Ranked model responses on truthfulness, helpfulness, and safety; flagged subtle hallucinations; and ensured compliance with strict ethical guidelines to prevent biased outputs.
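For context on how this ranking work typically feeds an RLHF pipeline, below is a minimal sketch of turning one annotator's ranked responses into the (chosen, rejected) preference pairs a reward model is usually trained on. The field names and the `to_preference_pairs` helper are hypothetical illustrations, not a specific labeling platform's format.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class RankedResponse:
    text: str
    rank: int  # 1 = best; annotator's ordering on truthfulness, helpfulness, safety

def to_preference_pairs(prompt: str, responses: list[RankedResponse]) -> list[dict]:
    """Convert one ranked set of responses into (chosen, rejected) pairs,
    the format commonly consumed when training an RLHF reward model."""
    pairs = []
    for better, worse in combinations(sorted(responses, key=lambda r: r.rank), 2):
        # "better" outranks "worse", so it becomes the chosen response in the pair
        pairs.append({"prompt": prompt, "chosen": better.text, "rejected": worse.text})
    return pairs

# Example: three model responses ranked by an annotator for one prompt
ranked = [
    RankedResponse("Accurate, well-sourced answer.", rank=1),
    RankedResponse("Helpful but contains a subtle hallucination.", rank=2),
    RankedResponse("Unsafe or policy-violating answer.", rank=3),
]
print(to_preference_pairs("Explain the topic.", ranked))
```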