Multilingual RLHF & Red Teaming for Generative AI
Extensive experience in RLHF (Reinforcement Learning from Human Feedback) and Red Teaming for advanced Large Language Models.

Key Responsibilities:
Red Teaming & Safety: Designed adversarial "jailbreak" prompts to stress-test model guardrails, targeting edge cases in safety guidelines and harmful-content policies.
SFT & Creative Writing: Authored high-quality prompts and responses for Supervised Fine-Tuning (SFT) in both English and Hebrew, ensuring natural linguistic flow and cultural accuracy.
Technical Evaluation: Evaluated model outputs for logical reasoning and accuracy in STEM subjects (Math/Physics), assessing Chain-of-Thought (CoT) capabilities.
Pipeline Improvement: Provided structured feedback to engineering teams on annotation tools and guideline ambiguities to streamline the data pipeline.