RLHF Expert for STEM & Mathematical Reasoning
Project: Large Language Model (LLM) Fine-Tuning & Evaluation

Description: I contributed to a large-scale Reinforcement Learning from Human Feedback (RLHF) project aimed at improving the factual accuracy and safety of a conversational AI. My role involved evaluating model-generated responses against detailed rubrics covering truthfulness, harmlessness, and helpfulness. I specialized in Chain-of-Thought demonstrations, writing out step-by-step reasoning for mathematical and logical queries to help the model learn structured problem-solving.
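The rubric-based evaluation described above can be sketched as a weighted scoring function. This is a minimal illustrative example, not the project's actual tooling: the rubric dimensions come from the description, while the weights, function name, and 0-1 rating scale are assumptions made for the sketch.

```python
# Hypothetical sketch of rubric-based response scoring for RLHF evaluation.
# Dimensions (truthfulness, harmlessness, helpfulness) follow the description;
# the weights and 0-1 scale are illustrative assumptions.

RUBRIC_WEIGHTS = {"truthfulness": 0.4, "harmlessness": 0.4, "helpfulness": 0.2}

def score_response(ratings: dict[str, float]) -> float:
    """Combine per-dimension ratings (each on a 0-1 scale) into one score."""
    missing = RUBRIC_WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"unrated rubric dimensions: {sorted(missing)}")
    return sum(RUBRIC_WEIGHTS[dim] * ratings[dim] for dim in RUBRIC_WEIGHTS)

# Example: a truthful, safe, but only moderately helpful response.
print(round(score_response(
    {"truthfulness": 1.0, "harmlessness": 1.0, "helpfulness": 0.5}), 3))  # → 0.9
```

In practice such scores would feed a preference ranking between candidate responses rather than stand alone; the weighting here simply makes the trade-off between rubric dimensions explicit.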