LLM Red Teamer & RLHF Evaluator
I performed reinforcement learning from human feedback (RLHF) evaluation tasks to improve the reasoning capabilities and safety of language models. My contributions included red teaming and stress testing models to surface and address logical inconsistencies, which involved designing and critically evaluating complex prompts that push models to their limits.
• Designed and executed prompt-based stress tests for LLM safety.
• Identified logical and factual weaknesses in LLM outputs.
• Collaborated with teams to strengthen model robustness against adversarial inputs.
• Delivered actionable insights for iterative model improvement.