AI Subject Matter Expert & STEM Evaluator (Consultant), Confidential AI Labs
I conducted advanced red-teaming and evaluation of large language models (LLMs) on complex STEM tasks, including physics and coding problems, with a focus on improving model performance in technical domains. The work involved identifying model limitations and hallucinations, designing challenging prompts, and writing step-by-step, ground-truth solutions for RLHF training.
• Evaluated LLM outputs for accuracy and reliability
• Designed prompts to test edge cases and technical complexity
• Generated annotated data for reinforcement learning from human feedback (RLHF)
• Improved model robustness in specialized STEM subject areas