AI Evaluator / Red Teaming Specialist (Contract)
This contract role involved evaluating LLM outputs from adversarial stress tests and multi-stage jailbreak attempts. I identified critical vulnerabilities in model alignment and policy-refusal logic, conducted comparative evaluations of model responses to generate preference data for RLHF pipelines, and documented and classified complex model failure modes, providing actionable insight for AI safety enforcement.
• Analyzed and rated side-by-side model outputs for safety and alignment.
• Generated preference data to directly inform RLHF (Reinforcement Learning from Human Feedback) training.
• Classified failure modes such as semantic pivots and illicit technical validation cases.
• Produced forensic insights bridging user intent with safety protocols.