AI Red Teaming Specialist – Mercor
As an AI Red Teaming Specialist at Mercor, I conducted adversarial testing of large language models to identify safety, alignment, and policy-compliance vulnerabilities. I designed structured red-team attack scenarios and evaluated model responses for robustness, hallucination rates, and bias exposure, documenting reproducible failure cases with clear annotations to support model improvement and safety advancements.

• Designed and executed red-teaming and adversarial prompt testing.
• Analyzed model outputs for robustness, refusal behavior, bias, and hallucination.
• Annotated vulnerabilities and provided feedback to enhance AI safety.
• Used internal/proprietary red-teaming and evaluation frameworks.