AI Adversarial Prompt Specialist — French Language
Designed and executed French-language adversarial prompts to probe the safety guardrails of frontier LLMs for Alice.io. Crafted structured scenarios targeting unsafe content generation, guardrail bypasses, and policy-compliance boundaries, and documented findings for the safety research team. Applied narrative and indirect elicitation techniques across content restrictions and policy edge cases to expose model vulnerabilities.
• Developed adversarial prompts designed to elicit unsafe or non-compliant outputs
• Operated under strict ethical guidelines, surfacing vulnerabilities without promoting harm
• Provided a multilingual perspective critical for cross-lingual testing and for identifying policy gaps
• Collaborated with researchers to document failure modes and reproduction steps