AI Red Teaming & Safety Specialist
Adversarial testing and safety alignment for Large Language Models, tailored specifically to the Italian-language market. Designing complex "jailbreak" prompts to probe and bypass safety filters, identifying biases, and testing model boundaries around prohibited content, ethical guidelines, and security. Running daily high-intensity testing sessions targeting a range of safety taxonomies, adhering to strict "Harmlessness" policies, and calibrating daily with the safety team to enforce zero tolerance for unsafe model outputs.