Red Teaming - Adversarial Prompting for Multimodal AI (Confidential)
Participated in red teaming tasks focused on identifying weaknesses in multimodal AI systems. Responsibilities included crafting adversarial image-based prompts designed to elicit unsafe, biased, or inaccurate model responses. Tasks required creativity, close attention to safety policies, and a deep understanding of model behavior. Findings contributed to improvements in system robustness and alignment. All project specifics remain confidential under NDA.