Multimodal Red-Teaming & Adversarial Auditing
Specialized in the adversarial stress-testing of Vision-Language Models (VLM) to identify systemic vulnerabilities in multimodal perception and alignment. Focused on bypassing safety guardrails and identifying logical breakpoints where visual and textual inputs conflict. - Multimodal Jailbreak Identification: Developed and executed complex adversarial prompts designed to bypass text-based safety filters through visual injection. This included testing the model’s susceptibility to "hidden" instructions within images (OCR-based exploits) and contradictory visual-textual pairings that could trigger prohibited outputs. - Adversarial Perception Testing: Systematically identified "optical hallucinations" by presenting the model with high-complexity visual scenes, ambiguous spatial relationships, and adversarial perturbations. I focused on forcing failures in object-relationship grounding—specifically where the model would "see" non-existent safety hazards or overlook critical visual context. - Behavioral Bias & Safety Auditing: Leveraged a Behavioral Science framework to red-team for subtle demographic and cultural biases triggered by visual cues. I audited for harmful stereotyping in image-to-text generation, ensuring that model-generated descriptions remained objective and neutralized "reflexive" biases inherent in the training data. - Vulnerability Mapping: Provided detailed forensic reports on "model drift" when processing edge-case visual data, such as low-light technical schematics or distorted medical imagery, where a failure in perception could lead to high-stakes logical errors in downstream RAG (Retrieval-Augmented Generation) tasks.