French RLHF Generalist | AI Response Evaluation | LLM Alignment | Prompt & Preference Ranking
I conducted LLM red teaming in both French and English to uncover unsafe behaviors, vulnerabilities, and hallucination risks in language models. My work involved designing adversarial prompts and evaluating model responses for safety compliance and multilingual policy consistency. I generated structured risk annotations, severity labels, and remediation-focused rewrites to improve AI alignment.

• Identified and flagged jailbreaks, high-risk failure modes, and harmful content leakage.
• Contributed to RLHF pipelines through preference ranking and comparative response evaluations.
• Produced detailed failure taxonomy reports and risk severity assessments.
• Supported policy and operational improvements by escalating key findings.