AI Trainer Fellow – Model Validation Expert-2
In this role, I evaluated adversarial prompts targeting LLM cybersecurity vulnerabilities and unsafe outputs. I identified critical model failure modes, including jailbreaks and prompt injections, designed red teaming frameworks, and developed evaluation rubrics for model quality and alignment.
• Implemented red team evaluations to systematically test model robustness.
• Detected and documented 15+ critical failure modes across agent systems.
• Created structured rubrics for refusal quality and safety evaluations.
• Contributed to continuous improvement in benchmarking and model validation practices.