AI Safety & Red Teaming - LLM Evaluation
Conducted red teaming and adversarial prompt testing on large language models to identify safety weaknesses and failure cases. Designed challenging prompts to evaluate instruction following, reasoning reliability, and policy compliance. Tested model responses for hallucinations, unsafe outputs, logical inconsistencies, and prompt injection vulnerabilities. Documented model weaknesses and contributed feedback to improve model robustness and alignment. Designed adversarial prompts targeting reasoning, safety, and instruction-following behavior Evaluated model outputs using structured rubric-based criteria Performed iterative prompt refinement to surface edge-case failures