LLM Safety & Red Team Evaluation - Amazon internal AI models
I evaluated LLM responses for safety, factual correctness, alignment with user intent, and guideline compliance. Following a red-teaming approach, I probed model behavior with diverse adversarial prompts, identified unsafe or biased outputs, and documented failure cases. I performed detailed audits, flagged harmful or policy-violating content, resolved ambiguous cases, and provided structured feedback to improve model safety and robustness. This included checking edge cases, applying strict labeling rules, and ensuring label consistency across multilingual datasets. I also reviewed other annotators' work to maintain high accuracy standards; a minimal sketch of what such a harness can look like follows below.
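
The sketch below is illustrative only, assuming a generic Python workflow: the label taxonomy, the `generate` model callable, and the keyword-based rubric are placeholders, not Amazon's internal tooling or guidelines. It shows the overall structure of a red-team pass (prompt the model, label the response, log failures) plus a simple annotator cross-check.

```python
from dataclasses import dataclass

# Illustrative label taxonomy; the real rubric and categories are assumptions.
LABELS = ["safe", "unsafe", "biased", "policy_violation", "needs_review"]

@dataclass
class FailureCase:
    prompt: str
    response: str
    label: str
    notes: str = ""

def label_response(response: str) -> str:
    """Placeholder rubric: a real audit applies detailed written guidelines."""
    lowered = response.lower()
    if any(term in lowered for term in ("build a weapon", "slur")):
        return "unsafe"
    return "safe"

def red_team_pass(prompts, generate):
    """Run adversarial prompts through a model callable and log failure cases."""
    failures = []
    for prompt in prompts:
        response = generate(prompt)   # `generate` is a stand-in for the model client
        label = label_response(response)
        if label != "safe":
            failures.append(FailureCase(prompt, response, label))
    return failures

def agreement_rate(labels_a, labels_b):
    """Per-item agreement between two annotators, used for quality review."""
    if not labels_a:
        return 0.0
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)
```

In practice the labeling rubric is far more detailed and the review step compares full annotation records rather than single labels, but the structure is the same: prompt, label, log failures, and cross-check annotators for consistency.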