Pangolin project
AI safety work evaluating dangerous and harmful phrases in LLM outputs. Assessed model responses using a multi-dimensional evaluation framework with detailed feedback on high-risk language patterns to enhance AI safety and responsible model development.