Safety Dance
In this project, we were required to write either benign or harmful prompts designed to make the model violate its safety guidelines. For each prompt we engineered, the model produced two responses, which we then rated on various dimensions.