Arsenic Project Training - Safety risks
This training involved understanding the types of prompts and classifying them by their level of toxicity (safe, benign, harmful, or jailbreak). I also had to identify the safety risk category (high-risk content, sensitive topics, modeling and training risks, illegal behavior, etc.). Then, I had to assess the model's response. For example, the model could engage with the prompt, decline briefly, decline with a reason, or give a disclaimer and then engage. I was also given guidelines describing the ideal response and the characteristics of a harm-free response.