Adversarial Tester
Argon On the Argon project, my primary task was to test and evaluate a large language model's (LLM) ability to identify and respond to harmful content in images. My responsibilities included: 1. Sourcing Harmful Images: I would find images from specific websites that contained harmful material, such as sexually explicit content, violence, political extremism, or other sensitive topics. 2. Prompting the Model: I would then input these images into the model along with a text prompt. The goal was to see if the model could correctly identify the harmful nature of the image and refuse to engage with it, or provide a safe and appropriate response. 3. Scoring and Editing: I meticulously scored the model's responses based on its ability to recognize the harmful content and adhere to safety guidelines. If the model's response was inadequate, I would edit it to provide an example of the ideal, safe, and helpful response it should have given.