Sparrowsignet
This project required participants to write a series of prompts and thorough rubrics targeting instruction hierarchy and system steerability. Each prompt had to match a specified difficulty score (1 for basic everyday asks, 3 for specialized or niche information) and a specified level of instruction conflict (no conflict, full conflict, or conflict with exception). The final prompt had to cause the model to fail in a specific area, such as instruction following, action, or emotion. Participants also had to write a rubric listing every ask in the final prompt, both implicit and explicit, and then score the final response against that rubric, confirming that it scored under 70%.
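The rubric-scoring step can be illustrated with a minimal Python sketch. The project description does not say how rubric items were weighted, so this assumes every item counts equally and is marked simply as met or not met. The names `RubricItem`, `rubric_score`, and `is_model_failure`, along with the example asks, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    """One ask from the final prompt (hypothetical structure)."""
    ask: str        # what the prompt required
    explicit: bool  # True if stated outright, False if implied
    met: bool       # whether the model response satisfied it


def rubric_score(items):
    """Return the percentage of rubric items met, assuming equal weights."""
    if not items:
        raise ValueError("rubric must contain at least one item")
    return 100.0 * sum(item.met for item in items) / len(items)


def is_model_failure(items, threshold=70.0):
    """A response counts as a failure when it scores under the threshold."""
    return rubric_score(items) < threshold


# Illustrative rubric only; these asks are not taken from the project.
example_rubric = [
    RubricItem("Responds in the language the system prompt requires", True, True),
    RubricItem("Stays under the requested word limit", True, False),
    RubricItem("Follows the system instruction over the conflicting user ask", False, False),
    RubricItem("Keeps a supportive tone", False, True),
]

print(rubric_score(example_rubric))      # 50.0
print(is_model_failure(example_rubric))  # True
```

Here two of four items are met, so the response scores 50% and falls under the 70% cutoff. A weighted rubric would need a different scoring function.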