Cypher RLHF
Participants are guided through creating conversations with a chatbot. Each task begins with a drafted prompt, from which the AI model generates two responses. The goal is to write prompts that cause one of the responses to exhibit a Model Failure (an error or inadequacy in the AI's output). If neither response shows such a failure, the participant revises and refines the prompt to provoke one. Each response is then rated on seven attributes, such as accuracy, humor, and conciseness. Participants next choose the better of the two responses and justify their choice. Finally, they rewrite the selected response to make it flawless, correcting weaknesses in truthfulness, tone, brevity, or any other area identified during evaluation.