Reinforcement learning from human feedback
I had to write code-related prompts and evaluate the model responses on instruction following, precision, and accuracy. I marked each response and decided which one was better. Finally, if none of the models had provided it, I wrote the best possible response myself.
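As a rough illustration, one such comparison could be recorded along the lines of the sketch below. Everything in it is an assumption made for the example rather than the actual tooling: the `Comparison` class, its field names, the 1 to 5 marking scale, and the choice to sum the marks to pick the better response are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# The three criteria named in the description above.
CRITERIA = ("instruction_following", "precision", "accuracy")


@dataclass
class Comparison:
    """One prompt, two model responses, and the reviewer's marks (hypothetical layout)."""

    prompt: str
    response_a: str
    response_b: str
    scores_a: Dict[str, int]  # assumed scale: 1 (poor) to 5 (excellent)
    scores_b: Dict[str, int]
    ideal_response: Optional[str] = None  # filled in only when neither response is good enough

    def total(self, scores: Dict[str, int]) -> int:
        # Assumption: marks are combined by a plain sum over the criteria.
        return sum(scores[criterion] for criterion in CRITERIA)

    def preferred(self) -> str:
        """Return "a", "b", or "tie" depending on the summed marks."""
        total_a = self.total(self.scores_a)
        total_b = self.total(self.scores_b)
        if total_a == total_b:
            return "tie"
        return "a" if total_a > total_b else "b"

    def best_available(self) -> str:
        """Return the reviewer's own response if present, else the better model response."""
        if self.ideal_response is not None:
            return self.ideal_response
        # On a tie this falls back to response A, an arbitrary choice for the sketch.
        return self.response_b if self.preferred() == "b" else self.response_a


example = Comparison(
    prompt="Write a Python function that reverses a string.",
    response_a="def reverse(s):\n    return s[::-1]",
    response_b="def reverse(s):\n    return s",
    scores_a={"instruction_following": 5, "precision": 5, "accuracy": 5},
    scores_b={"instruction_following": 4, "precision": 3, "accuracy": 1},
)
print(example.preferred())
```

Here response A is preferred because its summed marks are higher, and `best_available()` returns it unchanged since no reviewer-written response was needed.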