Outlier AI project
I am presented with a single prompt and two responses to it. My job is to classify the responses, rate and evaluate them along specific dimensions (mostly linguistic in nature), and provide feedback. Finally, I decide which response best satisfies the user intent according to those linguistic dimensions and explain why. All of this follows extensive manuals that specify how ratings and evaluations should be carried out under the specific conditions they lay out.