Outlier Sentence-by-Sentence Factuality Evaluation Task
Scope: This task evaluates model-generated responses to prompts by assessing the factual accuracy of each claim a response makes. The goal is to ensure that the information the model provides aligns with verified data and facts.

Data Labeling Tasks: Identify the factual claims in each model response, assess the accuracy of each claim, and conduct internet research to provide supporting or contradicting URLs for each claim.

Project Size & Duration: The project is ongoing. Engagement is flexible and depends on the number of prompts and responses evaluated; it typically involves several hours of work each week.

Quality Measures: Quality is maintained through thorough evaluation of factual accuracy. Each assessment requires supporting evidence and appropriate citations to ensure that it is reliable and credible.
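As an illustration only, the labeling steps described under Data Labeling Tasks (identify a claim, assess it, attach supporting or contradicting URLs) could be captured as one record per claim. The sketch below is a hypothetical layout, not the project's actual schema: the names `Verdict`, `ClaimAnnotation`, `is_complete`, and `summarize`, the three verdict labels, and the rule that a supported or contradicted verdict needs at least one URL are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    """Hypothetical verdict labels; the real task may use different ones."""
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNVERIFIED = "unverified"


@dataclass
class ClaimAnnotation:
    """One factual claim taken from a single sentence of a model response."""
    sentence_index: int          # position of the sentence in the response
    claim: str                   # the factual claim as the evaluator wrote it down
    verdict: Verdict             # result of the accuracy assessment
    urls: List[str] = field(default_factory=list)  # supporting or contradicting sources

    def is_complete(self) -> bool:
        # Assumed rule: a supported or contradicted verdict must cite
        # at least one URL; an unverified claim may have none.
        if self.verdict is Verdict.UNVERIFIED:
            return True
        return len(self.urls) > 0


def summarize(annotations: List[ClaimAnnotation]) -> Dict[Verdict, int]:
    """Count how many claims received each verdict."""
    counts = {verdict: 0 for verdict in Verdict}
    for annotation in annotations:
        counts[annotation.verdict] += 1
    return counts


if __name__ == "__main__":
    # Placeholder claims and URLs, used only to exercise the structure.
    annotations = [
        ClaimAnnotation(0, "example claim A", Verdict.SUPPORTED,
                        ["https://example.com/source-a"]),
        ClaimAnnotation(1, "example claim B", Verdict.CONTRADICTED),
    ]
    for annotation in annotations:
        print(annotation.sentence_index, annotation.verdict.value,
              annotation.is_complete())
    print(summarize(annotations))
```

Keeping one record per claim, rather than one per response, mirrors the sentence-by-sentence framing of the task and makes a missing citation easy to flag before submission.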