Prompt Quality Assessment for Factuality Benchmark – AI Evaluation
This project involved evaluating the factual quality and suitability of user-generated prompts intended for benchmarking AI model accuracy. I reviewed and categorized prompts against a detailed error taxonomy covering hallucinations, hypotheticals, non-factual requests, non-answerable queries, safety violations, ambiguity, and subjectivity. The task required distinguishing prompts that had to be rejected outright from those that could be salvaged through revision, while keeping every accepted prompt aligned with the project's goal of collecting only requests for verifiable factual information. The role demanded critical thinking, linguistic analysis, and careful judgment to help build a high-quality dataset for factual AI evaluation.
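To make the triage workflow concrete, below is a minimal sketch in Python of how the error taxonomy and the reject-versus-revise decision could be encoded. The category names, the split between rejectable and revisable issues, and the `triage_prompt` helper are illustrative assumptions, not the project's actual rubric or tooling.

```python
from dataclasses import dataclass
from enum import Enum, auto

class PromptIssue(Enum):
    """Error taxonomy applied when reviewing candidate prompts (names illustrative)."""
    HALLUCINATION = auto()     # presupposes facts or entities that do not exist
    HYPOTHETICAL = auto()      # asks about counterfactual or imagined scenarios
    NON_FACTUAL = auto()       # requests opinion, creativity, or advice
    NON_ANSWERABLE = auto()    # no verifiable answer exists
    SAFETY_VIOLATION = auto()  # violates safety policy
    AMBIGUOUS = auto()         # admits multiple plausible readings
    SUBJECTIVE = auto()        # answer depends on personal preference

# Which issues disqualify a prompt outright versus which can be fixed by
# rewording is an assumed split for illustration only.
REJECT_OUTRIGHT = {
    PromptIssue.HALLUCINATION,
    PromptIssue.NON_FACTUAL,
    PromptIssue.NON_ANSWERABLE,
    PromptIssue.SAFETY_VIOLATION,
}
REVISABLE = {PromptIssue.HYPOTHETICAL, PromptIssue.AMBIGUOUS, PromptIssue.SUBJECTIVE}

@dataclass
class Verdict:
    decision: str            # "accept", "revise", or "reject"
    issues: list

def triage_prompt(issues: list) -> Verdict:
    """Map a reviewer's labeled issues to an accept/revise/reject decision."""
    if not issues:
        return Verdict("accept", [])
    if any(issue in REJECT_OUTRIGHT for issue in issues):
        return Verdict("reject", issues)
    return Verdict("revise", issues)

# Example: an ambiguous but otherwise factual prompt is flagged for revision,
# while a prompt with a false premise is rejected outright.
print(triage_prompt([PromptIssue.AMBIGUOUS]).decision)      # -> "revise"
print(triage_prompt([PromptIssue.HALLUCINATION]).decision)  # -> "reject"
```

The key design point the sketch captures is that rejection dominates: if any disqualifying issue is present, no amount of revision keeps the prompt in the dataset, which mirrors the reject-first judgment the review task required.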