AI Data Evaluator (Contract)
As an AI Data Evaluator at Micro1 / SearchEvals, I engineered adversarial and time-sensitive prompts to stress-test search-augmented large language models (LLMs). I evaluated model outputs for factual accuracy, documented the failure modes that surfaced, and provided detailed analyses to improve AI systems. I also designed grading rubrics and evaluation frameworks, and performed real-time fact-checking against primary sources during critical events.

• Stress-tested LLMs with adversarial and real-world data prompts.
• Evaluated outputs for hallucinations, temporal confusion, and retrieval failures.
• Developed standardized scoring frameworks for assessing model outputs.
• Authored conversation analyses to document and classify model errors.