Contract Economist – Data Labeling and AI Evaluation
Authored and validated original, graduate-level problems to assess and strengthen the fluency and reasoning capabilities of large language models (LLMs). Evaluated AI-generated responses to image, video, audio, and domain-specific prompts against detailed rubrics to ensure high accuracy. Designed prompts and conducted in-depth model evaluations, contributing to proprietary datasets used for benchmarking and fine-tuning.
• Created and applied rubrics defining rigorous AI evaluation criteria
• Performed granular evaluations of audio, visual, and text-based model outputs
• Constructed challenging test items to probe LLM depth and academic quality
• Wrote detailed justifications to improve model interpretability and correctness