LLM Evaluation & Mathematical Reasoning Annotator
Worked on AI training projects requiring advanced evaluation of large language models, with a focus on mathematical and statistical accuracy. Designed complex prompts involving probability, finance, and data analysis; reviewed AI outputs for logical correctness; and rated model responses against gold-standard answers. Verified that outputs followed sound probabilistic reasoning and met standards of statistical rigor, while maintaining annotation consistency across thousands of text samples. Contributed to improving model reasoning benchmarks by identifying systematic errors and recommending guideline refinements.