Senior Research Analyst – LLM Evaluation and Dataset Alignment
As a Senior Research Analyst at Turing, I focused on evaluating and aligning Large Language Models (LLMs) in STEM domains, particularly physics and mathematics. I applied formal mathematical reasoning to develop, curate, and evaluate datasets and prompts aimed at improving LLM reasoning, generalization, and robustness. My work included designing mathematical reasoning benchmarks, systematically analyzing model outputs, and supporting safe, scalable AI systems.
• Developed and validated high-difficulty math reasoning tasks for LLM evaluation.
• Conducted failure-mode analysis and labeled outputs to identify hallucinations.
• Enhanced dataset quality, correctness, and interpretability for LLM training.
• Collaborated on scalable, reproducible AI research pipelines.