LLM Alignment: Math Reasoning Dataset (Personal Project)
I designed a high-difficulty math problem dataset focused on combinatorics and number theory to evaluate LLMs. I created and curated over 500 problems and implemented a quality control workflow to verify that each solution was factually correct and mathematically sound. I systematically flagged and corrected AI-generated content that contained hallucinated or incorrect theorems.
• Compiled and structured datasets for LLM benchmarking.
• Developed a verification workflow for math solutions.
• Emphasized rigor in combinatorics and number theory.
• Applied judgment to accept or reject AI-generated solutions based on their correctness.