University-Level Mathematics Reasoning Dataset – RLHF & Evaluation Project
Contributed to the development and evaluation of a large-scale university-level mathematics reasoning dataset used to train and fine-tune large language models (LLMs).
- Designed challenging prompts across algebra, calculus, probability and statistics, linear algebra, discrete mathematics, and optimization, covering both computational and proof-based reasoning tasks.
- Authored complete, step-by-step solutions emphasizing logical rigor, formal justification, and mathematical clarity.
- Developed structured scoring rubrics to assess AI-generated responses on correctness, completeness, reasoning depth, and adherence to formal proof standards.
- Evaluated and rated model outputs within RLHF frameworks, identifying logical inconsistencies, computational errors, unsupported assumptions, and gaps in multi-step reasoning.
- Provided corrected solutions with concise rationales to improve model alignment and reliability.