LLM Evaluation and QA for Mathematics and Scientific Text
Conducted large-scale LLM quality assurance and data annotation for mathematics and scientific reasoning tasks. Evaluated AI-generated answers, designed and applied scoring rubrics, and performed entity extraction, classification, and QA alignment checks. Annotated thousands of text samples spanning problem solving, step-by-step reasoning, and scientific explanation. Ensured quality and consistency through double-review workflows, peer audits, and validation against reference material. Refined prompts and responses to improve model accuracy and learning outcomes.