Adversarial Math Prompting and Reasoning Correction
Scope
- Designed adversarial math prompts to surface LLM failures in algebra, calculus, probability, and word problems.
- Delivered gold-standard solutions, rationales, rubrics, and EN/ES localized versions.

Tasks
- Crafted problem variants to trigger errors (ambiguity, multi-step slips, unit/symbol drift).
- Ran model tests; annotated failure steps; authored corrected derivations and final answers.
- Tagged items with domain, difficulty, skills, error category, and hint tiers; performed EN↔ES localization.

Project Size
- 250–400 items; 2–3 adversarial variants each (≈600–900 prompts); 1–2 revision cycles per item.

Quality Measures
- Double annotation with rubric checks (target \(\kappa \ge 0.8\)).
- Peer-reviewed gold solutions; unit and notation verification.
- Bilingual QA with glossary-enforced terminology; versioned changes validated.
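The item-tagging scheme described under Tasks (domain, difficulty, skills, error category, hint tiers, locale) can be sketched as a record type. This is a minimal illustration; the field names, difficulty scale, and example values are assumptions, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass
class MathItem:
    item_id: str           # hypothetical identifier scheme
    domain: str            # e.g. "algebra", "calculus", "probability", "word_problem"
    difficulty: int        # e.g. 1 (easy) to 5 (hard); the scale is an assumption
    skills: list[str]      # skills exercised, e.g. ["factoring", "unit_conversion"]
    error_category: str    # targeted failure mode, e.g. "multi_step_slip"
    hint_tiers: list[str]  # ordered hints, least to most revealing
    locale: str = "en"     # "en" or "es" for the EN/ES localized versions

item = MathItem(
    item_id="alg-0042",
    domain="algebra",
    difficulty=3,
    skills=["factoring"],
    error_category="multi_step_slip",
    hint_tiers=["Check the sign when factoring.", "Expand the factors to verify."],
)
print(item.domain, item.locale)  # → algebra en
```

A typed record like this keeps the tags machine-checkable, which simplifies the double-annotation and localization passes.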
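The \(\kappa \ge 0.8\) agreement target under Quality Measures refers to Cohen's kappa between the two annotators. A minimal pure-Python sketch of the computation (function and label names are illustrative, not taken from the project):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b.get(c, 0) for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators always emit the same single label
    return (p_o - p_e) / (1 - p_e)

# Example: error-category labels from double annotation of four items.
a = ["unit_drift", "unit_drift", "unit_drift", "multi_step"]
b = ["unit_drift", "unit_drift", "multi_step", "multi_step"]
print(round(cohens_kappa(a, b), 3))  # → 0.5
```

Items whose kappa falls below the 0.8 target would go back through a revision cycle before release.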