Generalist AI Trainer
Led evaluation and quality assurance of LLM-generated text across advanced mathematics, data science, machine learning, and programming tasks. Built automated validation and regression-testing frameworks for model workflows using Playwright, Jest, and Vitest to support production-grade reliability. Ran adversarial, multilingual, and stress tests to assess model robustness, safety, and reasoning consistency. Designed scalable annotation and model-evaluation pipelines that reduced regression defects by 40% across iterative releases. Defined and enforced annotation and QA standards that improved inter-annotator agreement by 15%. Partnered with product, engineering, and research teams to surface complex edge cases early and improve deployment readiness.