Senior Machine Learning Scientist (AI Evaluation & Reasoning Systems)
I led the design and implementation of AI evaluation frameworks to assess the reasoning quality of LLM-generated outputs in STEM domains. My work involved rigorous error analysis, identification of logical inconsistencies and hallucinations, and the development of standardized scoring methodologies. These evaluations improved both model performance and data quality through targeted feedback and iterative review.

• Evaluated logical reasoning and content quality in LLM outputs
• Developed and applied rubrics for standardized assessment
• Produced detailed reports to guide model improvement
• Collaborated with cross-functional teams to align annotation guidelines