Advanced STEM and Linguistics Projects
For these two projects, I help create frontier multi-modal datasets that measure model performance on multi-step reasoning and computation tasks and that can be used for SFT and RLFT across STEM disciplines. I also create Humanity's Last Exam (HLE)-style questions designed to benchmark the reasoning abilities of state-of-the-art language models across multiple academic and professional domains. The quality measures are those you would typically expect for project and rubric pairs, such as the validity of facts, the correctness of any computations, and the principles and theories applied.