AI Research Intern – TextBandit Evaluation Project
I built an evaluation framework to analyze AI model benchmark results in the Algoverse TextBandit project. My work involved defining and refining probabilistic reasoning tasks for assessing model performance, and I used PyTorch and NumPy to score model outputs against large-scale benchmark datasets.
• Evaluated and rated text-based model outputs against benchmark datasets
• Defined task criteria for probabilistic reasoning in AI models
• Used Python-based data analysis to inspect AI model behavior
• Implemented scoring pipelines with PyTorch and NumPy
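The scoring pipeline described above could be sketched as follows. This is a minimal illustration, not the project's actual code: `score_outputs`, its arguments, and the assumption of multiple-choice outputs (one score per answer candidate) are hypothetical, and it uses NumPy alone for simplicity.

```python
import numpy as np

def score_outputs(logits: np.ndarray, gold: np.ndarray) -> dict:
    """Score per-choice model outputs against benchmark gold labels.

    logits: (n_examples, n_choices) array of model scores per answer choice.
    gold:   (n_examples,) array of correct-choice indices.
    (Hypothetical interface, for illustration only.)
    """
    # Exact-match accuracy: does the highest-scored choice match the label?
    preds = logits.argmax(axis=1)
    accuracy = float((preds == gold).mean())

    # Probabilistic score: mean softmax probability assigned to the gold choice.
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    gold_prob = float(probs[np.arange(len(gold)), gold].mean())

    return {"accuracy": accuracy, "gold_prob": gold_prob}
```

Tracking a probability-based metric alongside plain accuracy is useful for probabilistic-reasoning tasks, since it distinguishes confident correct answers from near-ties.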