LLM Output Evaluator for Physics Problems
Evaluated and validated outputs generated by large language models (LLMs) on physics problem-solving tasks. Applied expert knowledge of computational and experimental nuclear physics to assess the correctness, logical coherence, and applicability of generated text, using iterative error analysis and problem deconstruction to deliver detailed feedback on LLM-generated content.
• Validated and evaluated LLM outputs across physics domains
• Applied prompt engineering strategies for model assessment and data collection
• Worked with GPT, Claude, DeepSeek, and Gemini models and used Label Studio for labeling tasks
• Performed error analysis and provided structured ratings to guide further AI training