LLM Evaluation and Prompt Engineering for STEM AI Training
Participated in a large-scale AI training and evaluation project focused on improving the reasoning, factual accuracy, and instructional quality of LLM outputs. Designed and refined prompts, evaluated AI-generated responses across STEM disciplines, and applied detailed rubrics to assess factuality, logical coherence, and ethical compliance. Contributed to datasets for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) pipelines, ensuring consistent data quality through multi-stage review and QA processes. Collaborated with international annotation teams, maintained audit accuracy above 98%, and provided feedback that improved model reliability and linguistic fluency.