Large Language Models (LLMs) | AI Training
Led large-scale evaluation and quality control of AI-generated Q&A datasets used in LLM fine-tuning and reinforcement learning workflows. Conducted rubric-based assessments of reasoning accuracy, coherence, safety, and policy compliance across STEM and humanities domains. Performed structured pairwise comparisons and systematic output validation to identify hallucinations, factual inconsistencies, and logical gaps.