AI Language Model Evaluator
As an AI Language Model Evaluator at Turing, I wrote and validated Q&A pairs to support large language model training and evaluation. My work included evaluating AI-generated answers for correctness, logical reasoning, and instruction-following through guideline-based annotation. I also reviewed outputs in coding, SQL, and reasoning tasks, documenting errors and inconsistencies.

• Created and reviewed structured question-answer datasets.
• Rated and ranked AI-generated responses for accuracy and logic.
• Performed detailed analysis of model hallucinations and guideline adherence.
• Used structured annotation workflows to improve reinforcement learning datasets.