AI Evaluation Specialist
As an AI Evaluation Specialist, I systematically evaluated large language model (LLM) responses for clarity, reasoning, and factual correctness. My work involved fact-checking model-generated content, providing detailed feedback, and collaborating with AI development teams to improve model behaviors. Using established evaluation rubrics and taxonomies, I upheld rigorous and consistent assessment standards.
• Assessed LLM output quality across a broad array of topics and questions.
• Conducted detailed fact-checking using academic and reputable sources.
• Delivered structured, actionable feedback to model engineers for iterative improvement.
• Ensured reproducibility and accuracy of AI evaluation processes through benchmark adherence.