AI Training & Evaluation Analyst
Support large language model training and evaluation through analytical review, prompt development, and independent research. Assess LLM outputs across specialized scientific domains for accuracy, clarity, reasoning depth, and overall response quality to help measure model performance. Develop domain-specific prompts and conduct research to inform evaluation criteria, testing approaches, and quality improvement efforts.

Work includes rating and ranking model responses, surfacing systematic failure patterns, and producing written quality assessments that directly inform training data decisions. Quality is measured by consistency with established rubrics, the defensibility of reasoning in written evaluations, and alignment with project-specific standards for accuracy and depth.