Evaluator of AI-Generated Clinical Content and LLM Outputs
Evaluated and critiqued outputs from large language models used in healthcare content creation and research. Assessed AI-generated clinical text and queries for accuracy, safety, and reasoning quality. Provided structured written feedback to improve model performance, referencing established medical standards.
• Identified and rated subtle clinical inaccuracies in AI responses
• Evaluated the clinical reasoning of LLM outputs against structured competency frameworks
• Familiar with LLMs such as Claude for medical information tasks
• Designed evaluation rubrics and scored AI-generated content