Clinical AI Specialist and LLM Evaluator
Served as a clinical AI specialist focused on evaluating large language models in medical settings. Developed and applied structured evaluation frameworks to assess LLM performance, safety, and reliability. Conducted systematic red-teaming and rubric-based reviews against medical care standards.
• Generated adversarial clinical prompts testing LLM diagnostic capability
• Evaluated outputs for hallucinations, unsafe recommendations, and adherence to guidelines
• Ranked outputs for accuracy, completeness, and reasoning quality
• Identified and documented common model error patterns