LLM Output Evaluator and Clinical AI Safety Reviewer (Founder & AI Systems Architect – OmniHealthAI)
I led structured evaluations of large language model (LLM) outputs in healthcare contexts, focusing on clinical reasoning and safety. My work included benchmarking, fact-checking, and risk-based review of AI-generated clinical content against authoritative medical sources, identifying and correcting hallucinated or unsafe medical responses.
• Designed and implemented standardized evaluation rubrics for clinical and patient-communication tasks
• Conducted detailed audits of clinical reasoning and triage decision-making in AI-generated text
• Applied rubric-based scoring methodologies to rate model outputs for safety and factual alignment
• Integrated authoritative sources (CDC, WHO, NICE, peer-reviewed literature) for evidence-based fact-checking