LLM Output Evaluation and Prompt Testing (Klaviyo)
Led the implementation of prompt versioning, evaluation frameworks, and hallucination-mitigation strategies aimed at consistent, production-grade AI outputs. Optimized the quality and reliability of large language model (LLM) outputs through structured testing and evaluation, and integrated output-quality checks directly into live CRM workflows via custom pipeline deployments. A minimal sketch of such a harness follows the list below.
• Designed and executed evaluation frameworks to validate model outputs
• Systematically rated LLM outputs and tested prompt variations
• Oversaw labeling workflows for hallucination detection and mitigation
• Mentored teams on responsible AI evaluation and tuning
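For illustration, a minimal sketch of the kind of prompt-versioning and evaluation harness this work involved. All names (PromptVersion, EvalCase, evaluate, fake_model) are hypothetical, not Klaviyo's actual code, and the model call is stubbed so the example runs standalone; the grounding check is a deliberately crude stand-in for real claim verification.

```python
# Illustrative sketch only; names are hypothetical, not Klaviyo code.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class PromptVersion:
    version: str   # tracked so regressions are attributable to a prompt change
    template: str  # prompt template with a {context} slot

@dataclass
class EvalCase:
    context: str         # grounding text the model may draw on
    must_contain: list   # substrings a valid answer must include

def hallucination_flags(output: str, context: str) -> list:
    """Crude grounding check: flag numeric tokens in the output that
    never appear in the source context."""
    return [tok for tok in output.split() if tok.isdigit() and tok not in context]

def evaluate(model: Callable[[str], str], prompt: PromptVersion, cases: list) -> dict:
    """Run every case through the model and score coverage and grounding."""
    passed, flagged = 0, 0
    for case in cases:
        output = model(prompt.template.format(context=case.context))
        if all(s in output for s in case.must_contain):
            passed += 1
        if hallucination_flags(output, case.context):
            flagged += 1
    return {"prompt_version": prompt.version,
            "pass_rate": passed / len(cases),
            "hallucination_rate": flagged / len(cases)}

if __name__ == "__main__":
    # Stub model: echoes the prompt plus an ungrounded number, so the
    # hallucination check has something to catch.
    fake_model = lambda p: p + " revenue grew 42 percent"
    prompt = PromptVersion("v1", "Summarize for a CRM audience: {context}")
    cases = [EvalCase(context="Q3 campaign doubled opens.", must_contain=["opens"])]
    print(evaluate(fake_model, prompt, cases))
```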