Lead AI & Automation Engineer — Lithium
Responsible for evaluating the quality and factual consistency of outputs from large language models and NLP-driven applications. Developed QA rubrics and structured guidelines for annotation teams to standardize AI output assessment. Diagnosed model failure modes and supported improvements to training data quality for conversational AI systems.
• Performed detailed evaluation of LLM and NLP outputs, focusing on reasoning gaps and hallucination detection.
• Authored and maintained model evaluation documentation and quality assurance standards.
• Collaborated with annotation teams to calibrate evaluation processes and criteria.
• Used internal proprietary Python tools to automate assessment and reporting workflows.
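A minimal sketch of the kind of rubric-based scoring and reporting automation described above. All names, criteria, and weights here are hypothetical illustrations; the actual internal tooling is proprietary and not reproduced.

```python
from statistics import mean

# Hypothetical QA rubric: criterion -> weight (weights sum to 1.0).
RUBRIC = {
    "factual_consistency": 0.4,   # hallucination / grounding checks
    "reasoning_coherence": 0.3,   # logical gaps, contradictions
    "instruction_following": 0.2,
    "fluency": 0.1,
}

def score_output(ratings: dict[str, float]) -> float:
    """Weighted rubric score for one model output; ratings are 0-5 per criterion."""
    return sum(RUBRIC[criterion] * ratings[criterion] for criterion in RUBRIC)

def report(batch: list[dict[str, float]], threshold: float = 3.0) -> dict:
    """Aggregate a batch of annotator ratings into a simple QA report."""
    scores = [score_output(ratings) for ratings in batch]
    return {
        "mean_score": round(mean(scores), 2),
        "fail_rate": sum(s < threshold for s in scores) / len(scores),
    }

# Example: two annotated outputs, the second exhibiting a factual failure.
batch = [
    {"factual_consistency": 5, "reasoning_coherence": 4, "instruction_following": 5, "fluency": 4},
    {"factual_consistency": 2, "reasoning_coherence": 2, "instruction_following": 3, "fluency": 4},
]
print(report(batch))  # → {'mean_score': 3.5, 'fail_rate': 0.5}
```

Keeping the rubric as plain data (criterion → weight) lets annotation leads recalibrate criteria without touching the scoring code, which is the calibration workflow the bullets above describe.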