AI Engineering Lead (LLM Output Evaluation & RAG Testing)
Led the design and implementation of evaluation frameworks for LLM outputs, ensuring factual accuracy and non-trivial reasoning. Applied prompt optimization and rigorous testing to retrieval-augmented generation (RAG) pipelines, and established systems for continuous assessment of AI-generated text in specialized industry use cases.
• Developed and maintained prompt evaluation guidelines for assessing LLM responses.
• Managed integration of vector database indexing to improve information retrieval.
• Conducted manual and automated evaluations of AI responses using internal tooling.
• Coordinated with AI engineers to refine prompt engineering and retrieval techniques.