AI Quality Engineer
As an AI Quality Engineer, I developed and evaluated prompts to assess large language model (LLM) performance on coding tasks. My work involved analyzing outputs for accuracy, uncovering recurring failure patterns, and providing iterative feedback to improve model reasoning. The main objective was to improve LLM consistency and effectiveness on complex coding challenges.

• Designed and executed LLM evaluation tasks for code generation and reasoning.
• Identified and documented recurring patterns of model failure.
• Refined and iterated prompts to address weaknesses in LLM reasoning.
• Collaborated with engineering teams to integrate evaluation protocols.