AI Software Engineer (Data Labeling and LLM Evaluation)
I evaluated large language model (LLM) performance against engineering and quality benchmarks. I led initiatives in data labeling, prompt design, and model output analysis to improve the quality of AI outputs, and collaborated with quality assurance (QA) and data teams to establish edge-case annotation guidelines and strengthen dataset quality.

• Evaluated LLM outputs and provided structured, rubric-based feedback
• Designed and labeled prompts for AI behavior tuning
• Partnered with QA and data teams to develop robust annotation protocols
• Applied k-nearest-neighbor (KNN) techniques to improve annotation accuracy and dataset value
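As an illustration of how a KNN technique can support annotation quality work like the above, here is a minimal, self-contained sketch (the data, labels, and function name are hypothetical, not taken from the role): k nearest neighbors over simple item embeddings are used to flag annotations whose label disagrees with the majority of their neighbors, a common way to surface likely labeling errors for review.

```python
import math
from collections import Counter

def knn_label_check(points, labels, k=3):
    """Flag indices whose label disagrees with the majority label
    of their k nearest neighbors (Euclidean distance)."""
    flagged = []
    for i, p in enumerate(points):
        # Distances to every other item; ties break on index.
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbor_labels = [labels[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbor_labels).most_common(1)[0]
        if labels[i] != majority:
            flagged.append(i)
    return flagged

# Toy embeddings: two clusters; index 4 sits in the "A" cluster
# but carries label "B", so it should be flagged for review.
points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0),
          (0.05, 0.15), (5.1, 4.9), (4.9, 5.1)]
labels = ["A", "A", "A", "B", "B", "B", "B"]

print(knn_label_check(points, labels, k=3))  # → [4]
```

In practice the points would be model embeddings of the labeled items rather than 2-D coordinates, but the disagreement check works the same way.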