AI Engineer – LLM Prompt Engineering & Evaluation
In this role, I created, evaluated, and curated prompts and expected responses for large language model pipelines supporting HR automation and payroll processing. My work included building high-quality prompt datasets, reviewing AI-generated outputs, and implementing improvements based on auditor and stakeholder feedback. I contributed substantially to the evaluation and continuous training of the LLMs powering audit case processing and root-cause analysis.

• Authored and tested prompts for RAG-enabled audit cases using structured and legal data inputs.
• Evaluated and annotated model responses to ensure compliance, accuracy, and operational efficiency.
• Partnered with product and compliance teams to refine model outputs and keep labeling tasks aligned with evolving business needs.
• Deployed and leveraged Amazon Bedrock and OpenAI for prompt engineering and model evaluation at scale.