Senior AI Evaluation & QA Specialist
In this role, I created structured evaluation scenarios to test large language model (LLM) agents against simulated real-world workflows. I established gold-standard outputs and acceptable response ranges to assess model performance and consistency. I also developed and maintained JSON and YAML scenario templates to broaden QA coverage.
• Led scenario-based QA reviews to identify logical inconsistencies.
• Implemented validation frameworks to improve model reliability.
• Used Postman, Jira, and TestRail to manage workflows and evaluations.
• Collaborated with AI developers to improve evaluation frameworks.