AI Agent Testing and Evaluation Intern
I participated in the testing and validation of AI agents powered by Large Language Models (LLMs), which support autonomous decision-making and contextual understanding across a variety of workflows. My responsibilities included evaluating agent performance on tasks such as expense management, customer service, and data processing, and helping ensure the accuracy and reliability of model-generated outputs by following defined evaluation protocols.

• Assessed model understanding and task execution for workflow automation.
• Conducted end-to-end evaluations of contextual responses produced by LLM-powered agents.
• Reported issues and contributed to refining AI agent decision logic.
• Collaborated closely with engineering and QA teams to improve AI agent robustness.