Virtual Assistant / AI Evaluation Specialist
Contributed to LLM agent evaluation by designing realistic, reusable scenarios and defining gold-standard behaviors, acceptable alternatives, and edge cases. Reviewed, annotated, and refined multi-turn agent outputs in English and Tagalog for coherence, accuracy, and naturalness to support consistent benchmarking. Built structured ground-truth datasets by extracting financial data from anonymized documents into schema-validated JSON and flagging ambiguities. Identified inconsistencies through rigorous QA and analysis, and collaborated with cross-functional teams to improve evaluation frameworks.