Expert Software Engineer
Worked on a Google LLM function-calling evaluation project focused on verifying and curating high-quality function-calling training data for production-grade language models. Each data sample consisted of a user query, a multi-step solution (a sequence of function calls with their outputs), and a final natural-language response; the task was to ensure the pipeline as a whole correctly and completely answered the user's intent.

Responsibilities included analyzing user intent, validating overall solution completeness, and performing granular verification of individual function calls. This covered checking parameter correctness, groundedness (traceability of every argument to the user query or a prior function's output), and relevance to the task, as well as flagging unnecessary or missing function calls. I also assessed final response quality, ensuring it was fully supported by the function results, free of hallucinations, and did not introduce unsupported or extraneous information.
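
A minimal sketch of the kind of granular groundedness check described above: verifying that each parameter of a function call is traceable to the user query or to a prior call's output. All names here (FunctionCall, grounded_params, the flight-search example) are illustrative assumptions, not the project's actual schema or tooling.

```python
from dataclasses import dataclass, field

@dataclass
class FunctionCall:
    """One step of a multi-step solution (hypothetical schema)."""
    name: str
    params: dict                      # parameter name -> value
    output: dict = field(default_factory=dict)

def grounded_params(call: FunctionCall, query: str, prior_outputs: list) -> dict:
    """Map each parameter name to True if its value appears in the
    user query or in any earlier function call's output."""
    sources = [query] + [str(v) for out in prior_outputs for v in out.values()]
    return {
        name: any(str(value) in source for source in sources)
        for name, value in call.params.items()
    }

# Example: a flight-search step whose 'date' argument is never
# mentioned in the query or produced by an earlier call.
query = "Book me a flight from Paris to Tokyo"
step1 = FunctionCall("search_flights",
                     {"origin": "Paris", "dest": "Tokyo", "date": "2024-05-01"})
print(grounded_params(step1, query, []))
# → {'origin': True, 'dest': True, 'date': False}
```

Here 'origin' and 'dest' are grounded in the query, while 'date' is not traceable to any source, so a reviewer would flag it as a potential hallucination.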