Lead: Agent Simulation Environments, Training Data & Benchmarks for LLM Agents
At Turing I lead a team building large-scale simulation and scenario environments that produce training and evaluation data for LLM agents, along with the benchmarks used to score agent performance. The work spans environment design, scenario authoring, scripted ground-truth generation, automated grading, and end-to-end evaluation infrastructure. Key contributions:

- 60+ proprietary simulated API services (Gmail, Slack, WhatsApp, Google Maps, etc.), built from scratch to give agents grounded, multi-step tool-use scenarios.
- Gemini Gym's Mutation Engine: a config-driven generator producing 1M+ tool variations with auto-generated function-calling schemas for diverse training coverage.
- On-device RL frameworks built from the ground up, integrating 80+ Android tools (Kotlin) and 30+ iOS tools (Swift).
- Benchmarking infrastructure with comprehensive test coverage that scores cloud AI agents on real-world tasks sourced from live open-source GitHub repositories.