Autonomous Web Navigation Prompting & Evaluation
Contributed to a project training large language models to browse and interact with real-world websites autonomously. Wrote prompts instructing the AI to complete specific tasks, such as finding a product, accessing account settings, or retrieving a particular piece of information, across varied website types. Evaluated and annotated the AI’s actions step by step, covering its reasoning, missteps, corrections, and decision-making logic. Simulated realistic navigation challenges, introduced deliberate errors, and documented how effectively the model recovered. Identified failure points and provided feedback to improve the model’s agentic web-based reasoning.