OSWorld Multimodal Agents
Trained multimodal AI agents within the OSWorld framework to execute complex, open-ended computer tasks autonomously. The project assessed the agents' ability to interact natively with operating systems and desktop applications. Specific responsibilities included evaluating GUI navigation logic, analyzing action sequences for accuracy, and providing human feedback on the agents' desktop automation strategies. Ensured high-quality training data by rigorously scoring task-completion success and correcting spatial-reasoning and tool-use errors within the UI environment.
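The scoring workflow described above can be sketched as a small review record: one entry per agent step, with an overall success score and a tally of error categories. This is a minimal illustrative sketch; the class and field names (`StepReview`, `TaskReview`, `error_type`) are hypothetical and not part of the OSWorld API.

```python
from dataclasses import dataclass, field

@dataclass
class StepReview:
    action: str           # e.g. "click", "type", "scroll"
    target: str           # UI element the agent aimed at
    correct: bool         # did this step advance the task?
    error_type: str = ""  # e.g. "spatial", "tool_use" when incorrect

@dataclass
class TaskReview:
    task_id: str
    steps: list = field(default_factory=list)
    completed: bool = False

    def success_score(self) -> float:
        """Fraction of steps judged correct; 0.0 for an empty trace."""
        if not self.steps:
            return 0.0
        return sum(s.correct for s in self.steps) / len(self.steps)

    def error_summary(self) -> dict:
        """Count incorrect steps by labeled error category."""
        counts = {}
        for s in self.steps:
            if not s.correct and s.error_type:
                counts[s.error_type] = counts.get(s.error_type, 0) + 1
        return counts

# Example: reviewing a short action trace for a file-rename task.
review = TaskReview(task_id="rename_file_001")
review.steps = [
    StepReview("click", "file icon", True),
    StepReview("click", "context menu > Rename", True),
    StepReview("type", "filename field", False, error_type="spatial"),
]
print(review.success_score())   # 2 of 3 steps judged correct
print(review.error_summary())   # {'spatial': 1}
```

Keeping the per-step judgments separate from the final completion flag lets the same record support both outcome-level scoring and step-level error correction.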