Data Annotator
The project focused on evaluating AI agents as they completed complex, real-world tasks across various websites. Responsibilities included annotating agent behavior, labeling actions, assessing task success or failure, and identifying reasoning and execution errors against established guidelines. The tasks often involved multi-step workflows requiring close attention to detail and contextual understanding. In addition to annotation, I served as a reviewer of other annotators' work: conducting quality checks, providing constructive feedback, and ensuring consistent application of the guidelines across the team. The project operated at scale, with a high volume of tasks and a large pool of contributors. Quality was maintained through regular audits, calibration exercises, and accuracy benchmarks, and my contributions in these areas directly supported both data integrity and team performance.