Data Annotator – Computer Use Agent (CUA) Evaluation
• Execute structured end-to-end evaluations of AI Computer Use Agent workflows across desktop applications (e.g., QGIS, OnlyOffice, Calibre, Notepad++, system utilities), ensuring strict adherence to Standard Operating Procedures (SOPs).
• Validate task execution accuracy by identifying hard blockers, tool limitations, system restrictions, and deviations from prompt specifications in controlled test environments.
• Document execution paths, edge cases, and failure points with clear, reproducible reporting to improve model reliability, procedural compliance, and real-world usability.
• Enforce compliance standards by confirming that outputs meet exact prompt requirements without unauthorized substitutions, workarounds, or assumption-based deviations.