computer-use-agent-evals
Perform side-by-side (SxS) evaluations on GHD, comparing outputs for quality, accuracy, and relevance. Fact-check information and flag inconsistencies or inaccuracies. Provide detailed, well-structured justifications for each evaluation decision. Follow established evaluation guidelines and quality standards consistently. Use browser-based tools and extensions to support analysis and review tasks. Execute simple command-line instructions when evaluation workflows require them. Document findings clearly and communicate insights to stakeholders when needed.