Cross-Application AI Knowledge Retrieval Testing
Contributed to an integrated workspace AI evaluation project focused on cross-application information retrieval and grounded response generation. Designed structured prompts to query connected data sources; evaluated model outputs for factual accuracy, relevance, reasoning quality, and proper source grounding; and labeled responses for hallucinations and instruction adherence. Maintained strict compliance with detailed annotation guidelines and quality-control standards, under which continued project access and increased compensation were contingent on sustained accuracy. Logged approximately 360 hours across the project series, consistently meeting precision and review benchmarks.