AI Data Labeling & Model Evaluation Contributor (Remote)
Reviewed image- and text-based QA datasets for accuracy and consistency, verifying AI-generated outputs against structured labeling criteria. Produced clear, standalone corrected answers when model responses were inaccurate, and documented concise rationales for non-answerable items. Maintained a 7–8 minute per-item review pace while meeting strict quality benchmarks.
• Applied adversarial evaluation techniques to identify model reasoning weaknesses
• Ensured data integrity and applied rigorous quality-control standards
• Evaluated and corrected mislabeled outputs
• Enforced policy and guideline compliance