AI Response Evaluator & Data Annotation Specialist
Evaluated and compared AI-generated responses in Arabic and English across business, healthcare, and financial contexts. Performed regular quality assessments of model outputs, including prompt pipelines for pharmaceutical sales data and anomaly-detection workflows. Applied bilingual skills to projects focused on Arabic AI training and annotation.
• Assessed output quality for Claude, GPT-4, and local LLMs.
• Labeled responses for accuracy, relevance, and tone.
• Supported RLHF-style feedback and prompt-engineering workflows.
• Automated review and structured reporting using proprietary tools.