AI Evaluation & Prompt Architect (Future Digital Works)
As lead AI professional, designed high-complexity prompts and led LLM evaluation projects across multiple domains. Executed RLHF annotation, hallucination detection, and benchmarking for frontier models in English and Modern Standard Arabic. Delivered human-in-the-loop audits and rubric-based evaluations for expert-level reasoning and alignment tasks.
• Developed open-ended prompt frameworks and complex question designs.
• Performed response quality scoring, preference ranking, and rubric-based annotation.
• Refined datasets for bilingual NLP applications.
• Used Remotasks, Scale AI, Mercor, and proprietary prompt orchestration tools.