AI Model Evaluator & RLHF Annotator
Performed human feedback evaluation of AI-generated outputs for healthcare and general domains as part of reinforcement learning from human feedback (RLHF) tasks. Ranked, rated, and provided structured corrections of model responses for safety, accuracy, and appropriateness. Wrote high-quality prompts and responses to improve AI conversational abilities and diagnostic reasoning.
• Evaluated AI model outputs for correctness and factual accuracy in clinical and non-clinical scenarios
• Identified diagnostic errors and logical inconsistencies in model-generated answers
• Rated multi-step reasoning and instructed models in Arabic, English, Spanish, and French
• Participated in prompt-response writing and structured evaluations for LLM fine-tuning projects