AI Training & Evaluation Specialist (RLHF and Data Annotation)
Designed structured evaluation rubrics to assess large language model (LLM) responses for logical coherence, factual accuracy, instruction-following, and overall quality. Composed and tested diverse prompt sets to probe model reasoning, edge-case performance, and failure modes, and conducted detailed assessments of AI outputs for tone, clarity, helpfulness, and safety to guide performance improvements. Contributed to AI training and evaluation on the DataAnnotation.tech platform.
• Created extensive evaluation rubrics and prompt sets tailored to reinforcement learning from human feedback (RLHF).
• Evaluated and rated multi-turn AI conversations for reasoning, instruction adherence, and response safety.
• Provided structured, actionable feedback for model improvement across diverse subject areas.
• Supported quality-driven RLHF and generalist AI workflows in real-world assessment settings.