Expert Judge, Datavio
As an Expert Judge at Datavio, I critically evaluate large language model (LLM) responses for logical consistency, factual accuracy, and safety. I develop and refine prompts using Chain-of-Thought (CoT) techniques to enhance model reasoning on complex technical queries. Additionally, I carry out RLHF (Reinforcement Learning from Human Feedback) tasks to ensure AI outputs align closely with professional human standards.
• Conduct comparative analysis and rating of AI-generated text outputs
• Assess risk factors and potential biases in LLM responses
• Design and test evaluation frameworks for model performance
• Collaborate with prompt engineers to continuously improve labeling protocols