AI Trainer — RLHF & Model Evaluation
As an AI Trainer focused on RLHF and model evaluation, I assessed AI-generated outputs for accuracy, coherence, instruction-following, and safety across diverse domains. I engineered and tested prompts in healthcare, science, and general knowledge, ranking responses to identify and document edge cases. I delivered a comprehensive conversational AI dataset that achieved high annotation quality and informed improvements to the annotation schema.

• Led the RLHF feedback loop for fine-tuning large language models.
• Identified hallucinations and categorized reasoning and safety failures to strengthen annotation protocols.
• Produced a 120,000-token dataset for intent classification, slot filling, and dialogue act labeling.
• Achieved inter-annotator agreement of Cohen's kappa 0.93, surpassing the set benchmark.
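The agreement metric cited above (Cohen's kappa) can be sketched as follows; this is a minimal illustration, and the label values and function name are hypothetical, not drawn from the actual dataset.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: expected overlap given each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b.get(lab, 0) for lab in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Illustrative intent labels from two annotators (hypothetical data)
a = ["book", "book", "cancel", "book", "cancel", "book"]
b = ["book", "book", "cancel", "cancel", "cancel", "book"]
print(round(cohen_kappa(a, b), 4))
```

Values above roughly 0.8 are conventionally read as near-perfect agreement, which is why a 0.93 kappa exceeds typical annotation-quality benchmarks.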