LLM Response Ranking and Prompt Evaluation for Conversational AI
This project focused on human evaluation and tuning of large language model (LLM) outputs using Reinforcement Learning from Human Feedback (RLHF). My role involved ranking AI-generated responses for accuracy, coherence, tone, and relevance to the user prompt, crafting prompt-response pairs for supervised fine-tuning (SFT), and flagging outputs that were biased, toxic, or misleading. The data was primarily in English, with some exposure to multilingual prompts. Over the course of the project, I annotated and evaluated more than 10,000 data points, following strict quality control protocols and review guidelines set by the model alignment team. Tools used included Appen, Scale AI, and proprietary platforms developed by DeepMind.
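For illustration, the sketch below shows how a single ranked annotation of the kind described above might be expanded into pairwise preference examples, the form in which RLHF reward-model training typically consumes human rankings. The record structure, field names, and example content are hypothetical and do not reflect the schema of any specific platform used on the project.

```python
from itertools import combinations

# Hypothetical annotation record: one prompt with several model responses,
# each assigned a rank by the human evaluator (1 = best).
annotation = {
    "prompt": "Explain photosynthesis to a ten-year-old.",
    "responses": [
        {"text": "Plants use sunlight to turn water and air into food.", "rank": 1},
        {"text": "Photosynthesis is a biochemical pathway involving ATP...", "rank": 2},
        {"text": "I don't know.", "rank": 3},
    ],
}

def ranking_to_preference_pairs(record):
    """Expand a ranked annotation into (chosen, rejected) pairs.

    Every pair of responses with different ranks yields one preference
    example, the format commonly used to train an RLHF reward model.
    """
    pairs = []
    for a, b in combinations(record["responses"], 2):
        if a["rank"] == b["rank"]:
            continue  # ties carry no preference signal
        chosen, rejected = (a, b) if a["rank"] < b["rank"] else (b, a)
        pairs.append({
            "prompt": record["prompt"],
            "chosen": chosen["text"],
            "rejected": rejected["text"],
        })
    return pairs

if __name__ == "__main__":
    for pair in ranking_to_preference_pairs(annotation):
        print(pair["chosen"][:40], ">", pair["rejected"][:40])
```

A ranking over n responses yields up to n(n-1)/2 such pairs, which is why list-wise ranking is an efficient way to collect preference data compared with annotating each pair separately.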