Prompt Evaluation & Response Ranking for Large Language Models
Worked as part of a global team to evaluate and improve responses generated by large language models. Tasks involved comparing multiple AI-generated answers, ranking them by helpfulness, coherence, and accuracy, and writing feedback to guide model tuning. Also created and labeled user prompts to improve the models' real-world instruction-following capability.