LLM Evaluation and Text Generation for Multilingual Chatbots
I worked on a large-scale LLM evaluation and text generation project for a leading chatbot development company. The work had three parts: evaluating how well the company's multilingual LLMs performed across a range of customer service scenarios, generating high-quality responses in English and Chinese, and annotating named entities for NER classification. The project covered more than 10,000 text samples, with an emphasis on accuracy and consistency. I followed strict quality measures throughout, including inter-annotator agreement (IAA) checks and regular feedback loops with the project team.