LLM Response Evaluation for Bilingual AI Assistant
Contributed to the development of an English-Korean bilingual language model by evaluating AI-generated responses for helpfulness, relevance, tone, and accuracy. Tasks included ranking model outputs, rewriting user prompts for clarity, and identifying emotional tone in text. Evaluated over 15,000 prompt-response pairs while maintaining a QA score above 98%. Ensured adherence to detailed task guidelines and adapted labeling strategies to accommodate model updates. This work contributed directly to fine-tuning conversational AI models.