Multilingual Chatbot Evaluation for LLM Alignment (Spanish & Swahili)
- Evaluated and rated multilingual chatbot conversations in Spanish and Swahili to improve the performance and cultural sensitivity of a large language model (LLM).
- Scored AI-generated responses on clarity, relevance, helpfulness, and tone; corrected and translated prompts for accuracy and cultural nuance.
- Performed named entity recognition, emotion tagging, and refinement of prompt/response pairs for supervised fine-tuning (SFT).
- Maintained consistent quality benchmarks (>95% QA pass rate) and used internal dashboards to log insights.
- Contributed over 1,200 high-quality annotations across a 6-week period.