Bulba Languages
The Bulba Languages project focused on evaluating and improving the performance of multilingual large language models (LLMs) by assessing AI-generated responses across a range of languages and cultural contexts. My role involved rating model outputs for quality, factual accuracy, tone, and cultural appropriateness, following strict annotation guidelines and quality-control procedures so that the resulting training data aligned with ethical standards and real-world linguistic nuance. The project was large-scale and collaborative, with contributors across different regions working to fine-tune the model's behavior. I was responsible for identifying and flagging inappropriate or biased responses, helping ensure the AI system responded accurately and respectfully across diverse scenarios. Maintaining high consistency and quality was critical, as the annotations directly influenced model performance in sensitive use cases such as customer support, education, and public-facing applications.