Evaluation of LLM Outputs in English & Saudi Arabic
I worked on projects focused on evaluating and improving chatbot responses, with a strong emphasis on localization and safety. I reviewed AI-generated outputs for factual accuracy, helpfulness, and instruction following, and checked that they were colloquially and culturally appropriate for Saudi Arabic speakers, including alignment with local dialects. I also contributed to safety-critical tasks by evaluating how the model handled potentially harmful or unsafe requests, helping improve its ability to provide safe and trustworthy interactions in Saudi Arabic.