Multilingual Sentiment Analysis – XLM-RoBERTa + Darija Project
Fine-tuned XLM-RoBERTa for sentiment analysis across English, French, Spanish, Arabic, and Moroccan Darija on more than 40,000 annotated samples. The data work covered gathering, cleaning, annotating, and balancing labeled text for supervised training, with the dataset curated to support accurate sentiment classification in under-resourced languages.
• Annotated and reviewed text samples for consistent positive, neutral, and negative sentiment labels.
• Balanced class representation to reduce model bias across languages and dialects.
• Managed preprocessing, data cleaning, and validation of the final curated dataset.
• Published the sentiment model and documented the labeling schema for reproducibility.
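
The class-balancing step can be illustrated with a minimal sketch. The project description does not say which balancing method was used, so per-language downsampling to the smallest class is an assumption here, and the function name, the record fields (`text`, `lang`, `label`), and the toy data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_by_language_and_label(samples, seed=0):
    """Downsample so every label present in a language has the same count.

    Assumed record shape (hypothetical): {"text": str, "lang": str, "label": str}.
    A label with no samples at all in a language is simply absent from the output.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible

    # Group records by (language, label).
    groups = defaultdict(list)
    for sample in samples:
        groups[(sample["lang"], sample["label"])].append(sample)

    # For each language, find the size of its smallest class.
    smallest = {}
    for (lang, _label), items in groups.items():
        smallest[lang] = min(smallest.get(lang, len(items)), len(items))

    # Draw that many records from every class of the language.
    balanced = []
    for (lang, _label), items in sorted(groups.items()):
        balanced.extend(rng.sample(items, smallest[lang]))
    return balanced


def make(lang, label, n):
    """Build n toy records; illustrative data only, not from the real dataset."""
    return [{"text": f"{lang}-{label}-{i}", "lang": lang, "label": label} for i in range(n)]


# "ary" is the ISO 639-3 code for Moroccan Arabic (Darija).
samples = (
    make("en", "positive", 5) + make("en", "neutral", 3) + make("en", "negative", 2)
    + make("ary", "positive", 4) + make("ary", "neutral", 4) + make("ary", "negative", 1)
)

balanced = balance_by_language_and_label(samples)
counts = Counter((s["lang"], s["label"]) for s in balanced)
```

Downsampling discards data, which hurts most for the smallest languages; oversampling or class-weighted loss are common alternatives when a dialect has few examples to begin with.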