Text Categorization for Multilingual NLP Models
This project involved labeling a multilingual dataset used to develop a natural language processing (NLP) model. I annotated English and Spanish text for intent classification, sentiment analysis, and named entity recognition (NER), categorizing text and marking key phrases, names, locations, and other entities the model needed in order to understand and generate responses in both languages. I also helped ensure that translations and localizations were accurate, supporting chatbot functionality in English and Spanish. The dataset consisted of thousands of sentences covering diverse conversational patterns, intended to improve the model’s accuracy. Quality measures included double annotation, verification, and consistency checks.
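To illustrate what a double-annotation check can look like, here is a minimal sketch. It assumes agreement between two annotators is measured with Cohen's kappa and that disagreements are sent to a verification pass; the project description does not name the metric or tooling actually used, and the sentences, intent labels, and function names below are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(items, labels_a, labels_b):
    """Return (item, label_a, label_b) for items the annotators labeled differently."""
    return [
        (item, a, b)
        for item, a, b in zip(items, labels_a, labels_b)
        if a != b
    ]


# Hypothetical example data: intent labels for English and Spanish sentences.
sentences = [
    "I want to cancel my order",
    "Quiero cancelar mi pedido",
    "Where is my package?",
    "¿Dónde está mi paquete?",
]
annotator_1 = ["cancel_order", "cancel_order", "track_order", "track_order"]
annotator_2 = ["cancel_order", "cancel_order", "track_order", "cancel_order"]

kappa = cohen_kappa(annotator_1, annotator_2)
to_review = disagreements(sentences, annotator_1, annotator_2)

print(f"kappa = {kappa:.2f}")
for sentence, label_1, label_2 in to_review:
    print(f"review: {sentence!r} ({label_1} vs {label_2})")
```

In this toy data the annotators agree on three of four sentences, giving a kappa of 0.5, and the one mismatched Spanish sentence is flagged for review. In practice, a low kappa on a batch usually points to ambiguous labeling guidelines rather than careless annotation, so the flagged items are useful input for refining the guidelines as well as for correcting labels.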