Spanish–English Multilingual Text Annotation for LLM Training
Worked on a bilingual dataset used to fine-tune a multilingual language model. Labeled Spanish and English text snippets for named entities, tone, intent, and topic. Later transitioned to LLM evaluation tasks, including rubric scoring, preference ranking, and harmful-content detection, helping calibrate early instruction-following behaviors. Flagged inconsistencies in dialectal Spanish examples (MX vs. ES), which informed dataset cleanup.