Nostalgic Tweet Data Annotation and Curation, Graduate Researcher—UW Bothell CBM Lab
Curated and labeled a dataset of approximately 250K nostalgic English tweets for research on nostalgic behavior on social media. Built and fine-tuned transformer-based models, including RoBERTa, DistilBERT, and ensemble models, for automatic nostalgic tweet detection. Verified label accuracy with manual and automated methods to support NLP model training and evaluation.
• Performed manual annotation and automated data validation using LLMs.
• Targeted emotion and nostalgia recognition as labels for text sequence classification.
• Used Jupyter, Transformers, and Python libraries for the annotation and curation workflow.
• Collaborated with research advisors and the team to define the labeling schema and ensure dataset quality.