NLP Data Preprocessing/Data Labeling Pipeline (Internship Project at NewtonAi)
Built an automated NLP data-cleaning pipeline for over 3,000 customer records to improve the quality of data used for machine learning. Applied techniques such as regex-based normalization, stopword filtering, and emoji and HTML removal, and wrote scripts to standardize diverse, unstructured text for downstream analytics and labeling.
• Performed large-scale data preprocessing and text normalization.
• Cleaned and labeled customer input text for machine learning.
• Automated repetitive annotation and cleaning tasks with scripts.
• Focused the work on customer insights and business operations.
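For illustration, here is a minimal Python sketch of the kinds of cleaning steps named above: HTML removal, emoji removal, regex normalization, and stopword filtering. It is not the project's actual code. The function name `clean_text`, the regex patterns, and the deliberately tiny `STOPWORDS` set are hypothetical assumptions chosen for the example.

```python
import html
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real pipeline would likely use a fuller, curated list.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "it", "this"}

HTML_TAG_RE = re.compile(r"<[^>]+>")
# Covers common emoji blocks; this is not an exhaustive emoji definition.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport symbols, etc.
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16
    "]+"
)
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Normalize one customer record (hypothetical sketch)."""
    text = HTML_TAG_RE.sub(" ", text)    # strip HTML tags
    text = html.unescape(text)           # decode entities such as &amp;
    text = EMOJI_RE.sub(" ", text)       # remove emoji
    text = text.lower()                  # lowercase so stopwords match
    text = NON_ALNUM_RE.sub(" ", text)   # regex normalization: drop punctuation
    text = WHITESPACE_RE.sub(" ", text).strip()  # collapse runs of whitespace
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


records = [
    "<p>The app is GREAT!! 😀</p>",
    "Refund &amp; support took 3 days...",
]
for record in records:
    print(clean_text(record))
# app great
# refund support took 3 days
```

The order of steps matters in this sketch: lowercasing happens before stopword filtering so that "The" matches "the", and the stopword list intentionally omits negations such as "not", since dropping them could invert the meaning of customer feedback.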