Senior Full-Stack Engineer - Real-time News Content Labeling
Built and managed a real-time classification API that ingests and labels news content from RSS feeds. Developed scalable text preprocessing pipelines (cleaning, tokenization, stopword removal, and lemmatization) to prepare data for model training. Achieved high accuracy on benchmark datasets through meticulous data preparation and labeling.
• Automated the labeling pipeline with Python and Flask to handle high-volume text data efficiently.
• Used spaCy and NLTK for text preprocessing and annotation.
• Annotated data manually and semi-automatically for content scoring and recommendation.
• Maintained quality control and reproducible labeling standards across production environments.