Automated Data Labeling of Financial Securities Web Pages
Designed and deployed an automated data labeling system that classified and annotated over 200,000 investment fund pages. The system combined GPT‑4 API calls, fine‑tuned BERT models, and regex parsing to fill critical gaps in third‑party data. It added up to 10% more data features for subscription‑tier users and improved the accuracy and timeliness of financial data feeds. I also led a cross‑functional remote team that optimized the data pipelines for scale and prototyped rapidly under tight deadlines.
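To illustrate the general shape of combining regex parsing with a model-based fallback, here is a minimal sketch. It is not the production code: the field names, the two patterns, and the `fallback` hook (a stand-in for a BERT or GPT‑4 call) are all hypothetical and chosen only for the example.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical patterns for illustration only; the real system's rules are not shown here.
ISIN_RE = re.compile(r"\b([A-Z]{2}[A-Z0-9]{9}[0-9])\b")
FEE_RE = re.compile(r"ongoing charges?\D{0,20}(\d+(?:\.\d+)?)\s*%", re.IGNORECASE)

# A fallback takes (field_name, page_text) and returns a label or None.
Fallback = Callable[[str, str], Optional[str]]


def label_page(text: str, fallback: Optional[Fallback] = None) -> Dict[str, Optional[str]]:
    """Label a fund page: try cheap regex rules first, then an optional model fallback."""
    labels: Dict[str, Optional[str]] = {"isin": None, "ongoing_charge_pct": None}

    isin_match = ISIN_RE.search(text)
    if isin_match:
        labels["isin"] = isin_match.group(1)

    fee_match = FEE_RE.search(text)
    if fee_match:
        labels["ongoing_charge_pct"] = fee_match.group(1)

    # Only fields the regex rules could not fill are sent to the (assumed) model.
    if fallback is not None:
        for field in list(labels):
            if labels[field] is None:
                labels[field] = fallback(field, text)

    return labels
```

Running the regex rules first keeps model calls to the pages and fields that actually need them, which matters at a volume of hundreds of thousands of pages.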