Document Labeling and Data Extraction for PDF Structuring
I engineered a Python solution that transforms unstructured PDFs into structured Excel outputs with complete data capture. I designed a regex and contextual-analysis engine for extracting key-value pairs, and I managed the end-to-end pipeline: document processing, key data extraction, and QA verification.
• Labeled and classified key data fields in unstructured PDF documents.
• Developed and refined the extraction logic using contextual understanding.
• Built and validated training data for a document-structuring AI.
• Used a Flask web app for efficient annotation and data quality control.
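To illustrate the kind of regex-plus-context extraction described above, here is a minimal sketch. It is not the project's actual code. It assumes the PDF text has already been extracted to a plain string, and the field labels, the `extract_fields` function, and the "value on the next line" fallback rule are all hypothetical choices made for this example.

```python
import re

# Hypothetical field labels; the project's real label set is not stated in the text.
FIELD_LABELS = ["Invoice Number", "Date", "Total"]

# One pattern per label: the label must start the line and is followed either by
# a ':' or '-' separator with an optional value, or by nothing else on the line.
LABEL_PATTERNS = {
    label: re.compile(
        rf"^\s*{re.escape(label)}\s*(?:[:\-]\s*(?P<value>.*?))?\s*$",
        re.IGNORECASE,
    )
    for label in FIELD_LABELS
}


def extract_fields(text):
    """Return a dict mapping each label to its extracted value (or None)."""
    lines = text.splitlines()
    record = {label: None for label in FIELD_LABELS}
    for index, line in enumerate(lines):
        for label, pattern in LABEL_PATTERNS.items():
            if record[label] is not None:
                continue  # keep the first occurrence of each field
            match = pattern.match(line)
            if not match:
                continue
            value = match.group("value")
            if not value:
                # Contextual fallback (an assumption for this sketch): when the
                # label stands alone, take the next non-empty line as the value.
                value = next(
                    (later.strip() for later in lines[index + 1:] if later.strip()),
                    None,
                )
            record[label] = value.strip() if value else None
    return record
```

Each returned dict corresponds to one row of the structured output. Writing the rows to Excel would be a separate step with a spreadsheet library, which is left out here. The fallback rule is deliberately naive: it would pick up another label if one happens to follow directly, which is the kind of case a QA pass would need to catch.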