Document and Email Entity Extraction and Labeling Engineer
Built an end-to-end automated data extraction engine for document and email datasets, using Azure Form Recognizer, AWS Textract, and custom NLP models for entity recognition and field extraction. Labeled and validated structured and semi-structured documents to create ground truth for downstream AI models. Processed and managed more than 2 million document samples per month to support continuous model improvement and tracking of production inference accuracy.
• Engineered scalable data annotation and extraction workflows.
• Created labeled datasets for invoice, contract, and email fields.
• Used Azure Form Recognizer and AWS Textract for semi-automatic annotation.
• Performed manual entity correction and QA for model validation.