AI Document Intelligence Pipeline Developer (Extern – Outamation)
I built an end-to-end AI document intelligence pipeline to process and classify unstructured mortgage PDFs. The project covered identifying and labeling document types, applying targeted extraction to each type, and benchmarking extraction accuracy. It combined layout-aware document classification with semantic extraction, using both rule-based and LLM-driven approaches.
• Used OCR and PDF parsing tools, including Tesseract, PaddleOCR, EasyOCR, PyMuPDF, and pdfplumber, for data extraction.
• Designed a system that categorized documents and routed them to specialized labeling workflows.
• Fine-tuned retrieval and chunking strategies to improve question answering over labeled document data.
• Evaluated open-source LLMs for extraction quality and built a Gradio-based demo interface.