Arafat Sani

AI Document Intelligence Pipeline Developer (Extern – Outamation)

New York, USA
$25.00/hr
Intermediate
Other

Key Skills

Software

Other

Top Subject Matter

Mortgage Document Automation

Top Data Types

Document
Image
Text

Top Task Types

Classification
Segmentation
Object Detection
Question Answering
Text Summarization
Fine Tuning
Data Collection
Computer Programming/Coding
Transcription
Function Calling

Freelancer Overview

AI Document Intelligence Pipeline Developer (Extern – Outamation) with 1+ years of professional experience. Education includes a Master of Science in Data Science (2025) and a Bachelor of Science in Computer Science (2022), both from New York Institute of Technology. AI-training focus includes Document data and labeling workflows such as Classification.

Intermediate
French
English

Labeling Experience

AI Document Intelligence Pipeline Developer (Extern – Outamation)

Other
Document
Classification
I built an end-to-end AI document intelligence pipeline to process and classify unstructured mortgage PDFs. The project involved extracting and labeling document types, applying targeted extraction, and benchmarking accuracy. The work included layout-aware document classification and semantic extraction with both rule-based and LLM-driven approaches.
• Used OCR and PDF parsing tools like Tesseract, PaddleOCR, EasyOCR, PyMuPDF, and pdfplumber for data extraction.
• Designed a system that categorized documents and routed them for specialized labeling workflows.
• Fine-tuned retrieval and chunking strategies for improved question-answering from labeled document data.
• Evaluated open-source LLMs for extraction quality and built a Gradio-based demo interface.

2025 - 2025

Education

New York Institute of Technology

Master of Science, Data Science

2024 - 2025

New York Institute of Technology

Bachelor of Science, Computer Science

2018 - 2022

Work History

Beats By Dre

Data Analytics Extern

New York
2025 - 2025

Outamation

AI Document Intelligence Extern

New York
2025 - 2025