Optical Character Recognition System – Document Segmentation/Annotation
I developed an end-to-end optical character recognition (OCR) pipeline for extracting structured data from documents. Building it required labeling document data, including table structure detection and segmentation annotation. This work improved model performance and information extraction accuracy on clinical and unstructured documents.
• Document annotation for OCR segmentation
• Table structure labeling using CascadeTabNet and PaddleOCR
• Data augmentation for improved model generalization
• Focus on extracting data from clinical documents
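
The description above does not spell out how annotations and augmentation fit together, so the following is only a minimal sketch and not the project's actual code. It assumes segmentation labels are stored as axis-aligned boxes in pixel coordinates and shows one label-preserving geometric augmentation (a uniform scale followed by a shift on a fixed-size page). The `Region` class, the label names, and the `augment_regions` and `random_params` helpers are all hypothetical, and the same transform would also have to be applied to the page image itself.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Region:
    """One annotated region on a page (hypothetical schema)."""
    label: str   # e.g. "table", "cell", "text_line" (assumed label set)
    x0: float
    y0: float
    x1: float
    y1: float


def _clip(value, low, high):
    """Clamp value into the closed interval [low, high]."""
    return min(max(value, low), high)


def augment_regions(regions, page_w, page_h, scale, dx, dy):
    """Scale every box by `scale`, shift it by (dx, dy), and clip to the page.

    Boxes that collapse to zero width or height after clipping are dropped,
    because they no longer describe a visible region.
    """
    out = []
    for r in regions:
        x0 = _clip(r.x0 * scale + dx, 0.0, page_w)
        y0 = _clip(r.y0 * scale + dy, 0.0, page_h)
        x1 = _clip(r.x1 * scale + dx, 0.0, page_w)
        y1 = _clip(r.y1 * scale + dy, 0.0, page_h)
        if x1 > x0 and y1 > y0:
            out.append(Region(r.label, x0, y0, x1, y1))
    return out


def random_params(rng, max_shift=10.0):
    """Draw a mild random scale and shift (ranges are illustrative assumptions)."""
    scale = rng.uniform(0.9, 1.1)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    return scale, dx, dy


if __name__ == "__main__":
    rng = random.Random(0)
    page = [Region("table", 100, 200, 300, 400), Region("cell", 110, 210, 200, 250)]
    scale, dx, dy = random_params(rng)
    for region in augment_regions(page, 1000, 1000, scale, dx, dy):
        print(region)
```

Keeping the box transform in one function makes it easier to verify that labels stay aligned with the page content; in practice an augmentation library with bounding-box support could replace this hand-written helper.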