Bounding Box
Annotation of an image proceeds at three levels. First, a bounding box is drawn around each word, the word is transcribed, and it is marked as overlaid or scene text. Second, word annotations are grouped into lines based on physical position, orientation, and semantic consistency, and each line is marked as overlaid or scene text. Third, lines are grouped into semantically meaningful paragraphs, and each paragraph is marked as overlaid or scene text.
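The word, line, and paragraph levels described above could be represented as a small nested data structure. The following is only a sketch under stated assumptions: the class names (Word, Line, Paragraph, TextKind), the axis-aligned (x_min, y_min, x_max, y_max) box format, and the idea of deriving a line or paragraph box as the enclosing box of its children are all hypothetical and are not taken from the annotation guideline itself. Since lines may have an arbitrary orientation, a real annotation format may well store polygons or rotated boxes instead.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

# Assumption: axis-aligned boxes as (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


class TextKind(Enum):
    """Whether text is overlaid on the image or part of the scene."""
    OVERLAID = "overlaid"
    SCENE = "scene"


@dataclass
class Word:
    box: Box             # bounding box drawn around the word
    transcription: str   # the transcribed text of the word
    kind: TextKind


@dataclass
class Line:
    words: List[Word]    # words grouped by position, orientation, semantics
    kind: TextKind


@dataclass
class Paragraph:
    lines: List[Line]    # lines grouped into one semantic paragraph
    kind: TextKind


def enclosing_box(boxes: List[Box]) -> Box:
    """Return the smallest axis-aligned box that contains all given boxes."""
    if not boxes:
        raise ValueError("need at least one box")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def line_box(line: Line) -> Box:
    """Hypothetical helper: box of a line as the enclosure of its words."""
    return enclosing_box([w.box for w in line.words])


def paragraph_box(paragraph: Paragraph) -> Box:
    """Hypothetical helper: box of a paragraph as the enclosure of its lines."""
    return enclosing_box([line_box(ln) for ln in paragraph.lines])


if __name__ == "__main__":
    words = [
        Word((10, 20, 60, 40), "EXIT", TextKind.SCENE),
        Word((70, 22, 130, 41), "ONLY", TextKind.SCENE),
    ]
    line = Line(words, TextKind.SCENE)
    paragraph = Paragraph([line], TextKind.SCENE)
    print(line_box(line))            # (10, 20, 130, 41)
    print(paragraph_box(paragraph))  # (10, 20, 130, 41)
```

Keeping the overlaid/scene flag on every level, rather than inheriting it from the words, mirrors the description above, where the flag is indicated separately for words, lines, and paragraphs.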