Multilingual Image Text Annotation for AI Translation Systems
This project built a high-quality labelled dataset to support machine learning models for multilingual image translation. The objective was to identify, annotate, and classify the text in each image, together with its font, colour, language, and layout structure. The images contained multiple languages, varied font styles, and diverse background conditions. Each image was manually annotated with text boundaries, character types, font categories, colour attributes, and language labels, with particular attention to consistency, accuracy, and adherence to the annotation guidelines. Quality assurance processes reduced noise and bias by double-checking annotations, resolving ambiguous cases, and keeping the annotations under clear version control. As a result, the dataset was suitable for training and evaluating OCR systems, font recognition models, and multilingual translation pipelines.
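As an illustration only, the annotation fields and the double-checking step described above could be sketched in Python roughly as follows. The record layout, the field names (`TextRegion`, `bbox`, `font_category`, and so on), the label values, and the 0.5 overlap threshold are all assumptions made for this example; they are not the project's actual schema or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRegion:
    """One annotated piece of text in an image (hypothetical schema)."""
    image_id: str
    bbox: tuple          # (x_min, y_min, x_max, y_max) in pixels
    text: str            # transcription of the region
    language: str        # e.g. an ISO 639-1 code such as "fr"
    font_category: str   # e.g. "serif", "sans-serif", "handwritten"
    colour: str          # e.g. a hex string such as "#000000"


def validate(region, image_width, image_height):
    """Return a list of guideline problems found in a single region."""
    problems = []
    x0, y0, x1, y1 = region.bbox
    if not (0 <= x0 < x1 <= image_width and 0 <= y0 < y1 <= image_height):
        problems.append("bbox outside image or degenerate")
    if not region.text.strip():
        problems.append("empty transcription")
    if not region.language:
        problems.append("missing language label")
    return problems


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def disagreements(first, second, min_iou=0.5):
    """List the fields on which two annotators' versions of a region differ."""
    issues = []
    if iou(first.bbox, second.bbox) < min_iou:
        issues.append("bbox")
    for name in ("text", "language", "font_category", "colour"):
        if getattr(first, name) != getattr(second, name):
            issues.append(name)
    return issues


if __name__ == "__main__":
    # Two annotators label the same region; only the font category differs.
    a = TextRegion("img_001", (10, 10, 110, 40), "Bonjour", "fr", "sans-serif", "#000000")
    b = TextRegion("img_001", (12, 11, 108, 41), "Bonjour", "fr", "serif", "#000000")
    print(validate(a, 640, 480))
    print(disagreements(a, b))
```

In a sketch like this, regions with a non-empty result from `disagreements` would be routed to a reviewer for resolution, which is one simple way to make a double-checking pass repeatable.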