Multi-Modal Large Language Model
The scope of the project focused on enhancing the model's visual reasoning capabilities. I performed specific labeling tasks such as:

- Image Captioning: writing detailed descriptions of complex scenes.
- Visual Question Answering (VQA): creating question-answer pairs based on images.
- Entity Tagging: identifying and labeling specific objects or text within images.

To ensure quality, we maintained a rigorous feedback loop, targeting an inter-annotator agreement rate above 95% to keep the training data clean and reliable.
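An agreement rate like the one above is commonly computed as the fraction of items on which two annotators assign the same label. A minimal sketch of that calculation, using hypothetical entity-tagging labels (the `agreement_rate` helper and the sample label lists are illustrative, not the project's actual tooling):

```python
# Minimal sketch of a pairwise inter-annotator agreement rate.
# The label lists below are hypothetical examples, not real project data.

def agreement_rate(annotator_a, annotator_b):
    """Fraction of items where two annotators assigned the same label."""
    if len(annotator_a) != len(annotator_b):
        raise ValueError("Annotators must label the same set of items")
    matches = sum(a == b for a, b in zip(annotator_a, annotator_b))
    return matches / len(annotator_a)

# Hypothetical entity-tagging labels for the same 20 images
a = ["car", "dog", "sign"] * 6 + ["car", "car"]
b = ["car", "dog", "sign"] * 6 + ["dog", "car"]

print(f"Agreement: {agreement_rate(a, b):.1%}")  # 19/20 items match -> 95.0%
```

In practice, raw percent agreement is often supplemented with a chance-corrected statistic such as Cohen's kappa, since two annotators can agree by luck alone on tasks with few label options.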