RMSI
We provided linguistically diverse, contextually accurate descriptive captions for a large-scale visual dataset. Unlike basic attribute tagging, our team generated structured sentences that describe object relationships, spatial context, and overall scene activity to improve visual understanding in Multi-Modal Large Language Models (LLMs)