Researcher – Multi-modal Video Summarization Framework (ICITSM 2025)
Developed and validated a multi-modal AI video summarization framework combining frame-level visual analysis, OpenAI Whisper transcription, semantic embeddings, and transformer-based learning. Labeled and evaluated video data for summarization on the SumMe and TVSum benchmark datasets, using F1 score as the primary performance metric. Designed and tuned annotation pipelines for embedding-based context retrieval and model evaluation.
• Combined video, audio, and text modalities for robust summarization.
• Leveraged OpenAI Whisper and LangChain for transcription and context capture.
• Conducted systematic annotation and benchmarking across multiple datasets.
• Published results and methodology in peer-reviewed venues.