Media Monitoring
LLM Evaluation Project: Accuracy Rating for Multimedia Summaries

This project focused on human-in-the-loop data labeling to rigorously evaluate the accuracy and quality of summaries generated by a Large Language Model (LLM) from short audio and video clips. The goal was to produce a gold-standard, human-rated dataset for evaluating and fine-tuning the LLM and for benchmarking its performance against competing models.
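For illustration only, below is a minimal Python sketch of how per-summary human ratings might be aggregated into gold-standard labels. The 1-5 accuracy scale, the three-rater minimum, and every name in the code (`Rating`, `gold_labels`, `clip_id`, `summary_id`) are assumptions for the sketch, not details taken from the project.

```python
"""Minimal sketch: aggregate human accuracy ratings into a
gold-standard score per summary. The schema and thresholds here
are illustrative assumptions, not the project's actual design."""

from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rating:
    clip_id: str       # source audio/video clip (hypothetical field)
    summary_id: str    # LLM-generated summary being judged
    annotator_id: str  # human rater
    accuracy: int      # assumed 1 (inaccurate) .. 5 (fully accurate)


def gold_labels(ratings: list[Rating], min_raters: int = 3) -> dict[str, float]:
    """Average each summary's ratings, keeping only summaries with
    enough independent raters to count as gold-standard."""
    by_summary: dict[str, list[int]] = defaultdict(list)
    for r in ratings:
        by_summary[r.summary_id].append(r.accuracy)
    return {
        sid: mean(scores)
        for sid, scores in by_summary.items()
        if len(scores) >= min_raters
    }


if __name__ == "__main__":
    sample = [
        Rating("clip-01", "sum-A", "rater-1", 4),
        Rating("clip-01", "sum-A", "rater-2", 5),
        Rating("clip-01", "sum-A", "rater-3", 4),
        Rating("clip-01", "sum-B", "rater-1", 2),  # too few raters; dropped
    ]
    print(gold_labels(sample))  # {'sum-A': 4.333...}
```

Averaged scores like these can then serve both as evaluation targets for benchmarking competing models and as regression labels for fine-tuning; requiring multiple raters per summary is one common way to keep noisy individual judgments out of the gold set.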