LLM-Based YouTube Comment Labeling and Evaluation
Built and operated an AI-powered system to collect and label YouTube user comments for LLM training and evaluation. Raw comments were filtered, cleaned, and labeled for sentiment, intent, and topic using prompt-based workflows. Performed human-in-the-loop review to verify label accuracy, flag hallucinations, bias, and low-quality outputs, and correct model errors, producing high-quality training data. The project covered thousands of real user comments and required consistency checks, ambiguity resolution, and quality scoring of AI-generated annotations. The resulting dataset was used to evaluate and improve prompt pipelines and to better align model outputs with real human language.
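
The cleaning and consistency-check steps described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the project's actual code: the function names, the URL-stripping rule, the three-word minimum, and the choice of raw agreement and Cohen's kappa as quality scores are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical cleaning rule: drop URLs, then collapse whitespace.
URL_RE = re.compile(r"https?://\S+")


def clean_comment(text: str) -> str:
    """Remove URLs and normalize runs of whitespace to single spaces."""
    return " ".join(URL_RE.sub("", text).split())


def keep_comment(text: str, min_words: int = 3) -> bool:
    """Filter out comments too short to label (threshold is an assumption)."""
    return len(text.split()) >= min_words


def agreement_rate(model_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of items where the model label matches the reviewed label."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not model_labels:
        return 0.0
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)


def cohen_kappa(model_labels: list[str], human_labels: list[str]) -> float:
    """Agreement corrected for chance, computed from label frequencies."""
    n = len(model_labels)
    if n == 0:
        return 0.0
    observed = agreement_rate(model_labels, human_labels)
    model_counts = Counter(model_labels)
    human_counts = Counter(human_labels)
    # Expected agreement if both labelers chose independently at their own rates.
    expected = sum(
        model_counts[k] * human_counts[k]
        for k in set(model_counts) | set(human_counts)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both sides used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    raw = ["Great video!   https://example.com  thanks", "lol"]
    cleaned = [clean_comment(c) for c in raw]
    print([c for c in cleaned if keep_comment(c)])

    model = ["pos", "neg", "pos", "neu"]
    human = ["pos", "neg", "neu", "neu"]
    print(agreement_rate(model, human), cohen_kappa(model, human))
```

In a sketch like this, a low kappa on a batch would be one possible signal to route that batch back for ambiguity resolution or prompt revision.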