AI Research Intern—NLP Data Preparation and Summarization
During my AI Research Internship at Suvidha Foundation, I used Python to compile and process over 1,100 news articles in support of NLP summarization research. I handled the annotation, cleaning, and preparation of text data for training AI models on the CNN/DailyMail dataset. My work included developing preprocessing pipelines with Pandas and NumPy and evaluating model outputs for factual accuracy and logical consistency.
• Annotated and structured large volumes of text data for NLP applications.
• Validated LLM outputs, identifying hallucinations and logical errors in text summaries.
• Cleaned and transformed data using Python libraries (Pandas).
• Documented data processes to ensure reproducibility for research reporting.