PhD Research Scholar – Dataset Curation and Annotation
As a PhD Research Scholar at IIT Roorkee, I built and curated datasets for AI model training using the Stack Exchange API and web scraping. These datasets were used for classification and complexity measurement tasks with transformer language models such as BERT, Longformer, GPT-2, and Llama. I handled dataset collection and annotation end to end to support high-quality machine learning research.
• Designed datasets from raw scraped data to meet specific research objectives.
• Annotated and validated text datasets for fine-tuning and evaluation.
• Implemented and tested complexity measures on labeled data.
• Managed large-scale experimental workflows on the PARAM Ganga supercomputer.