AI-generated Text Detection for Indian Languages - Data Labeling & AI Training
I contributed to the development of an AI-authorship detection model for Indian language text by fine-tuning multilingual transformer models. My responsibilities included extracting stylometric features and preparing text datasets for model training and contextual evaluation. I designed and applied an ensemble classifier approach to improve overall detection accuracy.
• Fine-tuned IndicBERT and MuRIL models using prepared Indian language text data.
• Labeled stylometric and probability-based features for supervised model learning.
• Evaluated and validated contextual understanding of the AI models.
• Utilized Python and HuggingFace Transformers for model training and dataset preparation.
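
The stylometric features and ensemble step described above can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names `stylometric_features` and `soft_vote`, the three features chosen, and the use of weighted probability averaging (soft voting) are all assumptions made for illustration. The per-model scores are assumed to be probabilities that a text is AI-generated, such as those a fine-tuned IndicBERT or MuRIL classifier might output.

```python
import re
from statistics import mean
from typing import Dict, List, Optional


def stylometric_features(text: str) -> Dict[str, float]:
    """Compute a few simple stylometric features (hypothetical feature set)."""
    # Split on Latin terminators and the danda used in many Indic scripts.
    sentences = [s for s in re.split(r"[.!?।]+", text) if s.strip()]
    # Whitespace tokenization; punctuation stays attached to tokens.
    tokens = text.split()
    if not tokens:
        return {"avg_sentence_len": 0.0, "avg_token_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Average number of tokens per sentence.
        "avg_sentence_len": len(tokens) / max(len(sentences), 1),
        # Average number of characters per token.
        "avg_token_len": mean(len(t) for t in tokens),
        # Share of distinct tokens, a rough measure of lexical variety.
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }


def soft_vote(probabilities: List[float], weights: Optional[List[float]] = None) -> float:
    """Weighted average of per-model P(AI-generated) scores (assumed ensemble rule)."""
    if not probabilities:
        raise ValueError("at least one probability is required")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("weights and probabilities must have the same length")
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


if __name__ == "__main__":
    print(stylometric_features("यह एक वाक्य है। यह दूसरा है।"))
    # Combine hypothetical scores from two fine-tuned models.
    print(soft_vote([0.9, 0.7]))
```

In a setup like this, the feature dictionary could feed a separate classical classifier whose score joins the vote alongside the transformer outputs, with weights tuned on a validation set.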