Data Scientist – Medical report text classification (AI training and evaluation)
Implemented an AI-based text classification system for medical reports in an automation setting.
• Performed data preprocessing: text cleaning, normalization, stop-word removal, and tokenization.
• Tuned classical machine learning pipelines and fine-tuned a domain-specific BERT model to improve classification performance.
• Evaluated multiple classification strategies for service-specific keyword detection.
• Compared approaches on accuracy, F1-score, and confusion matrix results, and recommended the best-performing one.
• Optimized the selected approach for computational efficiency and precision.
• Used Python, Scikit-Learn, NLTK, and Hugging Face Transformers within the healthcare domain.
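
Illustrative sketch of the comparison step described above: scoring two candidate classifiers on accuracy and macro F1 from a confusion count. The labels, predictions, and model names (`tfidf_logreg`, `domain_bert`) are made-up placeholders, not project data or results, and the use of macro averaging is an assumption. The metrics are written in plain Python for self-containment; Scikit-Learn's `accuracy_score`, `f1_score`, and `confusion_matrix` cover the same ground.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs; a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = confusion_counts(y_true, y_pred)
    scores = []
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(counts[(other, label)] for other in labels if other != label)
        fn = sum(counts[(label, other)] for other in labels if other != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical ground truth and predictions (placeholders, not real results).
y_true = ["cardiology", "radiology", "cardiology",
          "oncology", "radiology", "oncology"]
predictions = {
    "tfidf_logreg": ["cardiology", "radiology", "radiology",
                     "oncology", "radiology", "cardiology"],
    "domain_bert": ["cardiology", "radiology", "cardiology",
                    "oncology", "radiology", "cardiology"],
}

# One row of metrics per candidate approach.
report = {
    name: {
        "accuracy": accuracy(y_true, y_pred),
        "macro_f1": macro_f1(y_true, y_pred),
    }
    for name, y_pred in predictions.items()
}

# Pick the approach with the highest macro F1 on this toy data.
best = max(report, key=lambda name: report[name]["macro_f1"])

for name, metrics in report.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}, "
          f"macro_f1={metrics['macro_f1']:.3f}")
print("selected:", best)
```

Ranking on macro F1 rather than accuracy alone is one reasonable choice when report categories are imbalanced; the per-class confusion counts show which categories a model mixes up.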