Medical Image Classification & Data Annotation for Malaria Detection
As part of my MSc in Data Science, I completed an end-to-end data-labelling and image-classification project focused on detecting malaria parasites in microscopic blood-smear images. The project involved reviewing, annotating, pre-processing, and validating a dataset of 27,558 medical images, split evenly between parasitised and uninfected cells. It demonstrates my ability to apply strict labelling criteria, follow detailed annotation guidelines, and deliver high-quality, reliable datasets for machine-learning workflows.

My Role & Responsibilities

- Reviewed and validated thousands of cell images, ensuring each was correctly categorised based on visible parasitic features.
- Pre-processed and normalised images (resizing, cleaning, and standardising dimensions) to ensure consistency across the dataset.
- Applied structured labelling rules to maintain accuracy across the two classes: parasitised vs uninfected.
- Identified unclear, low-quality, or ambiguous images and flagged them for exclusion or manual review.
- Evaluated model performance using precision, recall, F1-score, and accuracy to assess annotation quality.
- Documented all steps clearly to ensure reproducibility and transparency.

Technical Summary

I built and compared two deep-learning models:

Custom CNN
- Achieved 93% accuracy, later fine-tuned to 96%. As noted in my report: "The classification report for the first model obtained an accuracy of 93%, precision of 96%, recall of 90%, and an F1 score of 93%."

Transfer Learning (EfficientNetV2B0)
- Used as a feature extractor and evaluated for comparison; initial accuracy was 61%.

This comparison reinforced the importance of high-quality, consistent data: the custom CNN performed significantly better when trained on the carefully curated dataset.
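The resizing and normalisation step described above can be sketched as follows. This is a minimal illustration rather than the project's actual pipeline; the 64×64 target size and the nearest-neighbour resampling are assumptions for the example.

```python
import numpy as np

def preprocess(image: np.ndarray, size=(64, 64)) -> np.ndarray:
    """Resize a uint8 HxWxC image with nearest-neighbour sampling
    and scale pixel values to the [0, 1] range."""
    h, w = image.shape[:2]
    rows = (np.arange(size[0]) * h) // size[0]   # source row for each target row
    cols = (np.arange(size[1]) * w) // size[1]   # source column for each target column
    resized = image[rows[:, None], cols]         # fancy indexing does the resize
    return resized.astype(np.float32) / 255.0    # normalise to [0, 1]
```

Standardising every cell image to one fixed shape and value range like this is what lets images of different original dimensions be stacked into a single training batch.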
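The quality-control step of flagging unclear or low-quality images can be partly automated before manual review. A minimal sketch, assuming two illustrative criteria (minimum resolution and minimum pixel variance); the thresholds here are hypothetical, not the ones used in the project:

```python
import numpy as np

def flag_for_review(image: np.ndarray,
                    min_side: int = 32,
                    min_std: float = 5.0) -> list[str]:
    """Return the reasons an image should be routed to manual review;
    an empty list means it passes the automated checks."""
    reasons = []
    h, w = image.shape[:2]
    if min(h, w) < min_side:
        reasons.append("too_small")      # below minimum usable resolution
    if float(image.std()) < min_std:
        reasons.append("low_contrast")   # nearly uniform pixels, likely unusable
    return reasons
```

Checks like these only triage: anything flagged still goes to a human annotator, which keeps ambiguous cases out of the training set without silently discarding data.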
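The overall shape of a custom CNN for this task can be sketched in Keras. This is an illustrative architecture, not the exact one from my report; the layer sizes, dropout rate, and input shape are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(64, 64, 3)) -> tf.keras.Model:
    """A small binary classifier: stacked conv/pool blocks feeding a
    dense head with a sigmoid output (parasitised vs uninfected)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),                    # regularisation against overfitting
        layers.Dense(1, activation="sigmoid"),  # probability of "parasitised"
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

A single sigmoid output suits the two-class setup; the EfficientNetV2B0 comparison would swap the convolutional stack for the frozen pretrained backbone while keeping a similar dense head.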
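The evaluation metrics quoted above all follow directly from a confusion matrix. A worked sketch, using hypothetical counts chosen to roughly reproduce the reported 93%/96%/90%/93% figures (these are not the project's actual confusion-matrix counts):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of predicted positives, how many were correct
    recall = tp / (tp + fn)      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts on a balanced 2,000-image test split
m = classification_metrics(tp=900, fp=38, fn=100, tn=962)
# accuracy ~0.93, precision ~0.96, recall 0.90, f1 ~0.93
```

Tracking precision and recall separately matters here because the classes carry different costs: a missed parasitised cell (lower recall) is a more serious error in a screening context than a false alarm.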
Why This Project Matters for Annotation Work

This project demonstrates my ability to:

- Work with large datasets requiring precise labelling
- Follow complex instructions and annotation rules
- Apply quality-control checks to ensure data integrity
- Understand how labelling quality directly affects model performance
- Communicate findings clearly and professionally

It reflects the same skills required for AI-training, content-evaluation, and data-labelling roles, especially those involving image classification, factual verification, and structured annotation workflows.