Text Data Labeling/NLP Entity Annotation (Scigenix, LLM Project)
As a Data Scientist at Scigenix (Pty) Ltd, I developed NLP solutions that use Large Language Models to automate document analysis. I prepared and labeled text data so that entities, classifications, and actionable insights could be extracted, supporting automation and reducing review effort. The labeling work consisted of NER tagging and preparing structured datasets for model fine-tuning.
• Labeled and annotated textual entities using Python-based NLP tools.
• Prepared datasets for large language model fine-tuning.
• Used internal scripts and spaCy for annotation and verification tasks.
• Ensured annotation quality and scalability for enterprise document NLP pipelines.
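
To illustrate the kind of NER labeling and verification described above, here is a minimal sketch in plain Python. It is not the internal tooling used on the project: the function names (find_span, spans_to_bio), the sample sentence, and the ORG/DATE labels are all hypothetical, and a simple regex tokenizer stands in for spaCy. It converts character-offset entity spans into token-level BIO tags and rejects spans that do not line up with token boundaries, which is one common check before a dataset is used for fine-tuning.

```python
import re

# Hypothetical tokenizer: words and single punctuation marks, with offsets.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def find_span(text, phrase, label):
    """Locate the first occurrence of phrase and return (start, end, label)."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return (start, start + len(phrase), label)


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans to (token, BIO tag) pairs.

    Raises ValueError when a span is misaligned with token boundaries
    or overlaps a span that was already tagged.
    """
    tokens = [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]
    tags = ["O"] * len(tokens)

    for start, end, label in sorted(spans):
        # Indices of tokens that lie fully inside the span.
        covered = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        aligned = (
            covered
            and tokens[covered[0]][1] == start
            and tokens[covered[-1]][2] == end
        )
        if not aligned:
            raise ValueError(f"Span {(start, end, label)} is not aligned to tokens")
        if any(tags[i] != "O" for i in covered):
            raise ValueError(f"Span {(start, end, label)} overlaps another span")

        tags[covered[0]] = f"B-{label}"
        for i in covered[1:]:
            tags[i] = f"I-{label}"

    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Made-up example sentence and labels, for illustration only.
text = "Invoice 4471 was issued by Acme Holdings on 12 March 2021."
spans = [
    find_span(text, "Acme Holdings", "ORG"),
    find_span(text, "12 March 2021", "DATE"),
]
tagged = spans_to_bio(text, spans)

for token, tag in tagged:
    print(f"{token}\t{tag}")
```

The token/tag pairs can then be written out in whatever structured format the fine-tuning step expects; raising on misaligned or overlapping spans keeps labeling mistakes from silently entering the training data.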