John Nightingale

Environmental Data Scientist | AI Training Data & Scientific Annotation Specialist

Leeds, United Kingdom
$50.00/hr | Entry Level | Other

Key Skills

Software

Other

Top Subject Matter

Legal Services & Contract Review
Regulatory Compliance & Risk Analysis
Legal Research & Document Analysis

Top Data Types

Text
Document
Computer Code Programming

Top Task Types

Data Collection
Text Generation
Computer Programming/Coding

Freelancer Overview

I combine environmental data science, analytical chemistry, and computational modelling experience, which translates directly into high-quality data labeling and AI training workflows. In my research roles, I have worked extensively with large, complex datasets generated by high-resolution mass spectrometry (HRMS), where careful annotation, classification, and confidence scoring (e.g., Schymanski levels) are critical. This work involves structuring noisy, real-world environmental data into reliable, machine-readable formats, which is effectively a form of expert-level data labeling. I have built Python pipelines (using pandas, NumPy, and scikit-learn) to clean, standardise, and integrate multi-source datasets (e.g., wastewater, soil, crop uptake), ensuring the consistency and traceability that robust AI models require.

What sets me apart is the combination of domain expertise and technical implementation. I don't just label or process data: I understand the underlying systems, biases, and uncertainties, whether they stem from chemical behaviour, environmental variability, or measurement limitations. I have developed and validated predictive frameworks linking observed data (measured environmental concentrations, MECs) with model outputs (predicted environmental concentrations, PECs). That work requires careful feature engineering, validation logic, and edge-case handling, all skills directly applicable to training and evaluating AI systems.

My experience in both academic and regulated industry environments (including GLP-aligned workflows) also means I prioritise accuracy, reproducibility, and auditability in all data handling processes.

Entry Level | English

Labeling Experience

Environmental HRMS Data Annotation & AI-Ready Dataset Development

Document | Data Collection
Led the annotation, structuring, and validation of large-scale environmental datasets generated from high-resolution mass spectrometry (HRMS) across soil, wastewater, and agricultural systems. The project involved transforming raw, noisy analytical outputs into AI-ready datasets through systematic labeling of chemical entities, classification of compounds, and assignment of confidence levels (e.g., Schymanski identification framework). Developed Python-based workflows (pandas, NumPy, scikit-learn) to standardise multi-source data, remove inconsistencies, and ensure reproducibility. Implemented quality control pipelines including tolerance thresholds (e.g., mass accuracy filtering), duplicate handling, and cross-dataset validation. Annotated more than 500 contaminants across multiple environmental matrices, integrating metadata such as physicochemical properties and detection frequencies. A key component of the work involved aligning observed environmental concentrations (MECs) with predicted model outputs (PECs), which required careful feature engineering, edge-case handling, and validation logic, and directly supported downstream machine learning applications. All workflows were designed to meet high standards of traceability and auditability, consistent with GLP-aligned environments.
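
As an illustration of the quality-control steps mentioned above (mass accuracy filtering and duplicate handling), here is a minimal pandas sketch. It is not the project's actual pipeline: the column names (`sample_id`, `compound`, `theoretical_mz`, `observed_mz`), the 5 ppm tolerance, the function names, and the example rows are all assumptions chosen for illustration.

```python
import pandas as pd


def filter_by_mass_accuracy(df, tolerance_ppm=5.0):
    """Keep rows whose observed m/z lies within tolerance_ppm of the theoretical m/z.

    The 5 ppm default is an illustrative assumption, not a value from the project.
    """
    out = df.copy()
    # Mass error in parts per million, relative to the theoretical m/z.
    out["ppm_error"] = (
        (out["observed_mz"] - out["theoretical_mz"]) / out["theoretical_mz"] * 1e6
    )
    return out[out["ppm_error"].abs() <= tolerance_ppm]


def drop_duplicate_annotations(df):
    """Keep one row per (sample_id, compound), preferring the smallest absolute ppm error."""
    ranked = df.assign(abs_ppm=df["ppm_error"].abs()).sort_values("abs_ppm")
    deduped = ranked.drop_duplicates(subset=["sample_id", "compound"], keep="first")
    # Drop the helper column and restore the original row order.
    return deduped.drop(columns="abs_ppm").sort_index()


# Hypothetical feature table; values are illustrative, not project data.
features = pd.DataFrame({
    "sample_id": ["S1", "S1", "S1", "S2"],
    "compound": ["carbamazepine", "carbamazepine", "caffeine", "caffeine"],
    "theoretical_mz": [237.1022, 237.1022, 195.0877, 195.0877],
    "observed_mz": [237.1025, 237.1030, 195.0950, 195.0876],
})

clean = drop_duplicate_annotations(filter_by_mass_accuracy(features))
print(clean)
```

In this sketch the third row is rejected because its mass error is far outside the tolerance, and of the two carbamazepine hits in sample S1 only the one with the smaller error is kept. Keeping `ppm_error` in the output is a deliberate choice so that each retained annotation carries the evidence for its acceptance.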

2021 - Present

AI Model Evaluation & Training Data Quality Assurance (Outlier)

Text | Fine Tuning
Contributed to AI model evaluation and training data refinement through structured assessment of model outputs, focusing on reasoning quality, factual accuracy, and alignment with task objectives. Tasks included reviewing and scoring AI-generated responses, identifying failure modes, and providing corrective feedback to improve model performance. Worked across complex, multi-domain prompts requiring critical thinking, consistency checks, and edge-case handling. Applied structured evaluation frameworks to ensure high-quality training data, supporting iterative model improvement and robustness.

2026 - Present

Education

University of Leeds and Fera Science Ltd

Doctor of Philosophy, Geography

2019 - 2021

Keele University and University of Edinburgh

Master of Science, Geoscience Research

2018 - 2019

Work History

University of Leeds

Scientific Project Lead

Leeds
2022 - Present

Fera Science Ltd

Study Director – Environmental Fate

York
2021 - 2022