
Mordecai Machazire

AI Safety Researcher - Protocol-Governed Systems

Buffalo, USA
$75.00/hr · Expert · Other · Mercor · Google Cloud Vertex AI

Key Skills

Software

Other
Mercor
Google Cloud Vertex AI

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Computer Code Programming
Document
Image
Medical DICOM
Text
Video

Top Label Types

Action Recognition
Audio Recording
Classification
Evaluation Rating
Object Detection
Prompt Response Writing SFT
Question Answering
Text Generation
Text Summarization
Transcription

Freelancer Overview

I specialize in designing and implementing protocol-driven evaluation systems that ensure AI models are trained, labeled, and audited with maximum reliability and transparency. My experience spans building rubric-based pipelines for clinical data (MIMIC-III ICU), where I decomposed complex decisions into binary, verifiable labeling criteria, and developing the RAG-URL Protocol, which enforces auditable, step-by-step reasoning for LLMs. I am skilled in Python, SQL, and tools like pandas, NumPy, and SPSS, and have led projects applying AI-assisted summarization, translation, and structured QA/QC for research and healthcare domains. My focus is on creating data labeling and annotation workflows that are not only accurate but also interpretable and scalable for high-stakes applications.

English (Expert)

Labeling Experience

Model Validation Expert (MOVE Fellow)

Other · Text · Evaluation Rating
As a Model Validation Expert (MOVE Fellow) at Handshake AI, I evaluate large language model (LLM) responses using structured rubric-based frameworks. I generate RLHF training data through preference ranking and response comparison against predefined evaluation criteria, with a focus on health science and medical domain QA for subject-matter accuracy.
• Evaluate LLM response quality, factual accuracy, and reasoning coherence using structured rubrics
• Generate RLHF training data via preference ranking and comparative response evaluation
• Apply medical and scientific expertise in LLM evaluation tasks
• Utilize the Handshake AI platform for data annotation and assessment

2025
Google Cloud Vertex AI

Lead Researcher — Health Protocol Auditor

Google Cloud Vertex AI · Medical DICOM · Evaluation Rating
I led the development of ground-truth evaluation pipelines for clinical AI using the MIMIC-III ICU database. These pipelines established rubric-based, mathematically verifiable checks of health AI workload correctness, and my work supported reference dataset creation for training, quality control, and external audits.
• Applied the Cockcroft-Gault formula, clinical statistics, and QA protocols
• Structured canonical outputs for clinical AI model assessment
• Advanced reproducibility and auditability in healthcare datasets
• Provided high-validity labels for medical AI pipeline training

2023
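
The Cockcroft-Gault estimate referenced above is a standard clinical formula, so a correctness check for it can be made fully deterministic. A minimal sketch (the patient values in the example are illustrative, not drawn from MIMIC-III):

```python
def cockcroft_gault(age_years: float, weight_kg: float,
                    serum_creatinine_mg_dl: float, female: bool) -> float:
    """Estimate creatinine clearance (mL/min) via Cockcroft-Gault."""
    crcl = ((140 - age_years) * weight_kg) / (72 * serum_creatinine_mg_dl)
    # Standard 0.85 adjustment factor for female patients.
    return crcl * 0.85 if female else crcl

# Example: 60-year-old, 72 kg male with serum creatinine 1.2 mg/dL
print(round(cockcroft_gault(60, 72, 1.2, female=False), 1))  # → 66.7
```

Because the expected value is computable in closed form, a label produced this way can be verified exactly rather than judged subjectively.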
Google Cloud Vertex AI

9-Step Hypothesis Testing Rubric Designer

Google Cloud Vertex AI · Text · Evaluation Rating
I created a structured evaluation protocol for clinical hypothesis testing performed by LLMs. The framework defines nine binary, independently verifiable criteria for auditability and accuracy, and supports external auditing and improvement of AI-assisted statistical analysis workflows.
• Rubric covers research question clarity, hypotheses, statistical assumptions, and interpretation
• Pass/fail conditions for each step ensure reproducible human and AI evaluation
• Enables stepwise quality assurance in health science AI tasks
• Designed for integration into LLM assessment workflows

2023
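
A rubric of binary, independently verifiable criteria can be sketched as follows. The criterion names and checks below are illustrative assumptions, not the protocol's actual nine steps:

```python
# Each criterion is an independent pass/fail check; no partial credit.
def score_rubric(response: dict, criteria: dict) -> dict:
    """Apply every check independently and return per-criterion results."""
    return {name: bool(check(response)) for name, check in criteria.items()}

# Hypothetical criteria for an LLM-written statistical analysis.
criteria = {
    "states_research_question": lambda r: bool(r.get("question")),
    "names_null_hypothesis":    lambda r: "H0" in r.get("hypotheses", ""),
    "reports_p_value":          lambda r: r.get("p_value") is not None,
}

response = {"question": "Does drug X lower LDL?",
            "hypotheses": "H0: no effect", "p_value": 0.03}
results = score_rubric(response, criteria)
print(all(results.values()))  # → True (every criterion passed)
```

Keeping each criterion independent is what makes the overall score auditable: a disputed grade can be traced to a single failed check.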
Mercor

Mercor Rubric Academy Capstone Project

Mercor · Text · Evaluation Rating
I completed a structured evaluation capstone for Mercor Rubric Academy certification, focused on clinical and biostatistical LLM outputs. The project entailed rubric creation, binary criterion scoring, and performance analysis for Claude and ChatGPT, establishing a reproducible and mathematically sound assessment framework.
• Developed a 7-criterion binary rubric for eGFR, hypothesis testing, and statistical LLM tasks
• Set explicit scoring logic, rounding/tolerance parameters, and criterion dependencies
• Demonstrated evolution of rubric standards via correction of expected values
• Provided precise, reproducible scoring for AI model outputs in the health domain

2026 - 2026
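
Tolerance-aware numeric grading of the kind described can be sketched with `math.isclose`; the expected value and tolerance below are assumptions for illustration, not the capstone's actual parameters:

```python
import math

def grade_numeric(candidate: float, expected: float,
                  abs_tol: float = 0.05) -> bool:
    """Pass iff the model's value is within the stated absolute tolerance."""
    return math.isclose(candidate, expected, abs_tol=abs_tol, rel_tol=0.0)

print(grade_numeric(66.68, 66.67))  # → True  (off by 0.01, within 0.05)
print(grade_numeric(66.8, 66.67))   # → False (off by 0.13)
```

Stating the tolerance explicitly in the rubric, rather than leaving "close enough" to grader judgment, is what makes scores reproducible across human and automated evaluators.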
Mercor

Data Annotation Contractor

Mercor · Text · Classification
As a Data Annotation Contractor for Mercor, I contributed to multiple projects covering visual, multimodal, and text annotation tasks. I applied domain-specific rubrics to produce high-quality labeled datasets across diverse subject matter, enabling accurate supervised training and validation in AI model development.
• Labeled match events, player actions, and formations for sports AI using structured taxonomies
• Performed image classification and object detection with predefined annotation rubrics
• Evaluated LLM outputs for quality and factual accuracy, contributing to SFT datasets
• Executed multimodal annotation, including image captioning and VQA, on the Multimango platform

2025 - 2026

Education

University of Saskatchewan

Master of Public Health, Public Health

2021 - 2024

St George's, University of London

Human Life Sciences, Medicine

2017 - 2023

Work History


Project Hamburg Research, Inc

Health Protocol Auditor

Buffalo
2023 - 2025

Project Hamburg LLC

Lead Researcher

Chicago
2020 - 2025