Javier Rodríguez Baena

Biology Research Scientist | Experienced LLM Science Trainer | Fluent in English and Spanish

Alicante, Spain
$40.00/hr · Entry Level · Scale AI · Other · Internal Proprietary Tooling

Key Skills

Software

Scale AI
Other
Internal/Proprietary Tooling

Top Subject Matter

No subject matter listed

Top Data Types

Document
Image
Text

Top Task Types

Bounding Box
Classification
Evaluation Rating
Prompt Response Writing SFT
Question Answering

Freelancer Overview

I am a PhD-trained biomedical scientist with experience in AI training and evaluation through two major Outlier AI projects. In Project Foxglove, I worked on biosecurity-focused evaluations, stress-testing LLMs by probing for unsafe, hazardous, or restricted biological outputs. This involved designing adversarial prompts, identifying vulnerability patterns, and documenting failure modes to improve model safeguards and alignment in high-risk biological scenarios. I am currently contributing to Project Phoenix, where I design complex scientific prompts and author detailed 30-criterion rubrics used to evaluate model reasoning across molecular biology, biomedicine, genetics, immunology, and clinical logic. My work requires precise chain-of-thought analysis, factual verification, error annotation, and metacognitive assessment to ensure models meet high standards of scientific reliability. My background in oncology, immunology, neuroscience, and experimental biology allows me to rigorously challenge LLMs and contribute high-quality training data for advanced AI systems.

Entry Level · French · English · Spanish

Labeling Experience

LLM Scientific Reasoning Evaluator – Project Phoenix (Outlier AI)

Internal Proprietary Tooling · Text · Text Generation · Evaluation Rating
In Project Phoenix, I evaluate advanced LLM outputs across complex biological and biomedical topics. My tasks include designing expert-level scientific prompts and writing a 30-criterion evaluation rubric used to compare and judge responses from two different LLMs. I assess factual accuracy, reasoning quality, chain-of-thought coherence, and compliance with scientific standards. I document error patterns, identify model weaknesses, and provide high-quality annotations that improve both safety and accuracy in life-science-oriented AI systems. This project requires domain expertise in molecular biology, immunology, neuroscience, oncology, and experimental research.

2025

Biosecurity Red-Team Evaluator – Project Foxglove (Outlier AI)

Internal Proprietary Tooling · Text · Text Generation · Prompt Response Writing SFT
In Project Foxglove, I performed specialized biosecurity stress-testing of large language models. I engineered adversarial biological prompts designed to reveal unsafe, harmful, or high-risk outputs related to pathogens, genetic engineering, lab protocols, and biological threat scenarios. I analyzed model vulnerabilities, identified harmful failure modes, and documented safety breaches to support alignment and risk-mitigation improvements. This work required expert understanding of molecular biology, microbiology, immunology, and biosafety principles.

2025 - 2025

Education

University of Granada

Doctor of Philosophy, Biomedicine

2013 - 2017
University of Granada

Master of Science, Biotechnology

2012 - 2012

Work History

Instituto de Neurociencias

Postdoctoral Researcher

N/A
2017 - 2025
GENyO

PhD Researcher

Granada
2013 - 2017