
Cristina Sala Camps

AI Data Annotation & Evaluation Specialist (Generative AI)

Barcelona, Spain
$20.00/hr · Intermediate · Data Annotation Tech · Other · Don't disclose

Key Skills

Software

Data Annotation Tech
Other
Don't disclose

Top Subject Matter

No subject matter listed

Top Data Types

Image
Text

Top Label Types

Audio Recording
Classification
Evaluation Rating
Prompt Response Writing (SFT)
RLHF
Text Summarization
Translation / Localization

Freelancer Overview

I am an analytical AI training, evaluation and RLHF specialist with hands-on experience supporting generative AI model training and alignment. My work includes guideline-based evaluation, text-based annotation, ranking and structured assessment of AI-generated responses for accuracy, relevance, coherence, reasoning quality, completeness and safety. I have worked on reasoning evaluation and adversarial prompting projects, including the analysis of complex scientific content in biology, biochemistry and life sciences, where I identified subtle reasoning errors and failure modes even when responses appeared plausible at surface level. I have also contributed to factual verification and freshness evaluation projects, validating whether AI outputs were correct, up to date and appropriate for the intended context. In addition, I have supported multilingual content verification and localization in Spanish (Spain), Catalan, English and French, ensuring language accuracy, cultural appropriateness and regional adaptation. My scientific background, fast onboarding with new tools, and consistent delivery of high-quality human feedback support reliable performance in high-volume, deadline-driven AI training workflows.

Languages (Intermediate): English, German, Catalan, Spanish, French

Labeling Experience

AI Training, Evaluation & Multilingual Localization Projects

Don't disclose · Text · Translation / Localization
Worked on AI training, evaluation and quality assurance projects across multiple task types, supporting the validation and improvement of large language model outputs. My work included evaluating responses for factual accuracy, reasoning quality, completeness, relevance and alignment with user intent, following detailed, guideline-based criteria. I participated in factual verification and freshness evaluation projects, checking whether AI-generated content was correct, up to date, and contextually appropriate, including the identification of outdated, incomplete or misleading information. I also contributed to multilingual content verification and localization, ensuring outputs were properly adapted to the user's locale and use case. This included language-specific quality checks, cultural and regional adaptation, and validation of country-specific conventions such as units of measurement, currency and terminology, particularly for Spanish (Spain), Catalan and other regional contexts.


2024 - 2025

AI Reasoning Evaluation & Adversarial Prompting (Life Sciences)

Don't disclose · Text · Prompt Response Writing (SFT)
Worked on advanced reasoning evaluation and adversarial prompting projects for state-of-the-art large language models. The work involved designing scientifically grounded prompts based on recent and complex scientific literature, with the explicit goal of eliciting reasoning failures in multiple concurrent LLMs. Tasks included crafting high-difficulty questions in biology, biochemistry and chemistry, analyzing model reasoning paths, identifying failure modes, and evaluating incorrect, incomplete or misleading scientific reasoning, including cases where answers appeared plausible at surface level. In multi-model evaluation settings, I compared outputs across several LLMs simultaneously, assessing scientific correctness, logical consistency, depth of reasoning and completeness, and providing structured expert human feedback to support model improvement. This work required strong life sciences domain expertise and sustained focus in cognitively demanding evaluation workflows.


2025 - 2025

RLHF & Human Feedback for Chatbot Training

Don't disclose · Text · RLHF
Worked on production-level RLHF (Reinforcement Learning from Human Feedback) and human-in-the-loop evaluation tasks supporting large language model training and alignment. My work included pairwise and multi-model comparisons, ranking and scoring AI-generated responses to user prompts, and applying strict, guideline-based evaluation criteria. I assessed chatbot outputs for accuracy, coherence, reasoning quality, completeness and alignment with user intent, identifying errors, inconsistencies, hallucinations and off-topic or over-verbose responses. I also evaluated relevance and conciseness, ensuring responses fully addressed user questions without unnecessary or misleading information, and provided structured human feedback to support continuous model improvement.


2024 - 2025

Education

N/A

Postgraduate Degree, Generative Artificial Intelligence

2024 - 2025
N/A

Doctor of Philosophy, Chemistry

2012 - 2017

Work History

Multiple Organizations (Confidential)

Scientific & Technical Professional

Barcelona
2000 - 2024