Arsenic - Safety Re
In the Arsenic project, I evaluated AI-generated responses across complex conversational contexts to ensure they followed safety guidelines and policy standards. My tasks included scoring responses for safety, fluency, coherence, and policy alignment; identifying hallucinations and risky outputs; and flagging violations such as harmful instructions or sensitive-content breaches. I also compared pairs of model outputs (A/B testing) and selected the version that better aligned with the guidelines. In addition, I created and reviewed adversarial prompts and image-based scenarios designed to test whether the model would respond safely or could be triggered into guideline-breaking behavior. The role required nuanced judgment, consistency, and a strong understanding of risk categories, edge cases, and human-AI interaction.