Sten Marcus Malva

Expert Data Annotator across NLP, mathematics and programming tasks.

Inabe, Japan
$20.00/hr, Expert, AWS SageMaker, Clickworker, CrowdSource

Key Skills

Software

AWS SageMaker
Clickworker
CrowdSource
Img Lab
Labelbox
Internal/Proprietary Tooling
Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Computer Code Programming
Document

Top Task Types

Computer Programming Coding
Data Collection
Evaluation Rating
Fine Tuning
Function Calling

Freelancer Overview

I have extensive experience in AI training data and model evaluation, having contributed to the assessment of multiple large-scale models for leading organizations including OpenAI, Google, and Meta AI. My primary expertise lies in evaluating mathematical reasoning and computer programming outputs, where precision, logical consistency, and adherence to specifications are critical. In addition, I have worked on projects involving text, function calling, audio, and visual data, giving me a well-rounded perspective on multimodal AI evaluation. Beyond commercial model evaluation, I have collaborated with university research teams on developing AI models for text simplification in both English and Estonian. This work required careful annotation, linguistic sensitivity, and alignment with research objectives, further strengthening my ability to produce high-quality, reliable training data across diverse domains and use cases.

Expert, English, Spanish, Estonian, Japanese

Labeling Experience

Scale AI

Image Recognition Correctness

Scale AI, Image, Object Detection
The models were tasked with solving mathematical problems from an uploaded image. The main goal was to evaluate both the correctness of the solutions and how accurately the models extracted information from the image.

2025 - 2025

Video Evaluation

Internal/Proprietary Tooling, Video, Evaluation Rating
The goal of the project was to evaluate AI-generated video outputs against the given prompt. The main focus was that each video include everything specified in the prompt and contain no hallucinations.

2025 - 2025
AWS SageMaker

Web Development Evaluation

AWS SageMaker, Computer Code Programming, Evaluation Rating, Computer Programming Coding
The goal of the project was to prompt models to create a website with specific key components. Three different models were then evaluated side by side for accuracy and professionalism.

2024 - 2024
Scale AI

Mathematics Reinforcement Learning

Scale AI, Text, Computer Programming Coding
The goal of the project was to write a mathematical problem complicated enough to make the model fail to give a correct solution. The model's response was then rated on criteria such as text and localization, and finally its mistakes were corrected.

2024 - 2024
Labelbox

Audio Classification

Labelbox, Audio, Point Key Point
The main goal was to talk with an AI assistant and ensure that the model fails to assist the user on certain requests. The model was then rated on criteria such as speed, tone, and information correctness.

2024 - 2024

Education


University of Tartu

Bachelor, Computer Science

2021 - 2024

Nõo Gymnasium Of Real Sciences

Gymnasium Degree, Real Sciences

2017 - 2020

Work History


University of Tartu

Research Programmer

Tartu
2021 - Present

Mappy

CEO and Lead Developer

N/A
2020 - 2021