Jan Kafka

Experienced NLP & Text Data Annotator | 3+ Years | EN/CZ Bilingual

Příbram, Czech Republic
$20.00/hr · Expert

Key Skills

Software

Appen
Clickworker
HiveMind
Labelbox
Lionbridge
OneForma
Remotasks
Snorkel AI
Telus
Internal/Proprietary Tooling
Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Image
Text

Top Task Types

Audio Recording
Bounding Box
Classification
Computer Programming/Coding
Translation/Localization

Freelancer Overview

I have several years of hands-on experience working with AI training data, model evaluation, and data labeling across multiple projects and organizations. Since 2022, I have contributed to the development and refinement of large language models by providing detailed annotations, quality assessments, and structured feedback aligned with project-specific guidelines. My background in technical university studies and my experience with machine learning tools support a precise and systematic approach to labeling and model assessment. I am fluent in multiple languages (Czech, Slovak, English, and basic German), which allows me to work on multilingual tasks with consistent accuracy. Over time, I have gained experience in tasks such as preference ranking, safety evaluations, prompt analysis, and error detection in model outputs. My long-term interest in AI, combined with practical involvement in LLM training workflows, helps me understand both the user perspective and the technical context behind model behavior.

Expert · English, Czech, Slovak, German, Italian

Labeling Experience

Telus

Search Quality & Instruction-Following Evaluation for LLMs

Telus · Text · Classification · Question Answering
I participated in an evaluation project aimed at improving model instruction-following and search-related reasoning. Tasks involved reviewing model outputs for relevance, logical structure, and alignment with user intent. I also reviewed long-form answers and ensured compliance with content and safety standards. The project required analytical judgment, precise attention to detail, and consistent application of evaluation criteria.

2022 - 2023
Scale AI

SFT Prompt–Response Writing & Advanced Model Evaluation

Scale AI · Text · Text Generation · RLHF
I contributed to supervised fine-tuning (SFT) and RLHF pipelines by creating high-quality prompt–response pairs, evaluating multi-step reasoning, and assessing model behavior across complex scenarios. I also performed red-team stress testing to identify unsafe or low-quality outputs. The work required structured thinking, familiarity with LLM limitations, and the ability to produce detailed, context-aware writing.

2024
Labelbox

Code and Programming Data Annotation Specialist

Labelbox · Computer Code Programming · Classification · Computer Programming/Coding
Performed data labeling tasks for programming-related datasets, including annotating code snippets, classifying functions, verifying outputs of automated code generation, and generating structured responses for AI models. Ensured high quality and consistency by following detailed project guidelines, reviewing edge cases, and maintaining clear documentation of labeling decisions. Contributed to multiple AI training projects requiring precise understanding of code logic and software behavior.

2023
OneForma

Multilingual LLM Response Ranking & Safety Review

OneForma · Text · Classification · RLHF
I contributed to ranking tasks used in reinforcement learning from human feedback. My responsibilities included comparing pairs of model responses, evaluating reasoning coherence, and checking for safety policy violations. I also performed red-teaming tests designed to identify edge cases, unsafe generations, and biased outputs. The project required strong attention to linguistic nuances and consistency during multi-step evaluations.

2022 - 2024
Appen

General LLM Evaluation & Text Annotation

Appen · Text · Classification · Question Answering
I worked on a large-scale evaluation project focused on assessing LLM-generated text for quality, relevance, factual accuracy, and adherence to instructions. Tasks included rating model responses, identifying errors, rewriting incorrect outputs, and comparing alternative completions. I also performed classification tasks, quality checks, and short-form summarization. The project consisted of thousands of individual items and required consistent application of detailed guidelines and rubrics.

2022 - 2023

Education

Prague University of Economics and Business

Master's in Computer Science
2007 - 2014

Work History

Accenture Europe

.NET Automation Developer

Prague
2015 - 2016
Accenture Europe

QA Analyst / Software Tester

Prague
2013 - 2015