Jan Kafka

Experienced NLP & Text Data Annotator | 3+ Years | EN/CZ Bilingual

Příbram, Czech Republic
$20.00/hr · Expert

Key Skills

Software

Appen
Clickworker
HiveMind
Labelbox
Lionbridge
OneForma
Remotasks
Snorkel AI
Telus
Internal/Proprietary Tooling
Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Image
Text

Top Task Types

Audio Recording
Bounding Box
Classification
Computer Programming/Coding
Translation/Localization

Freelancer Overview

I have several years of hands-on experience working with AI training data, model evaluation, and data labeling across multiple projects and organizations. Since 2022, I have contributed to the development and refinement of large language models by providing detailed annotations, quality assessments, and structured feedback aligned with project-specific guidelines. My background in technical university studies and my experience with machine learning tools support a precise and systematic approach to labeling and model assessment. I am fluent in multiple languages (Czech, Slovak, English, and basic German), which allows me to work on multilingual tasks with consistent accuracy. Over time, I have gained experience in tasks such as preference ranking, safety evaluations, prompt analysis, and error detection in model outputs. My long-term interest in AI, combined with practical involvement in LLM training workflows, helps me understand both the user perspective and the technical context behind model behavior.

Expert · English, Czech, Slovak, German, Italian

Labeling Experience

Telus

Search Quality & Instruction-Following Evaluation for LLMs

Telus · Text · Classification · Question Answering
I participated in an evaluation project aimed at improving model instruction-following and search-related reasoning. Tasks involved reviewing model outputs for relevance, logical structure, and alignment with user intent. I also reviewed long-form answers and ensured compliance with content and safety standards. The project required analytical judgment, precise attention to detail, and consistent application of evaluation criteria.

2022 - 2023
Scale AI

SFT Prompt–Response Writing & Advanced Model Evaluation

Scale AI · Text · Text Generation · RLHF
I contributed to supervised fine-tuning (SFT) and RLHF pipelines by creating high-quality prompt–response pairs, evaluating multi-step reasoning, and assessing model behavior across complex scenarios. I also performed red-team stress testing to identify unsafe or low-quality outputs. The work required structured thinking, familiarity with LLM limitations, and the ability to produce detailed, context-aware writing.

2024
Labelbox

Code and Programming Data Annotation Specialist

Labelbox · Computer Code Programming · Classification · Computer Programming/Coding
Performed data labeling tasks for programming-related datasets, including annotating code snippets, classifying functions, verifying outputs of automated code generation, and generating structured responses for AI models. Ensured high quality and consistency by following detailed project guidelines, reviewing edge cases, and maintaining clear documentation of labeling decisions. Contributed to multiple AI training projects requiring precise understanding of code logic and software behavior.

2023
OneForma

Multilingual LLM Response Ranking & Safety Review

OneForma · Text · Classification · RLHF
I contributed to ranking tasks used in reinforcement learning from human feedback. My responsibilities included comparing pairs of model responses, evaluating reasoning coherence, and checking for safety policy violations. I also performed red-teaming tests designed to identify edge cases, unsafe generations, and biased outputs. The project required strong attention to linguistic nuances and consistency during multi-step evaluations.

2022 - 2024
Appen

General LLM Evaluation & Text Annotation

Appen · Text · Classification · Question Answering
I worked on a large-scale evaluation project focused on assessing LLM-generated text for quality, relevance, factual accuracy, and adherence to instructions. Tasks included rating model responses, identifying errors, rewriting incorrect outputs, and comparing alternative completions. I also performed classification tasks, quality checks, and short-form summarization. The project consisted of thousands of individual items and required consistent application of detailed guidelines and rubrics.

2022 - 2023

Education

Prague University of Economics and Business

Master's in Computer Science
2007 - 2014

Work History

Accenture Europe

.NET Automation Developer

Prague
2015 - 2016
Accenture Europe

QA Analyst / Software Tester

Prague
2013 - 2015