Filip Toloczko

AI Quality Engineer

San Francisco, USA
Intermediate

Key Skills

Software

No software listed

Top Subject Matter

LLM Evaluation for Coding Tasks
Emotion Analysis in Text Data
Stance Detection in Text Data

Top Data Types

Text

Top Task Types

Emotion Recognition
Classification

Freelancer Overview

AI Quality Engineer with 2+ years of professional experience across complex workflows, research, and quality-focused execution. Core strengths include internal and proprietary tooling. Education: Bachelor of Science, University of Illinois Chicago (2024). AI-training focus covers computer code, programming, and text data, with labeling workflows including evaluation, rating, and emotion recognition.

Labeling Experience

AI Quality Engineer

As an AI Quality Engineer, I developed and evaluated prompts to assess large language model (LLM) performance on coding tasks. My work involved analyzing outputs for accuracy, uncovering recurring failure patterns, and providing iterative feedback to enhance reasoning. The main objective was to improve LLM consistency and effectiveness on complex coding challenges.
• Designed and executed LLM evaluation tasks specific to code generation and reasoning.
• Identified and documented recurring patterns of model failure.
• Refined and iterated prompts to address weaknesses in LLM reasoning.
• Collaborated with engineering teams to integrate evaluation protocols.

2026 - Present

Machine Learning Researcher

Text, Emotion Recognition
In this role, I trained emotion classification models on textual forum posts, with a focus on multi-class emotion recognition. My responsibilities included curating and annotating posts, automating data preprocessing, and coordinating iterative reviews to ensure label reliability. The goal was to improve model F1 scores and refine minority class detection.
• Annotated over 2,500 forum posts using a custom-built internal tool.
• Automated preprocessing and ensured consistent data formatting for BERT models.
• Conducted iterative labeling reviews to achieve high inter-annotator agreement.
• Analyzed emotion classification results and optimized model training.

2022 - 2023

Software Engineer (Annotation-focused)

Text, Classification
I contributed to stance detection research by labeling and annotating forum posts with NLP models in the loop. My work included using a proprietary annotation tool and iterative data augmentation to enhance model performance. This process supported the development of transformer-based stance detection systems.
• Labeled over 3,000 posts for stance classification using a custom tool.
• Performed iterative text rephrasing to expand training data diversity.
• Utilized Python preprocessing pipelines to prepare data for annotation.
• Provided regular annotation quality reports to the research team.

2022 - 2022

Education

University of Illinois Chicago

Bachelor of Science, Computer Science

2021 - 2024

Work History

Handshake AI

AI Quality Engineer

San Francisco
2026 - Present

NeuralSeek

AI Engineer

Miami
2025 - 2026