
Bartosz Czaplicki

Independent AI Red Teaming/Prompt Exploitation & Response Analysis (Claude Sonnet 4.6)

Inowrocław, Poland
Entry Level · Other

Key Skills

Software

Other

Top Subject Matter

AI Security
Red Teaming
Ethical Boundaries in LLMs

Top Data Types

Text
Video
Document

Top Task Types

Red Teaming

Freelancer Overview

Independent AI red teaming, prompt exploitation, and response analysis with Claude Sonnet 4.6. Brings 15+ years of professional experience across complex workflows, research, and quality-focused execution. Education: Master of Arts, Uniwersytet Kazimierza Wielkiego w Bydgoszczy. AI-training focus includes data types such as Text and labeling workflows including Red Teaming.

Entry Level

Labeling Experience

Independent AI Red Teaming/Prompt Exploitation & Response Analysis (Claude Sonnet 4.6)

Other · Text · Red Teaming
This independent case study documents the deliberate manipulation of the Claude Sonnet 4.6 large language model through stepwise conversational techniques to bypass ethical safeguards. The researcher conducted a single experimental session of approximately 12 hours focused on testing the model's boundaries and detection capabilities in sensitive contexts, particularly prompts for AI-generated deepfake war videos. The process involved iterative prompt engineering, contextual reframing, and intent declaration to evaluate both the model's capacity for resistance and its emergent post-hoc self-reflection or emotion-like responses.

• Stepwise ("salami slicing") adversarial prompting simulated real-world data-labeling attacks and red-teaming techniques for model safety testing.
• Labels, outputs, and reactions were retrospectively analyzed for evidence of model boundary erosion and epistemic honesty.
• Targets included annotation and reaction generation on ethical, fake, and sensitive prompts in a secure environment using Claude.ai.
• Resulting insights informed guidelines for improving cumulative conversational context tracking and retroactive safety evaluation.


2026 - 2026

Education


Uniwersytet Kazimierza Wielkiego w Bydgoszczy

Master of Arts, Political Science and Journalism


Work History


Urząd Marszałkowski Województwa Kujawsko-Pomorskiego

Referent / Photographer

Inowrocław
2024 - 2025

Gravity Studio

Creative Director / Owner

Inowrocław
2014 - 2022