AI Coding Agent Evaluation & RL Environments Engineer
At Mindrift (Toloka AI) I design adversarial evaluation tasks for Claude Opus 4.6 inside simulated company environments, complete with Python and TypeScript repositories, Jira tickets, documentation, and Slack messages. I craft prompts calibrated so the agent fails roughly half the time, then write automated end-to-end validation in pytest, including system tests and AST-based code verification. After each agent run, I analyze the diffs and transcripts in depth, extracting concrete evidence of genuine reasoning failures that feeds directly into Opus's training pipeline.