Dennis Armstrong

AI Systems Evaluator & Technical Research Contributor

Remote, USA
$45.00/hr · Expert · Labelbox · Scale AI · AWS SageMaker

Key Skills

Software

Labelbox
Scale AI
AWS SageMaker
Appen
Surge AI

Top Subject Matter

Software Engineering
Computer Science
Automation Domain Expertise

Top Data Types

Text
Image
Document

Top Task Types

RLHF
Prompt + Response Writing (SFT)
Fine-tuning
Red Teaming
Computer Programming/Coding
Text Generation
Question Answering
Evaluation/Rating
Text Summarization

Freelancer Overview

AI Systems Evaluator & Technical Research Contributor with 7+ years of professional experience across legal operations, contract review, compliance, and structured analysis. Core strengths include internal and proprietary tooling. Education: Bachelor of Science, New York University (2017). AI-training focus covers data types such as computer code, programming, and text, and labeling workflows including evaluation, rating, and RLHF.

English: Expert

Labeling Experience

Automation Workflow Design & Evaluation — Make.com, Zapier, API Integration Logic

Computer Code/Programming · Computer Programming/Coding
Designed, built, and evaluated automation workflows connecting thousands of SaaS applications as a Senior Software Engineer at Zapier. Developed backend services supporting high-volume event-driven automation, built integration connectors with OAuth and API token authentication, and designed data transformation pipelines for cross-platform workflows. Evaluated AI-assisted automation configurations for accuracy and usability as part of internal experimentation with AI-generated workflow suggestions — assessing whether machine-generated logic correctly handled branching conditions, error states, and data mapping requirements across multi-step processes. Contributed JavaScript custom logic for workflow conditions where no-code tools were insufficient, including data validation, conditional branching, and API response parsing. Diagnosed and resolved automation failures caused by API inconsistencies and edge-case scenarios across production workflows handling millions of daily events. Capable of evaluating prompt-to-workflow outputs for logical correctness, practical usability, and adherence to automation best practices — including error handling robustness, retry logic, and failure recovery patterns.
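The kind of workflow check described above can be sketched in Python. The step schema and field names (`on_error`, `if_true`, `inputs_from`) are invented for illustration, not Zapier's actual format; the point is validating branching, error handling, and data-mapping order in a machine-generated workflow.

```python
# Hypothetical workflow validator: each step is a dict; the schema
# below is illustrative only, chosen to show the checks described
# (error handling, branch completeness, data-mapping order).

def validate_workflow(steps):
    """Return a list of issues found in an ordered list of workflow steps."""
    issues = []
    for i, step in enumerate(steps):
        # Every step should declare how failures are handled.
        if "on_error" not in step:
            issues.append(f"step {i}: missing error-handling policy")
        # Branching steps need both outcomes mapped.
        if step.get("type") == "branch":
            missing = {"if_true", "if_false"} - step.keys()
            if missing:
                issues.append(f"step {i}: unmapped branch outcomes {sorted(missing)}")
        # Data mappings must only reference outputs of earlier steps.
        for src in step.get("inputs_from", []):
            if src >= i:
                issues.append(f"step {i}: input references step {src}, which has not run yet")
    return issues

steps = [
    {"type": "trigger", "on_error": "halt"},
    {"type": "branch", "if_true": 2, "inputs_from": [0]},  # no on_error, no if_false
]
print(validate_workflow(steps))
```

A real evaluation rubric would add domain checks (retry limits, idempotency, authentication scopes), but the structure is the same: deterministic rules applied to a declarative workflow definition.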


2023 - Present

Data Science Problem Authoring & ML Evaluation — Python, NumPy, Pandas, scikit-learn

Text · Text Generation
Designed and validated original end-to-end data science problems simulating real-world analytical workflows, suitable for AI training datasets and model benchmarking. Problems spanned the full data science lifecycle — data ingestion, schema validation, cleaning, exploratory analysis, feature engineering, model training, evaluation, and deployment considerations. Built and validated solutions using Python with NumPy, Pandas, SciPy, and scikit-learn, ensuring all outputs were deterministic, reproducible, and computationally intensive enough to meaningfully test advanced model reasoning. Applied statistical analysis techniques including hypothesis testing, regression, classification, and model performance metrics. Evaluated AI-generated data science code for correctness of statistical assumptions, appropriate library usage, valid train/test split methodology, correct metric interpretation, and sound feature engineering decisions — providing structured rubric-based feedback for each identified issue. Real-world project experience includes building a campus marketplace with scikit-learn-powered ML ranking and an OCR pipeline extracting structured medicine pricing data from pharmacy receipts — demonstrating practical applied data science across multiple domains.
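The determinism requirement described above, where every authored problem must reproduce the same expected output, can be illustrated with a seeded train/test split in plain Python (a sketch; the actual work used NumPy, Pandas, and scikit-learn):

```python
import random

def seeded_split(rows, test_fraction=0.25, seed=42):
    """Deterministically shuffle and split rows into train/test sets.

    Fixing the seed makes the split reproducible, so an authored
    problem's expected output can be validated exactly on every run.
    """
    rng = random.Random(seed)   # local RNG: no shared global state
    shuffled = rows[:]          # never mutate the caller's data
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(20))
train_a, test_a = seeded_split(rows)
train_b, test_b = seeded_split(rows)
assert (train_a, test_a) == (train_b, test_b)   # same seed, same split
assert not set(train_a) & set(test_a)           # no train/test leakage
```

The same two assertions, reproducibility and no leakage, are exactly what the evaluation work checks in AI-generated data science code, just applied to library calls like scikit-learn's splitters instead of hand-rolled ones.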


2025 - Present

Computer Science Problem Design & AI Reasoning Evaluation — Algorithms, Logic, Multi-Step Reasoning

Text · Question Answering
Designed rigorous, industry-relevant computer science problems covering algorithms, data structures, distributed systems, software architecture, and computational reasoning — with fully documented assumptions, constraints, and Python-validated expected outputs for use in AI training and evaluation datasets. Evaluated AI-generated solutions to multi-step computer science problems using structured criteria — assessing logical correctness, algorithm efficiency, edge-case handling, appropriate data structure selection, and quality of step-by-step explanation. Provided detailed written feedback categorising each error type and explaining the correct reasoning path. Performed red teaming of AI reasoning outputs to identify hallucinated algorithm complexity claims, incorrect base case handling, flawed recursion logic, off-by-one errors, and logically inconsistent multi-step derivations — with precise, actionable written annotations for each flagged issue. Academic foundation includes coursework in Algorithms and Data Structures, Distributed Systems, Software Architecture, Machine Learning Foundations, and Computer Networks from New York University. Professional engineering experience at GitLab and Zapier provides the practical depth needed to evaluate real-world software engineering reasoning at an expert level.
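Python-validated expected outputs of the kind described can be sketched as a reference solution plus a documented edge-case table. The problem (first index with value at least a target in a sorted list) is a stand-in chosen because it exercises the off-by-one and boundary errors the red-teaming work flags:

```python
# Sketch: validating an authored problem's expected outputs against
# the edge cases that commonly trip up AI-generated solutions.

def first_at_least(xs, target):
    """Index of the first element >= target in sorted xs, else len(xs)."""
    lo, hi = 0, len(xs)
    while lo < hi:                    # half-open interval [lo, hi)
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

# Edge cases documented alongside the problem statement.
cases = [
    (([], 5), 0),              # empty input
    (([5], 5), 0),             # single element, exact match
    (([1, 3, 5], 0), 0),       # target below all elements
    (([1, 3, 5], 6), 3),       # target above all elements
    (([1, 3, 3, 5], 3), 1),    # duplicates: first occurrence wins
]
for args, expected in cases:
    assert first_at_least(*args) == expected
```

A model answer that returns `mid` from inside the loop, or uses a closed interval without adjusting the bounds, fails one of these cases immediately, which is what makes the edge-case table part of the problem specification rather than an afterthought.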


2024 - Present

AI Code Evaluation & RLHF Feedback — Python, JavaScript & Automation Workflows

Computer Code/Programming · Evaluation/Rating
Evaluated AI-generated code and technical reasoning outputs across Python, JavaScript, and algorithmic problem domains as an independent technical contributor. Applied structured rubrics to assess logical correctness, time and space efficiency, edge-case handling, explanation quality, and adherence to stated constraints across hundreds of multi-step coding tasks. Contributed RLHF preference rankings by comparing model-generated responses and selecting higher-quality outputs based on accuracy, completeness, and reasoning soundness. Authored original high-quality Python solutions used as positive training examples in fine-tuning datasets covering data manipulation, algorithm design, API integration, and automation logic. Performed red teaming of AI outputs to surface hallucinated API references, incorrect algorithm complexity claims, logically inconsistent reasoning chains, and flawed assumption statements — providing precise, categorised written feedback for each identified issue. Designed original multi-step computational problems with fully documented assumptions, input/output specifications, and Python-validated expected answers. Problems spanned real-world domains including financial data analysis, e-commerce pipeline design, and software engineering workflows — ensuring deterministic, reproducible outputs suitable for model benchmarking. Technical background includes 5+ years of professional software engineering at Zapier and GitLab, with direct experience evaluating AI-assisted automation configurations in a production environment. Computer Science degree from New York University.
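A rubric-based preference comparison of the kind used in the RLHF ranking work can be sketched as weighted criterion scores. The criteria, weights, and example scores here are invented for illustration; real rubrics are project-specific:

```python
# Illustrative RLHF-style preference comparison: each response is
# rated 0-5 per criterion, then combined with rubric weights.
# Criteria and weights below are hypothetical.

RUBRIC = {
    "correctness": 0.4,
    "completeness": 0.25,
    "reasoning": 0.25,
    "clarity": 0.1,
}

def rubric_score(scores):
    """Weighted score for one response; every criterion must be rated."""
    assert scores.keys() == RUBRIC.keys(), "rate every criterion"
    return sum(RUBRIC[c] * scores[c] for c in RUBRIC)

def prefer(a, b):
    """Return 'A', 'B', or 'tie' for a preference-ranking pair."""
    sa, sb = rubric_score(a), rubric_score(b)
    if sa == sb:
        return "tie"
    return "A" if sa > sb else "B"

response_a = {"correctness": 5, "completeness": 3, "reasoning": 4, "clarity": 4}
response_b = {"correctness": 3, "completeness": 5, "reasoning": 4, "clarity": 5}
print(prefer(response_a, response_b))  # correctness-weighted: A wins
```

Weighting correctness highest reflects a common design choice in code-evaluation rubrics: a complete but wrong answer should not outrank a correct but terse one.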


2024 - Present

Problem Design & Dataset Authoring

Prompt + Response Writing (SFT)
Designed complex, multi-step computational and analytical problems with precise assumptions, constraints, and expected outcomes for use in AI model training and benchmarking. Authored comprehensive, deterministic problem statements spanning data ingestion, transformation, and deployment scenarios across various domains. Validated solutions in Python to ensure correctness and reproducibility for high-stakes evaluation datasets.
• Prepared fully specified logic tasks, including input/output formats for model fine-tuning.
• Produced gold-standard analytical workflows for domains such as finance, healthcare, and e-commerce.
• Ensured that all problem statements tested model reasoning under real-world complexity and constraints.
• Maintained meticulous documentation to maximize data quality for downstream AI training.
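A machine-checkable problem specification of the kind described can be sketched as a dict carrying its own input/output contract, validated by a reference solution before it ships. The field names and the example problem (maximum single-trade profit from daily prices) are illustrative, not an actual dataset entry:

```python
# Hypothetical problem spec: the statement, assumptions, input, and
# expected output travel together, and a reference solution must
# reproduce the expected output before the spec enters a dataset.

spec = {
    "id": "finance-001",
    "statement": "Given daily prices, return the maximum single-trade profit.",
    "assumptions": ["prices are non-negative", "at most one buy and one sell"],
    "input": [7, 1, 5, 3, 6, 4],
    "expected_output": 5,
}

def solve(prices):
    """Reference solution: track the lowest price seen, take the best gain."""
    best, low = 0, float("inf")
    for p in prices:
        low = min(low, p)
        best = max(best, p - low)
    return best

def validate(spec):
    """A spec ships only if its reference solution reproduces the answer."""
    return solve(spec["input"]) == spec["expected_output"]

assert validate(spec)
```

Keeping the assumptions inline with the spec is the key detail: an evaluator grading a model's answer can check the answer against the stated constraints rather than guessing the author's intent.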


2024 - Present

Education

New York University

Bachelor of Science, Software Engineering and Computer Science

2017 - 2017

Work History

Independent Projects

Blockchain & Decentralised Application Developer

Remote
2022 - Present
Zapier

Senior Software Engineer - Automation & Integrations

Remote
2023 - 2025