Steven Hebner

Expert in Python and NLP, skilled in precise data labeling for LLM training

FL, USA
$30.00/hr, Intermediate, AWS SageMaker, Anno-Mage, CVAT

Key Skills

Software

AWS SageMaker
Anno-Mage
CVAT
Labelbox
Label Studio
OneForma
Prodigy
Roboflow
Scale AI
Snorkel AI
Surge AI

Top Subject Matter

Expert in data labeling, NLP, self-driving car imagery, and organizational culture.
Proficient in LLM evaluation, Python, and optimizing code generation workflows.
Skilled in satellite imagery analysis, data annotation, and training AI for geospatial tasks.

Top Data Types

Computer Code Programming
Image
Text

Top Task Types

Computer Programming and Coding
Entity (NER) Classification
Prompt and Response Writing (SFT)

Freelancer Overview

I bring over seven years of experience in AI training data, data labeling, and NLP pipeline development, with a strong focus on physics and mathematics. My experience spans designing and implementing annotation strategies, QA frameworks, and domain-specific evaluation rubrics for large language models (LLMs) and machine learning systems. I have worked extensively on text classification, sentiment analysis, named entity recognition, and the evaluation of mathematics and physics problems, producing high-quality datasets that improve model performance and reliability.

What sets me apart is the combination of deep technical knowledge in Python, NLP, and machine learning with proven capabilities in quality assurance, model output evaluation, and rubric-driven content refinement. I have led training initiatives for labeling teams, optimized annotation workflows to improve dataset accuracy, and collaborated with engineers to integrate labeled data into production-scale ML pipelines. With STEM domain expertise and hands-on experience in LLM prompt engineering and QA auditing, I consistently deliver training data that supports model accuracy and trustworthiness.

Intermediate
Arabic, French, English

Labeling Experience

Label Studio

Physics & Mathematics QA and AI Model Evaluation

Label Studio, Text, Text Generation, Prompt Response Writing SFT
Led multiple projects focused on evaluating and improving AI model outputs in Physics and Mathematics, ensuring technical correctness, conceptual clarity, and adherence to domain-specific guidelines. Designed and implemented QA rubrics and scoring frameworks for assessing LLM-generated solutions across problem-solving tasks, derivations, and applied reasoning. Developed and refined educational content, mathematical proofs, and physics explanations, embedding references and step-by-step examples to enhance clarity and learning outcomes. Conducted peer audits to maintain quality consistency and compliance with project standards. Delivered training workshops for engineers and analysts, covering QA methodologies, mathematical reasoning in NLP tasks, and model alignment evaluation. Integrated transformer-based models (BERT, GPT, and custom LLMs) into client systems, enabling STEM-focused automation and evaluation workflows. This work resulted in higher model accuracy.

2021
Roboflow

Computer Vision Labeling Specialist

Roboflow, Image, Bounding Box, Polygon
Executed high-accuracy polygon segmentation in Roboflow for architectural floor-plan models, labeling 15–20+ regions per file with precise room-type classification. Interpreted architectural cues and layout structures across varied building formats. Conducted rigorous QA to ensure annotation integrity, producing a clean, reliable dataset that strengthened room-detection and occupancy-classification model performance.

2025 - 2025

AI Video Assessment Specialist

Don't Disclose, Video, Evaluation Rating
At RWS, I contributed to an AI-driven initiative designed to sharpen short-video recommendation systems. I evaluated paired social clips, comparing their clarity, pacing, audio balance, creativity and overall viewer appeal. By selecting the stronger clip and providing nuanced feedback, I helped refine how the model interprets engagement signals. This work improved content-ranking accuracy, strengthened relevance tuning and supported consistent, high-quality review operations across the project.

2025 - 2025
Scale AI

AI Financial Advisor Dataset Annotation Specialist

Scale AI, Text, Text Generation, Evaluation Rating
Contributed to the development of a text-based dataset powering an AI financial advisor designed to deliver personalized financial insights, budgeting recommendations and investment guidance. Responsibilities included annotating and validating financial conversations with multi-layered intent classification (spending analysis, budgeting support, credit inquiries), merchant and transaction categorization, and assistant response evaluation for tone, completeness and professionalism. Additionally performed function-call tagging for backend operations such as get_user_summary, get_transactions, and run_custom_sql, alongside SQL safety validation ensuring parameterization and read-only compliance. Produced structured JSON outputs aligning user queries, intents, and backend logic to enhance LLM grounding and contextual response generation. Ensured annotation accuracy through peer review cycles and internal QA benchmarking exceeding 98% precision.
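
As an illustration of the function-call tagging and SQL safety checks described above, a single annotation record might be assembled along these lines. This is a minimal sketch under stated assumptions: only the three function names come from the description, while the field names, the intent label, and both validation heuristics are hypothetical and do not represent the project's actual schema or rules.

```python
import json
import re

# Backend operations named in the description above.
ALLOWED_FUNCTIONS = {"get_user_summary", "get_transactions", "run_custom_sql"}

# Keywords that indicate a write operation (illustrative, not exhaustive).
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Crude check: a single SELECT statement with no write keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # more than one statement
        return False
    if not stripped.lower().startswith("select"):
        return False
    return WRITE_KEYWORDS.search(stripped) is None


def is_parameterized(sql: str) -> bool:
    """Crude check: no inline quoted string literals in the statement."""
    return "'" not in sql and '"' not in sql


def build_record(query: str, intent: str, function: str, arguments: dict) -> dict:
    """Build one annotation record (hypothetical field names)."""
    if function not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unknown function: {function}")
    record = {
        "user_query": query,
        "intent": intent,
        "function_call": {"name": function, "arguments": arguments},
    }
    if function == "run_custom_sql":
        sql = arguments.get("sql", "")
        record["sql_checks"] = {
            "read_only": is_read_only(sql),
            "parameterized": is_parameterized(sql),
        }
    return record


record = build_record(
    "How much did I spend on groceries last month?",
    "spending_analysis",  # hypothetical intent label
    "run_custom_sql",
    {
        "sql": "SELECT SUM(amount) FROM transactions WHERE category = ? AND month = ?",
        "params": ["groceries", "2024-05"],
    },
)
print(json.dumps(record, indent=2))
```

A keyword-based check like this is only a first-pass filter; a real pipeline would more likely rely on a SQL parser and database-level read-only permissions.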

2024 - 2024

Education

Massachusetts Institute of Technology (MIT)

Bachelor of Science in Computer Science
2010 - 2014

Work History

RWS

AI Video Assessment Specialist

Jacksonville
2025 - 2025

Invisible Technologies

AI Financial Data Annotation Specialist

Jacksonville
2024 - 2024