
Johnny Ha

Senior Software Engineer

Los Angeles, USA
$90.00/hr · Expert

Key Skills

Software

No software listed

Top Subject Matter

Artificial Intelligence & Machine Learning
Software Engineering & Computer Programming
Data Annotation & AI Model Training

Top Data Types

Computer Code Programming
Image
Text

Top Task Types

Segmentation
Object Detection
Text Generation
Question Answering
Text Summarization
RLHF
Fine Tuning
Evaluation Rating
Computer Programming Coding
Data Collection
Function Calling
Prompt Response Writing SFT

Freelancer Overview

Senior Software Engineer with 10+ years of experience building scalable systems, optimizing data workflows, and delivering high-quality software across cloud-based environments. Proven track record of improving system performance (up to 40% latency reduction) and implementing robust CI/CD, testing, and data validation pipelines. Experienced in Python, Java, AWS, and microservices architecture, with a strong focus on data integrity, automation, and efficiency. In addition to software engineering, brings hands-on experience designing data processing pipelines, validation frameworks, and AI-related workflow automation tools. Adept at analyzing large datasets, creating labeling/validation rules, and ensuring high-quality training data. Known for strong attention to detail, cross-functional collaboration, and mentoring teams to maintain high standards in data quality and system reliability.

English (Expert)

Labeling Experience

AI Code Agent Evaluation & Preference Labeling (Anthropic / Mercor)

Computer Code Programming · RLHF
Evaluated and compared AI code agents (LLMs) on software engineering tasks across Python, Rust, Go, Java, TypeScript, and more. Provided detailed behavioral analysis, transcript-referenced feedback, and preference ratings to generate RLHF training data for frontier AI models. Maintained high quality standards with 90+ minute evaluations per task and 3+ behavioral observations per submission.

2026 - Present

Trajectory Annotation Writer – AI Reasoning & Code Review

Computer Code Programming · RLHF
Worked as a Trajectory Annotation Writer for a leading AI lab via Mercor Intelligence, annotating chain-of-thought reasoning in frontier model outputs on SWE-bench-style software engineering tasks. Applied domain expertise to reconstruct and refine internal reasoning traces, distinguishing informative decision points from redundant steps, and ensuring high-quality training signal for RLHF pipelines.

2026 - Present

Software Engineering Task Evaluation & Golden Solution Development

Computer Code Programming · Fine Tuning
Experienced in developing high-quality golden solutions and automated test cases for software engineering tasks targeting state-of-the-art AI model evaluation. Worked on industry-grade open-source codebases across Python, TypeScript, Java, and Go — analyzing GitHub PRs, implementing reference solutions following best practices, and authoring fail-to-pass test suites used to benchmark coding model performance. Proficient in Docker-based test environment setup, run script development, and trajectory analysis of failed model outputs to identify reasoning failures. Contributed to a scalable data pipeline used to improve SOTA model performance on complex SWE tasks.

2026 - Present

AI Response Evaluation & RLHF Annotation (Conversational LLM)

Text · RLHF
Evaluated and provided structured feedback on AI-generated responses from large language models (LLMs), focusing on accuracy, helpfulness, safety, and instruction-following. Applied RLHF principles by comparing model outputs, rating response quality, and identifying areas for improvement across diverse prompts and domains including coding, writing, and reasoning tasks.

2025 - Present

AI Model Response Evaluation and Quality Rating

Text · Evaluation Rating
Experienced in evaluating and rating AI-generated responses through structured multi-axis assessment frameworks. Work involves reviewing agentic coding transcripts, identifying behavioral issues using standardized taxonomies such as instruction following failures, overengineering, and false claims of success, and producing detailed comparative rationale across response pairs. Skilled at detecting surface-level reasoning, circular logic, and verbosity issues in annotator feedback, while ensuring evaluation quality aligns with established acceptance and rejection criteria.

2025 - Present

Education

California State University, Los Angeles

Bachelor of Science, Computer Science

2012 - 2016

Work History

Uber

Senior Software Engineer

Los Angeles
2024 - 2025
E-Commerce Business

Full-Stack Software Engineer & Co-Founder

Los Angeles
2020 - 2023