
Siyu Wang

AI Coding Data Specialist - SFT & RLHF

Beijing, China

$20.00/hr · Intermediate · Internal Proprietary Tooling

Key Skills

Software

Internal/Proprietary Tooling

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming

Top Label Types

Computer Programming Coding
Fine Tuning
Prompt Response Writing SFT
RLHF

Freelancer Overview

I am a freelance data annotation specialist focused on AI coding and STEM data, with a computer science background and proficiency in Python, JavaScript, C++, Java, and Go. My core expertise is preparing high-quality Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data for large language models.

I have extensive experience across diverse and advanced task queues, including but not limited to:

SWE-Bench: resolving real-world GitHub issues by generating and applying patches.
terminal-bench: annotating command-line interactions and shell-script execution.
Code reasoning and logic tracing: mapping complex execution flows and identifying logical errors.
Code refactoring and optimization: restructuring code for improved performance and maintainability without altering functionality.

I have a deep understanding of complex algorithmic logic and bring an engineering mindset to every task.
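As an illustration of the refactoring-and-optimization work described above, here is a minimal sketch: a function is restructured for readability, and an assertion verifies the behavior is unchanged. The function names and sample input are hypothetical.

```python
from collections import Counter

def word_counts_original(text):
    # Original style: manual dictionary bookkeeping.
    counts = {}
    for word in text.split():
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts

def word_counts_refactored(text):
    # Refactored: same behavior, expressed with the standard library.
    return dict(Counter(text.split()))

# Verify the refactor preserves functionality.
sample = "to be or not to be"
assert word_counts_original(sample) == word_counts_refactored(sample)
```

Checking equivalence against the original implementation is the key discipline: a refactor that changes observable behavior is a bug, not an optimization.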

English (Intermediate) · Chinese (Mandarin)

Labeling Experience

RL

Internal Proprietary Tooling · Computer Code Programming · RLHF · Computer Programming Coding
Project Overview: Reinforcement Learning from Human Feedback (RLHF) Data Annotation

In RLHF projects, I contribute to the critical process of aligning AI model behavior with human values and preferences. My work directly shapes how models learn to distinguish between good, better, and best responses. Key aspects of my RLHF experience include:

Preference Ranking: comparing multiple model-generated responses and ranking them against predefined criteria such as relevance, helpfulness, safety, and factual accuracy.
Reward Modeling Support: providing high-quality human preference data that serves as the foundation for training reward models, which in turn guide the reinforcement learning process.
Edge Case Identification: recognizing and flagging ambiguous, biased, or potentially harmful responses to help models avoid generating undesirable content.
Consistency Calibration: ensuring that preference judgments remain consistent across similar prompts, helping models develop stable and predictable behavior.
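The preference-ranking work described above can be sketched in a few lines: a single ranked annotation expands into pairwise (chosen, rejected) examples of the kind used to train reward models. The record layout and response names here are illustrative, not any particular platform's schema.

```python
from itertools import combinations

# A hypothetical annotation: responses ordered best-first by the annotator.
annotation = {
    "prompt": "Explain what a Python list comprehension does.",
    "ranking": ["response_a", "response_c", "response_b"],
}

def to_pairwise(record):
    """Every higher-ranked response is 'chosen' over every lower-ranked one."""
    pairs = []
    for chosen, rejected in combinations(record["ranking"], 2):
        pairs.append({
            "prompt": record["prompt"],
            "chosen": chosen,
            "rejected": rejected,
        })
    return pairs

pairs = to_pairwise(annotation)
# A ranking of 3 responses yields 3 pairwise comparisons.
assert len(pairs) == 3
```

This is why consistency calibration matters: a single inconsistent ranking propagates into multiple contradictory training pairs.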

2024 - 2025

SFT

Internal Proprietary Tooling · Computer Code Programming · Fine Tuning
Project Overview: Supervised Fine-Tuning (SFT) Data Annotation

In my SFT projects, I focus on building high-quality training datasets that fine-tune base language models to perform specific tasks with precision. The core of my work involves:

Instruction Construction: creating clear, diverse, and task-oriented instructions that reflect real user queries.
Response Generation: crafting accurate, well-structured, and contextually appropriate responses that serve as ideal model outputs.
Quality Filtering: reviewing and refining existing data to remove noise, correct errors, and ensure alignment with project guidelines.

My SFT experience spans multiple domains, with a strong emphasis on AI coding and technical content. I have contributed to projects aimed at improving models' ability to generate executable code, debug complex programs, and explain technical concepts in simple terms. By combining my programming background with strict adherence to annotation standards, I ensure that every data point meets project quality standards.
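The instruction-construction and quality-filtering steps above can be sketched as a small pipeline over instruction/response records. Field names, thresholds, and sample data are illustrative, not a real project's schema.

```python
import json

# Hypothetical SFT records: one clean, one noisy.
records = [
    {"instruction": "Write a Python function that reverses a string.",
     "response": "def reverse(s):\n    return s[::-1]"},
    {"instruction": "", "response": "..."},  # noisy: empty instruction
]

def passes_quality_filter(rec):
    # Drop records with empty instructions or trivially short responses.
    return bool(rec["instruction"].strip()) and len(rec["response"].strip()) > 5

# Keep only clean records and serialize them as JSON Lines for training.
clean = [r for r in records if passes_quality_filter(r)]
jsonl = "\n".join(json.dumps(r) for r in clean)
assert len(clean) == 1
```

In practice the filter criteria come from project guidelines rather than hard-coded thresholds, but the shape of the work — construct, review, filter, serialize — is the same.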

2024 - 2025

Education

山西应用科技学院 (Shanxi College of Applied Science and Technology)

Bachelor of Science, Computer Science and Technology

2016 - 2020

Work History


Tianxing Yuanjing

Java Developer

Beijing
2023 - 2024

Boyan Technology

Java Software Engineer

Beijing
2021 - 2023