Victor Tang

AI Specialist in LLM Evaluation, Training, RLHF, and Prompt Engineering

Vermont South, Australia
$60.00/hr · Intermediate · CloudFactory · CVAT · Data Annotation Tech

Key Skills

Software

CloudFactory
CVAT
Data Annotation Tech
Dataloop
Labelbox
Scale AI
SuperAnnotate
V7 Labs

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code / Programming
Image
Text

Top Task Types

Computer Programming / Coding
Evaluation / Rating
Fine-Tuning
RLHF
Text Generation

Freelancer Overview

I am an experienced AI professional specializing in Large Language Model (LLM) evaluation, training, and reinforcement learning from human feedback (RLHF). I led the RLHF process in the development of the Cypher LLM model, overseeing critical stages such as response evaluation, fine-tuning, and feedback integration to optimize the model's performance. My expertise extends to designing effective workflows that ensure high-quality data labeling and iterative model improvement.

I also contributed to the Phoenix Pickle project, applying similar RLHF methodologies to improve response evaluation and model fine-tuning. This hands-on experience has given me a deep understanding of prompt engineering, fine-tuning strategies, and the dynamics of human-AI collaboration. I am committed to leveraging these skills to build cutting-edge AI systems that meet real-world needs.

In addition, I have labeled coding data for model fine-tuning and evaluation. This includes ranking multiple model-generated code responses to a given prompt by correctness, efficiency, and readability, an essential step in training a reward model for RLHF. I have annotated outputs by flagging incorrect logic, suboptimal implementations, and policy violations (e.g., unsafe code practices), helping improve both the model's reward function and its post-processing filters.
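To illustrate the preference-ranking step described above, here is a minimal sketch of how a ranked list of code responses expands into training pairs for a reward model, together with a Bradley-Terry style pairwise loss. All names and values are illustrative and not tied to any project mentioned here.

```python
import math

def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss often used when training a reward model on
    human preference data: the loss shrinks as the chosen response is
    scored increasingly higher than the rejected one."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A ranking of three code responses (best to worst) expands into all
# ordered (preferred, rejected) pairs for reward-model training.
ranking = ["response_a", "response_b", "response_c"]
pairs = [(ranking[i], ranking[j])
         for i in range(len(ranking))
         for j in range(i + 1, len(ranking))]
# pairs == [("response_a", "response_b"),
#           ("response_a", "response_c"),
#           ("response_b", "response_c")]
```

A correct ranking of n responses thus yields n·(n-1)/2 training pairs, which is why even small annotated batches provide substantial preference signal.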

Intermediate · English · Japanese

Labeling Experience

Data Annotation Tech

Phoenix Pickle - RLHF, Fine-Tuning, and Prompt Engineering

Data Annotation Tech · Text · Text Generation · RLHF
This project involved the development and refinement of a cutting-edge language model through a process combining reinforcement learning from human feedback (RLHF), prompt engineering, and fine-tuning. My role encompassed training the model by evaluating its responses to given prompts, assessing their quality, and providing targeted feedback to guide improvements. Using reinforcement learning techniques, I iteratively fine-tuned the model to achieve desired outcomes, optimizing its performance and ensuring alignment with project goals. This work required a deep understanding of LLM dynamics, response evaluation, and iterative training methodologies.


2024
Data Annotation Tech

Cypher LLM - RLHF, Fine-Tuning, and Prompt Engineering

Data Annotation Tech · Text · Text Generation · RLHF
This project involved the development and refinement of a cutting-edge language model through a process combining reinforcement learning from human feedback (RLHF), prompt engineering, and fine-tuning. My role encompassed training the model by evaluating its responses to given prompts, assessing their quality, and providing targeted feedback to guide improvements. Using reinforcement learning techniques, I iteratively fine-tuned the model to achieve desired outcomes, optimizing its performance and ensuring alignment with project goals. This work required a deep understanding of LLM dynamics, response evaluation, and iterative training methodologies.


2024
Labelbox

Code Labeling - RLHF

Labelbox · Computer Code Programming · RLHF
I labeled coding data for model fine-tuning and evaluation, ranking multiple model-generated code responses to a given prompt by correctness, efficiency, and readability. I helped optimize code outputs by providing feedback on generated Python scripts, labeled examples for supervised fine-tuning, and participated in preference comparisons for reinforcement-learning steps.


2024
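The preference-comparison workflow in this project can be sketched as a single annotation record. The field names below are hypothetical and illustrative; they do not reflect Labelbox's actual export schema.

```python
# Illustrative structure for one code preference-comparison item.
# Field names and values are hypothetical, not any platform's schema.
annotation = {
    "prompt": "Write a function that reverses a string.",
    "responses": {
        "A": "def rev(s):\n    return s[::-1]",
        "B": "def rev(s):\n    out = ''\n    for ch in s:\n        out = ch + out\n    return out",
    },
    "preference": "A",  # chosen for conciseness and efficiency
    "criteria": ["correctness", "efficiency", "readability"],
    "flags": [],        # e.g. "incorrect_logic", "unsafe_code"
}

def validate(record: dict) -> bool:
    """Basic sanity check an annotation pipeline might run before
    accepting a record: the preferred key must name an existing
    response, and the prompt must be non-empty."""
    return (
        record["preference"] in record["responses"]
        and bool(record["prompt"])
    )
```

Keeping the flagged issues (incorrect logic, unsafe code) in a separate field from the pairwise preference lets the same record feed both reward-model training and post-processing filters.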

Education


Australian National University

Bachelor of Advanced Computing (Research and Development, AI Specialization with Honors), Computer Science
2021 - 2024

Work History


KIS Education

Academic Tutor

Melbourne
2023 - Present