Victor Tang

AI Specialist in LLM Evaluation, Training, RLHF, and Prompt Engineering

Vermont South, Australia
$60.00/hr · Intermediate · CloudFactory · CVAT · Data Annotation Tech

Key Skills

Software

CloudFactory
CVAT
Data Annotation Tech
Dataloop
Labelbox
Scale AI
SuperAnnotate
V7 Labs

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code / Programming
Image
Text

Top Task Types

Computer Programming / Coding
Evaluation / Rating
Fine-Tuning
RLHF
Text Generation

Freelancer Overview

I am an experienced AI professional specializing in Large Language Model (LLM) evaluation, training, and reinforcement learning from human feedback (RLHF). I led the RLHF process in the development of the Cypher LLM model, overseeing critical stages such as response evaluation, fine-tuning, and feedback integration to optimize the model's performance. My expertise extends to designing effective workflows that ensure high-quality data labeling and iterative model improvement.

I also contributed to the Phoenix Pickle project, applying similar RLHF methodologies to improve response evaluation and model fine-tuning. This hands-on experience has given me a deep understanding of prompt engineering, fine-tuning strategies, and the dynamics of human-AI collaboration. I am committed to leveraging these skills to build cutting-edge AI systems that meet real-world needs.

In addition, I have labeled coding data for model fine-tuning and evaluation. This includes ranking multiple model-generated code responses to a given prompt by correctness, efficiency, and readability, an essential step in training a reward model for RLHF. I have annotated outputs by flagging incorrect logic, suboptimal implementations, and policy violations (e.g., unsafe code practices), helping improve both the model's reward function and its post-processing filters.
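To illustrate the preference-ranking step described above, here is a minimal sketch of how a ranked list of code responses expands into training pairs for a reward model, together with a Bradley-Terry style pairwise loss. All names and values are illustrative and not tied to any project mentioned here.

```python
import math

def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss often used when training a reward model on
    human preference data: the loss shrinks as the chosen response is
    scored increasingly higher than the rejected one."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A ranking of three code responses (best to worst) expands into all
# ordered (preferred, rejected) pairs for reward-model training.
ranking = ["response_a", "response_b", "response_c"]
pairs = [(ranking[i], ranking[j])
         for i in range(len(ranking))
         for j in range(i + 1, len(ranking))]
# pairs == [("response_a", "response_b"),
#           ("response_a", "response_c"),
#           ("response_b", "response_c")]
```

A correct ranking of n responses thus yields n·(n-1)/2 training pairs, which is why even small annotated batches provide substantial preference signal.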

Intermediate · English · Japanese

Labeling Experience

Data Annotation Tech

Phoenix Pickle - RLHF, Fine-Tuning, and Prompt Engineering

Data Annotation Tech · Text · Text Generation · RLHF
This project involved the development and refinement of a cutting-edge language model through a process combining reinforcement learning from human feedback (RLHF), prompt engineering, and fine-tuning. My role encompassed training the model by evaluating its responses to given prompts, assessing their quality, and providing targeted feedback to guide improvements. Using reinforcement learning techniques, I iteratively fine-tuned the model to achieve desired outcomes, optimizing its performance and ensuring alignment with project goals. This work required a deep understanding of LLM dynamics, response evaluation, and iterative training methodologies.


2024
Data Annotation Tech

Cypher LLM - RLHF, Fine-Tuning, and Prompt Engineering

Data Annotation Tech · Text · Text Generation · RLHF
This project involved the development and refinement of a cutting-edge language model through a process combining reinforcement learning from human feedback (RLHF), prompt engineering, and fine-tuning. My role encompassed training the model by evaluating its responses to given prompts, assessing their quality, and providing targeted feedback to guide improvements. Using reinforcement learning techniques, I iteratively fine-tuned the model to achieve desired outcomes, optimizing its performance and ensuring alignment with project goals. This work required a deep understanding of LLM dynamics, response evaluation, and iterative training methodologies.


2024
Labelbox

Code Labeling - RLHF

Labelbox · Computer Code Programming · RLHF
I labeled coding data for model fine-tuning and evaluation, ranking multiple model-generated code responses to a given prompt by correctness, efficiency, and readability. I helped optimize code outputs by providing feedback on generated Python scripts, labeled examples for supervised fine-tuning, and participated in preference comparisons for reinforcement-learning steps.


2024
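The preference-comparison workflow in this project can be sketched as a single annotation record. The field names below are hypothetical and illustrative; they do not reflect Labelbox's actual export schema.

```python
# Illustrative structure for one code preference-comparison item.
# Field names and values are hypothetical, not any platform's schema.
annotation = {
    "prompt": "Write a function that reverses a string.",
    "responses": {
        "A": "def rev(s):\n    return s[::-1]",
        "B": "def rev(s):\n    out = ''\n    for ch in s:\n        out = ch + out\n    return out",
    },
    "preference": "A",  # chosen for conciseness and efficiency
    "criteria": ["correctness", "efficiency", "readability"],
    "flags": [],        # e.g. "incorrect_logic", "unsafe_code"
}

def validate(record: dict) -> bool:
    """Basic sanity check an annotation pipeline might run before
    accepting a record: the preferred key must name an existing
    response, and the prompt must be non-empty."""
    return (
        record["preference"] in record["responses"]
        and bool(record["prompt"])
    )
```

Keeping the flagged issues (incorrect logic, unsafe code) in a separate field from the pairwise preference lets the same record feed both reward-model training and post-processing filters.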

Education


Australian National University

Bachelor of Advanced Computing (Research and Development, AI Specialization with Honors), Computer Science
2021 - 2024

Work History


KIS Education

Academic Tutor

Melbourne
2023 - Present