Chian Wen Tung

LLM Evaluation & Prompt Tuning Specialist in Chinese & English

New Taipei City, Taiwan
$22.50/hr, Intermediate, Scale AI, Telus, Internal Proprietary Tooling

Key Skills

Software

Scale AI
Telus
Internal/Proprietary Tooling

Top Subject Matter

No subject matter listed

Top Data Types

Image
Text
Video

Top Task Types

Data Collection
Diagnosis
Evaluation Rating
Prompt Response Writing SFT
RLHF

Freelancer Overview

I'm a Traditional Chinese content evaluator and prompt writer with hands-on experience in LLM evaluation, response comparison, and RLHF tasks. I've worked on projects involving chatbot fine-tuning, ranking model outputs, and identifying factual inconsistencies in AI-generated content. I'm especially comfortable working in both Traditional Chinese and English, and I care deeply about language clarity, nuance, and cultural accuracy. My academic background in literature and my training in language-related work have helped me develop a strong sense of tone, context, and quality. Whether I'm writing prompts or evaluating subtle distinctions in AI responses, I focus on making sure the output feels natural, useful, and trustworthy for end users.

Intermediate, German, English, Chinese Mandarin

Labeling Experience

Ads Quality Evaluation and Relevance Judgment for Search Engine Ads

Internal Proprietary Tooling, Text, Classification, Diagnosis
In this ongoing project, I evaluate the quality, relevance, and usefulness of sponsored ads shown in search engine results. My tasks involve assessing whether an ad matches the user's intent, judging the content’s compliance with localization and editorial guidelines (for Traditional Chinese), and classifying ad topics appropriately. I also provide diagnostic feedback on ad quality and user value. The project requires a high level of attention to detail and consistent performance across diverse content types and industries, including e-commerce, education, and finance.

2024

Safety-focused Prompt Generation and Evaluation

Internal Proprietary Tooling, Text, Entity NER Classification, Evaluation Rating
In this project, I crafted and evaluated prompts categorized as benign, safe, harmful, and jailbreak. The goal was to help AI models distinguish between safe and potentially harmful or manipulative inputs. This involved carefully designing prompts that test model robustness and safety measures, contributing to improving model alignment and content moderation capabilities. Adhering strictly to client guidelines ensured consistent and reliable results.

2024

Evaluation and Rating for AI Model Responses

Internal Proprietary Tooling, Text, Text Generation, Evaluation Rating
In this project, I evaluated AI-generated text responses based on client criteria, focusing on accuracy, relevance, and fluency. Additionally, I participated in model comparison tasks, assessing outputs from different AI models to help identify strengths and weaknesses. The ratings and feedback provided contributed to refining and improving large language models. Consistent guideline application and calibration sessions ensured evaluation quality and consistency.

2024 - 2024

Prompt and Response Translation

Internal Proprietary Tooling, Text, Translation Localization, Prompt Response Writing SFT
In this project, I translated original source texts into English and then localized them into Chinese (Traditional). The work involved multiple rounds of prompt and response writing, where each response had to comply with specific client-defined rules and quality standards. This iterative process aimed to improve the accuracy, fluency, and cultural relevance of translations to enhance AI language model training.

2024 - 2024

Reinforcement Learning from Human Feedback (RLHF) for LLM Training

Internal Proprietary Tooling, Text, RLHF, Evaluation Rating
I participated in the RLHF process by evaluating and ranking model-generated responses according to quality, relevance, and safety standards. My role involved providing detailed feedback on outputs, helping guide the model’s reinforcement learning to improve performance and alignment with human values. Additionally, I contributed to prompt and response writing to support fine-tuning efforts. Throughout the project, I followed strict guidelines to ensure consistent and unbiased evaluations, which helped optimize model behavior and user experience.

2024 - 2024

Education

Fu-Jen Catholic University

Undergraduate, Sport Management

2022 - 2025

Soochow University

Undergraduate, German Language and Culture

2020 - 2024

Work History

Telus International AI

Ads Quality Rater

Remote
2024 - Present

Outlier AI

AI Task Evaluator & Prompt Engineer

Remote
2024 - Present