Juhee Bae

LLM Evaluation and Text Generation Specialist in Korean

Seoul, South Korea
$16.00/hr, Intermediate, Appen, CVAT, OneForma

Key Skills

Software

Appen
CVAT
OneForma
Telus
Other
Internal/Proprietary Tooling

Top Subject Matter

LLM evaluation in Korean
prompt generation in Korean
image classification

Top Data Types

Audio
Document
Image

Top Task Types

Evaluation Rating
Fine Tuning
Polygon
Prompt Response Writing SFT
Text Generation

Freelancer Overview

With over two years of hands-on experience in AI training data and evaluation, I have built a strong foundation in both data labeling project management and large language model (LLM) evaluation. As an Assistant Project Manager on multiple AI projects, I oversaw large-scale image and text data labeling efforts, including supervising up to 70 labelers, managing and verifying metadata, and conducting quality control on bounding box and polygon annotations. I played a key role in ensuring accurate and consistent data output for AI systems in industrial safety and job relevance prediction tasks.

In addition to project management, I have worked extensively as an LLM Evaluator and content writer. For over 18 months, I evaluated LLM outputs across dimensions such as instruction following, accuracy, and harmfulness, providing detailed feedback and annotations. I also designed adversarial prompts and revised model responses to train and fine-tune models effectively. My recent work as a voice-based LLM evaluator further demonstrates my ability to assess model performance in multimodal environments, focusing on clarity, collaboration, and natural delivery in spoken contexts.

Intermediate, Korean, English

Labeling Experience

Evaluator

Internal Proprietary Tooling, Text, Prompt Response Writing SFT
This project was designed to test the robustness of large language models by crafting prompts that could expose failure points across dimensions such as reasoning, safety, or compliance. The task also involved evaluating and rewriting model responses to meet ideal output standards. Labeling work included categorizing prompt intent, diagnosing response flaws, and revising text according to predefined quality criteria. The project generated and reviewed thousands of prompt-response sets, with quality ensured through peer review processes and alignment with high-precision annotation guidelines.

2024

Transcriber

Appen, Audio, Segmentation, Text Generation
This project involved transcribing short Korean audio clips and labeling each segment by speaker gender. Over the course of one month, a dataset was built by accurately converting spoken language into text and labeling each segment with the appropriate gender classification. The project aimed to enhance speech recognition model training, with a strong focus on transcription accuracy and consistent speaker segmentation.

2024 - 2024

Assistant Project Manager

Internal Proprietary Tooling, Text, Segmentation
This project aimed to train an AI system to classify how relevant interview questions are to various occupations. The dataset comprised pairs of interview questions and occupational labels, which required careful semantic judgment to annotate. Labeling tasks included binary or multi-class classification, metadata tagging, and verification of contextual relevance. The project processed 8,000 data pairs and involved multiple annotators working under tight alignment protocols. Quality control relied on detailed labeling guidelines, regular validation rounds, and inter-annotator agreement checks.

2024 - 2024

LLM Evaluator

Other, Text, Evaluation Rating
I assessed outputs across dimensions including instruction following, language appropriateness, factual accuracy, potential harm, and writing style. I worked individually but in alignment with a standardized rubric and review protocol.

2024 - 2024

Assistant Project Manager

CVAT, Image, Bounding Box, Polygon
This project focused on developing a computer vision model capable of identifying and classifying safety-related incidents within smart factory environments. Approximately 20,000 images were collected from industrial sites to create a dataset reflecting various accident scenarios, worker behaviors, and machinery conditions. The labeling tasks involved annotating objects and events using bounding boxes and polygonal segmentation, targeting elements such as workers, tools, and hazardous zones. The project required managing annotations on a large scale with contributions from over 70 annotators. To ensure high data quality, a multi-step validation process was employed, including guideline-based reviews, regular spot checks, and feedback loops to maintain consistency and accuracy across the labeled dataset.

2023 - 2023

Education

Chungnam National University

Bachelor's in Economics
2014 - 2019

Work History

OneForma

Freelance Data Annotator

Seoul
2024 - Present

RWS

Freelance LLM Evaluator

Seoul
2023 - Present