Lee Eurice

Expert LLM Evaluation and Prompt QA Specialist (En-Kr), 30+ yrs editorial experience

Guri, South Korea
$35.00/hr · Intermediate · Appen · Data Annotation Tech · Mercor

Key Skills

Software

Appen
Data Annotation Tech
Mercor
Other
Internal/Proprietary Tooling

Top Subject Matter

No subject matter listed

Top Data Types

Document
Image
Text

Top Task Types

Classification
Evaluation Rating
Prompt Response Writing SFT
Question Answering
RLHF

Freelancer Overview

I started my career as a bilingual journalist and translator and have spent more than three decades working across finance, economics, and technology. Much of my work involved explaining unfamiliar concepts clearly, checking facts, and maintaining accuracy under tight deadlines. These habits have stayed with me, and they shape the way I approach AI evaluation today.

Since 2024, I’ve contributed to a range of AI training projects, including RLHF, prompt and response evaluation, safety reviews, and rubric-based scoring. I’ve worked with teams at TransPerfect, Appen, DataAnnotation, and Crowdgen, mainly on Korean–English evaluation tasks and model behavior analysis.

Before entering the AI field, I published the bilingual economic lexicon (16,000+ terms) used by The Korea Economic Daily and spent years reporting on banking, markets, and policy at Reuters and The Korea Economic Daily. That background helps me handle tasks involving finance, reasoning, and factual consistency. I’m comfortable reviewing detailed guidelines, assessing sensitive or ambiguous content, and making judgment calls that require consistency and context. Whether I’m scoring model responses, analyzing reasoning steps, or rewriting prompts for clarity, I apply the same editorial discipline I’ve relied on throughout my career.

Intermediate · Korean · English

Labeling Experience

Multimodal Evaluation

Internal Proprietary Tooling · Image · Classification · RLHF
I reviewed paired image and text prompts to check whether descriptions matched the visual content and whether the model’s reasoning followed from what was shown. The tasks involved identifying mismatches, labeling incorrect interpretations and scoring the quality of short reasoning steps. I followed detailed instructions, applied them consistently and documented cases where images allowed for multiple reasonable readings. The work required careful visual attention and steady judgment across many items.

2025
Data Annotation Tech

Rubric-Based Chatbot Evaluation and Response Rating

Data Annotation Tech · Text · Question Answering · Text Generation
Evaluated AI chatbot conversations using detailed rubrics to measure accuracy, coherence, instruction-following, and tone. Rated responses on factual correctness and conversational quality across multiple turns. Annotated both text-based and voice-transcribed dialogues to identify strengths, weaknesses, and potential risks. Contributed to reinforcement learning datasets by providing consistent ratings and feedback that improved AI model alignment with human expectations.

2025

Korean-English LLM Evaluation and Translation QA

Internal Proprietary Tooling · Text · Text Generation · Text Summarization
Reviewed AI-generated Korean-English translations of consulting and financial reports. Provided human feedback for reinforcement learning from human feedback (RLHF) by evaluating, scoring, and improving prompt-response pairs. Delivered machine translation post-editing (MTPE), ensured terminology consistency, and flagged safety or factual issues. Work followed detailed quality guidelines, with continuous client feedback loops to maintain editorial precision.

2025

Medical / Wellness Reasoning Evaluation (Non-Clinical)

Internal Proprietary Tooling · Text · Classification · RLHF
I evaluated model responses to health and wellness questions, focusing on safety, clarity and reasoning accuracy. The tasks did not involve clinical judgment; they centered on identifying risky suggestions, checking for missing conditions and ensuring that answers followed established safety guidelines. I reviewed large batches of responses, flagged harmful patterns and provided short explanations for my scoring. The work required consistent judgment, attention to nuance and careful reading of safety instructions.

2025
Appen

Hazardous Content Red Teaming and Safety Annotation

Appen · Text · Classification · RLHF
Conducted red teaming on potentially harmful and sensitive AI-generated outputs, including text prompts and meme-style content. Classified content for safety risks, bias, and harmful elements, and provided structured annotations for reinforcement learning datasets. Compared multiple AI responses and rated their compliance with safety and quality guidelines. Ensured consistent application of detailed instructions to support the development of safer and more reliable LLMs.

2025

Education

School not specified

Bachelor of Arts, English Literature & Language

Not specified

Work History

KED Global

Senior Translator & Editor

Seoul
2022 - 2024

Cointelegraph Korea

Editor & Localization Specialist

Seoul
2019 - 2020