Lee Eurice - Expert LLM Evaluation and Prompt QA Specialist (En-Kr), 30+ yrs editorial ex

Key Skills

Software

Appen

Data Annotation Tech

Mercor

Other

Internal/Proprietary Tooling

Top Subject Matter

No subject matter listed

Top Data Types

Document

Image

Text

Top Task Types

Classification

Evaluation Rating

Prompt Response Writing SFT

Question Answering

RLHF

Freelancer Overview

I started my career as a bilingual journalist and translator and have spent more than three decades working around finance, economics, and technology. Much of my work involved explaining unfamiliar concepts clearly, checking facts, and maintaining accuracy under tight deadlines. These habits stayed with me, and they shape the way I approach AI evaluation today. Since 2024, I’ve been contributing to various AI training projects, including RLHF, prompt and response evaluation, safety reviews, and rubric-based scoring. I’ve worked with teams at TransPerfect, Appen, DataAnnotation, and Crowdgen, mainly on Korean–English evaluation tasks and model behavior analysis. Before entering the AI field, I published the bilingual economic lexicon used by The Korea Economic Daily (16,000+ terms) and spent years reporting on banking, markets, and policy at Reuters and The Korea Economic Daily. That background helps me handle tasks involving finance, reasoning, and factual consistency. I’m comfortable reviewing detailed guidelines, assessing sensitive or ambiguous content, and making judgment calls that require consistency and context. Whether I’m scoring model responses, analyzing reasoning steps, or rewriting prompts for clarity, I apply the same editorial discipline I’ve relied on throughout my career.

Intermediate, Korean, English

Labeling Experience

Multimodal Evaluation

Internal Proprietary Tooling, Image, Classification, RLHF

I reviewed paired image and text prompts to check whether descriptions matched the visual content and whether the model’s reasoning followed from what was shown. The tasks involved identifying mismatches, labeling incorrect interpretations, and scoring the quality of short reasoning steps. I followed detailed instructions, applied them consistently, and documented cases where images allowed for multiple reasonable readings. The work required careful visual attention and steady judgment across many items.

2025

Rubric-Based Chatbot Evaluation and Response Rating

Data Annotation Tech, Text, Question Answering, Text Generation

Evaluated AI chatbot conversations using detailed rubrics to measure accuracy, coherence, instruction-following, and tone. Rated responses on factual correctness and conversational quality across multiple turns. Annotated both text-based and voice-transcribed dialogues to identify strengths, weaknesses, and potential risks. Contributed to reinforcement learning datasets by providing consistent ratings and feedback that improved AI model alignment with human expectations.

2025

Korean-English LLM Evaluation and Translation QA

Internal Proprietary Tooling, Text, Text Generation, Text Summarization

Reviewed AI-generated Korean-English translations of consulting and financial reports. Supported reinforcement learning from human feedback (RLHF) by evaluating, scoring, and improving prompt-response pairs. Delivered machine translation post-editing (MTPE), ensured terminology consistency, and flagged safety or factual issues. Work adhered to detailed quality guidelines, with continuous client feedback loops to maintain editorial precision.

2025

Medical / Wellness Reasoning Evaluation (Non-Clinical)

Internal Proprietary Tooling, Text, Classification, RLHF

I evaluated model responses to health and wellness questions, focusing on safety, clarity, and reasoning accuracy. The tasks did not involve clinical judgment; they centered on identifying risky suggestions, checking for missing conditions, and ensuring that answers followed established safety guidelines. I reviewed large batches of responses, flagged harmful patterns, and provided short explanations for my scoring. The work required consistent judgment, attention to nuance, and careful reading of safety instructions.

2025

Hazardous Content Red Teaming and Safety Annotation

Appen, Text, Classification, RLHF

Conducted red teaming on potentially harmful and sensitive AI-generated outputs, including text prompts and meme-style content. Classified content for safety risks, bias, and harmful elements, and provided structured annotations for reinforcement learning datasets. Compared multiple AI responses and rated their compliance with safety and quality guidelines. Ensured consistent application of detailed instructions to support the development of safer and more reliable LLMs.

2025

Education

School not specified

Bachelor of Arts, English Literature & Language

Not specified

Work History

KED Global

Senior Translator & Editor

Seoul

2022 - 2024

Cointelegraph Korea

Editor & Localization Specialist

Seoul

2019 - 2020