Joseph Yokobori - Evaluated and Refined LLM-Generated Text in Japanese & English

Key Skills

Software

Data Annotation Tech

Scale AI

Other

Top Subject Matter

No subject matter listed

Top Data Types

Audio

Image

Text

Top Task Types

Classification

Prompt Response Writing SFT

Translation Localization

Freelancer Overview

I have worked as a freelance AI evaluator for over four months, contributing more than 200 hours across multiple projects. My work focused on reviewing and improving LLM-generated responses in both Japanese and English: evaluating output quality, translating responses, and editing them for fluency and contextual accuracy. I also handled categorization tasks, such as identifying adversarial prompts and assessing potential risks. Throughout these roles, I consistently met quality and speed standards while adapting to increasingly complex tasks, including beta-phase projects that required fair and nuanced judgment.

In addition, I helped design fine-grained criteria for a range of response types and used them to refine model outputs. This involved identifying strengths and weaknesses in LLM-generated content, proposing concrete improvements, and ensuring alignment with user intent. I also conducted evaluations in STEM-related fields, particularly biology, drawing on my academic background to assess technical accuracy and clarity. My experience reflects strong linguistic skills, structured analytical thinking, and a deep understanding of how human feedback shapes language model development.

Entry Level

English

Japanese

Chinese Mandarin

Labeling Experience

Cypher

Scale AI

Text

Translation Localization

Evaluation Rating

Contributed to Cypher_Evals model evaluations by comparing pairs of LLM-generated responses. Each pair was assessed against detailed criteria, including instruction following, truthfulness, localization, and verbosity. In addition to rating individual aspects, I provided an overall judgement on which response was better and explained the reasoning behind my choice. These tasks were completed under strict time and quality constraints, requiring both precision and efficiency.

2025

Creation of Fine-Grained Criteria

Data Annotation Tech

Text

Classification

Translation Localization

Created fine-grained evaluation criteria to assess the quality of LLM-generated responses, particularly in tasks requiring nuanced judgement, including both general and STEM-related prompts. These criteria were used to distinguish between acceptable and perfect outputs, guide revisions, and support the development of ideal responses.

2025

Education

Peking University

Immersion Program, Chinese Language

2023 - 2023

Waseda University

Bachelor of Science, Life Science and Medical Bioscience

2022

Work History

University of Kansas Medical Center

Research Intern

Kansas

2024 - 2024