Shun Osawa

LLM Evaluation & AI Safety Specialist (Preference Ranking, QA)

Tokyo, Japan
$21.00/hr, Intermediate, Appen, OneForma

Key Skills

Software

Appen
OneForma

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Image
Text

Top Task Types

Evaluation Rating
Point Key Point
Translation Localization

Freelancer Overview

I am an experienced LLM evaluator specializing in response evaluation, preference ranking, and AI safety assessment. I have worked extensively with detailed guidelines to assess model outputs for quality, accuracy, helpfulness, and policy compliance. My expertise includes evaluating hallucinations, harmful content, instruction-following, and comparative response ranking. I am highly detail-oriented and consistent, with a strong focus on objective, guideline-based judgment rather than subjective opinion. I also have experience working in bilingual (Japanese–English) environments, supporting text evaluation and localization quality assurance for global AI products. I am reliable, deadline-driven, and comfortable handling large-scale evaluation tasks requiring high precision and quality standards.

Intermediate, English, Japanese

Labeling Experience

OneForma

LLM Response Evaluation & Preference Ranking

OneForma, Text, Translation Localization, Evaluation Rating
Evaluated large language model responses based on detailed project guidelines, focusing on accuracy, relevance, helpfulness, and instruction-following. Performed preference ranking between multiple responses and provided structured, guideline-aligned justifications for each decision. Reviewed edge cases involving partial correctness, ambiguity, hallucinations, and policy-sensitive content. Ensured consistent and objective evaluations while maintaining high quality standards across large task volumes. Collaborated within a quality-controlled evaluation environment emphasizing reliability and precision.

2023 - 2025

Education

Meiji University (明治大学)

Bachelor of Arts, English and American Literature

2010 - 2014

Work History

Freelance

Freelance Translator

N/A
2012 - Present