Elisabet Olivia Saraswati

AI Data Labeling & LLM Evaluation Specialist | English & Indonesian

Bandung, Indonesia
$7.50/hr | Intermediate | Appen | Clickworker | Crowdsource

Key Skills

Software

Appen
Clickworker
CrowdSource
HiveMind
Mindrift
OneForma
Remotasks
Toloka

Other

Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Image
Text

Top Task Types

Audio Recording
Evaluation / Rating
Prompt Response Writing (SFT)
Translation / Localization

Freelancer Overview

I have experience in AI training and data labeling, specializing in LLM evaluation, text classification, and text generation for English and Indonesian datasets. I ensure high-quality outputs through accurate annotation and careful linguistic review. With a background in localization and linguistic QA, I bring bilingual expertise to AI projects, helping models perform more accurately across languages and cultural contexts.

Languages (Intermediate): English, Japanese, Javanese, Sundanese, Indonesian

Labeling Experience

OneForma

Atlas Content Rater - Indonesian (Indonesia)

OneForma | Text | Evaluation / Rating
This project focused on content evaluation and annotation of AI-generated outputs, requiring bilingual expertise in Indonesian and English. The core tasks included reviewing responses for accuracy, quality, and relevance, then writing detailed justifications in English to support evaluation decisions. The work demanded precision, consistency, and creative thinking to ensure reliable feedback that directly contributed to refining and improving AI model performance.

2025

Factuality Project

Other | Text | Evaluation / Rating
This project focused on evaluating the factual accuracy of AI-generated responses by reviewing two outputs to the same user prompt and verifying their correctness against reliable web sources. The scope included determining task eligibility, assessing each response through guided questions, conducting research to fact-check claims, and identifying whether any inaccuracies could be misleading or harmful. Tasks also involved writing detailed justifications, providing side-by-side comparisons, and explaining preference choices. Conducted at scale across varied topics and contexts, the project required consistent application of factuality criteria, careful documentation, and adherence to quality measures such as thorough research, unbiased judgment, and precise written feedback to ensure reliable annotations that improve model trustworthiness.

2025

Winter Wonderland Project

Other | Text | Prompt Response Writing (SFT)
The project scope covered end-to-end prompt creation and evaluation across assigned categories and subcategories (by topic and locale, e.g., English and Indonesian) to expand model coverage. My tasks included writing prompts at specified complexity levels, reviewing two model responses per prompt, performing detailed proofreading (fluency, grammar, natural phrasing) and fact-checking (completeness, instruction-following, factual accuracy), rating and ranking responses with clear written justifications, and editing the selected response to improve fluency, correctness, and adherence to instructions. Executed at scale across multiple use cases (brainstorming, classification, Q&A, creative writing) and locales, the work produced large volumes of labeled prompts, comparative scores, and improved outputs. Quality measures I followed included strict rubric consistency, thorough error analysis, cultural and linguistic sensitivity, precise justification for every rating, and verification of factual accuracy.

2025

Cypher Eval Project

Other | Text | Evaluation / Rating
The Cypher Evals project involved evaluating and comparing outputs from two AI language models across diverse use cases such as brainstorming, classification, open Q&A, and creative writing. My tasks included reviewing prompts and responses, rating outputs on accuracy, fluency, coherence, and relevance, assigning preference rankings, and providing written justifications to highlight strengths and pinpoint errors. The project was conducted at scale across multiple locales, including English and Indonesian, ensuring cultural and linguistic accuracy. Throughout the process, I adhered to strict quality guidelines, maintaining consistency and detailed error analysis to deliver constructive feedback that contributed directly to model improvement.

2025

Appen

Spearmint Project

Appen | Text | Evaluation / Rating
This project involved evaluating AI-generated responses across two dimensions: Tone and Fluency. In Batch 1, I assessed whether responses demonstrated qualities such as being helpful, insightful, engaging, and fair. In Batch 2, I reviewed outputs for grammatical accuracy, clarity, coherence, and natural flow. Conducted on a large set of responses, the work required consistent application of evaluation criteria, attention to linguistic detail, and adherence to quality standards to ensure reliable annotations that improved both the expressiveness and readability of the model’s outputs.

2024

Education

Maranatha Christian University

Bachelor's Degree, Accounting

2015 - 2019

Work History

PT. Multibrata Anugerah Utama

Tax Accountant

Bandung
2019 - Present

Maranatha Christian University

Library Intern

Bandung
2016 - 2018