Elisabet Olivia Saraswati

AI Data Labeling & LLM Evaluation Specialist | English & Indonesian

Bandung, Indonesia
$7.50/hr | Intermediate | Appen | Clickworker | Crowdsource

Key Skills

Software

Appen
Clickworker
CrowdSource
HiveMind
Mindrift
OneForma
Remotasks
Toloka

Other

Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Image
Text

Top Task Types

Audio Recording
Evaluation / Rating
Prompt Response Writing (SFT)
Translation / Localization

Freelancer Overview

I have experience in AI training and data labeling, specializing in LLM evaluation, text classification, and text generation for English and Indonesian datasets. I ensure high-quality outputs through accurate annotation and careful linguistic review. With a background in localization and linguistic QA, I bring bilingual expertise to AI projects, helping models perform more accurately across languages and cultural contexts.

Languages (Intermediate): English, Japanese, Javanese, Sundanese, Indonesian

Labeling Experience

OneForma

Atlas Content Rater - Indonesian (Indonesia)

OneForma | Text | Evaluation / Rating
This project focused on content evaluation and annotation of AI-generated outputs, requiring bilingual expertise in Indonesian and English. The core tasks included reviewing responses for accuracy, quality, and relevance, then writing detailed justifications in English to support evaluation decisions. The work demanded precision, consistency, and creative thinking to ensure reliable feedback that directly contributed to refining and improving AI model performance.

2025

Factuality Project

Other | Text | Evaluation / Rating
This project focused on evaluating the factual accuracy of AI-generated responses by reviewing two outputs to the same user prompt and verifying their correctness against reliable web sources. The scope included determining task eligibility, assessing each response through guided questions, conducting research to fact-check claims, and identifying whether any inaccuracies could be misleading or harmful. Tasks also involved writing detailed justifications, providing side-by-side comparisons, and explaining preference choices. Conducted at scale across varied topics and contexts, the project required consistent application of factuality criteria, careful documentation, and adherence to quality measures such as thorough research, unbiased judgment, and precise written feedback to ensure reliable annotations that improve model trustworthiness.

2025

Winter Wonderland Project

Other | Text | Prompt Response Writing (SFT)
The project scope covered end-to-end prompt creation and evaluation across assigned categories and subcategories (by topic and locale, e.g., English and Indonesian) to expand model coverage. My tasks included writing prompts at specified complexity levels, reviewing two model responses per prompt, performing detailed proofreading (fluency, grammar, natural phrasing) and fact-checking (completeness, instruction-following, factual accuracy), rating and ranking responses with clear written justifications, and editing the selected response to improve fluency, correctness, and adherence to instructions. Executed at scale across multiple use cases (brainstorming, classification, Q&A, creative writing) and locales, the work produced large volumes of labeled prompts, comparative scores, and improved outputs. Quality measures I followed included strict rubric consistency, thorough error analysis, cultural and linguistic sensitivity, precise justification for every rating, and verification of factual accuracy.

2025

Cypher Eval Project

Other | Text | Evaluation / Rating
The Cypher Evals project involved evaluating and comparing outputs from two AI language models across diverse use cases such as brainstorming, classification, open Q&A, and creative writing. My tasks included reviewing prompts and responses, rating outputs on accuracy, fluency, coherence, and relevance, assigning preference rankings, and providing written justifications to highlight strengths and pinpoint errors. The project was conducted at scale across multiple locales, including English and Indonesian, ensuring cultural and linguistic accuracy. Throughout the process, I adhered to strict quality guidelines, maintaining consistency and detailed error analysis to deliver constructive feedback that contributed directly to model improvement.

2025

Appen

Spearmint Project

Appen | Text | Evaluation / Rating
This project involved evaluating AI-generated responses across two dimensions: Tone and Fluency. In Batch 1, I assessed whether responses demonstrated qualities such as being helpful, insightful, engaging, and fair. In Batch 2, I reviewed outputs for grammatical accuracy, clarity, coherence, and natural flow. Conducted on a large set of responses, the work required consistent application of evaluation criteria, attention to linguistic detail, and adherence to quality standards to ensure reliable annotations that improved both the expressiveness and readability of the model’s outputs.

2024

Education

Maranatha Christian University

Bachelor's Degree, Accounting

2015 - 2019

Work History

PT. Multibrata Anugerah Utama

Tax Accountant

Bandung
2019 - Present

Maranatha Christian University

Library Intern

Bandung
2016 - 2018