
Chloe Bunda

Expert in French and English AI Training

France
$25.00/hr · Entry Level · Crowdsource

Key Skills

Software

CrowdSource

Top Subject Matter

LLM evaluation in French
Sheets Evaluation
Image generation Evaluation

Top Data Types

Audio
Image
Text

Top Task Types

No task types listed

Freelancer Overview

At Outlier, I was responsible for rating responses generated from prompts, ensuring that the outputs met quality and accuracy standards. This task involved evaluating the relevance and coherence of the responses against predefined criteria, providing critical feedback for model improvement, and identifying areas where the prompts needed refinement. My role required a strong understanding of data annotation principles and a meticulous approach to assessing model performance. The data evaluated varied from project to project: image generation, text, formulas.

Entry Level · French · English · Spanish

Labeling Experience

CrowdSource

Document Collection and Annotation Specialist

Crowdsource · Document · Entity NER Classification · Classification
I worked on building a high-quality ground-truth dataset from real-world financial documents (PDF, XLS, and images). The project involved collecting and anonymizing ~10 heterogeneous client documents (account statements, portfolio summaries, loan/credit documents), then extracting key financial information such as assets, liabilities, account/holding details, balances, currencies, dates, and client identifiers. Using an internal annotation tool and a provided JSON schema, I converted unstructured documents into clean, structured JSON, focusing on consistency across files (field naming, units, currency codes, date formats). I implemented quality checks with custom Python scripts (JSON schema validation, missing/invalid fields detection) and maintained a tag/notes system to explicitly flag ambiguous, incomplete, or conflicting information. The final output was a set of document + JSON “correction” pairs designed to benchmark and regress-test LLMs.
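The validation step described above can be sketched as a small Python check that compares each extracted record against a minimal schema and flags missing or invalid fields. The field names and rules here (`account_id`, `balance`, `currency`, `statement_date`) are illustrative assumptions, not the project's actual schema.

```python
import re

# Minimal illustrative schema: required fields, expected types, and
# format patterns (ISO 4217 currency codes, ISO 8601 dates).
SCHEMA = {
    "account_id": {"type": str, "required": True},
    "balance": {"type": (int, float), "required": True},
    "currency": {"type": str, "required": True,
                 "pattern": r"^[A-Z]{3}$"},
    "statement_date": {"type": str, "required": True,
                       "pattern": r"^\d{4}-\d{2}-\d{2}$"},
}

def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable issues for one extracted record."""
    issues = []
    for name, rules in SCHEMA.items():
        if name not in record:
            if rules["required"]:
                issues.append(f"missing field: {name}")
            continue
        value = record[name]
        if not isinstance(value, rules["type"]):
            issues.append(f"wrong type for {name}: {type(value).__name__}")
        elif "pattern" in rules and not re.match(rules["pattern"], value):
            issues.append(f"invalid format for {name}: {value!r}")
    return issues

record = {"account_id": "FR76-0001", "balance": 1523.40,
          "currency": "eur", "statement_date": "2024-03-31"}
print(validate_record(record))  # flags the lowercase currency code
```

The same loop extends naturally to per-project rules; ambiguous values would be routed to a notes/flagging system rather than silently corrected.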

2025
CrowdSource

Image Evaluation

Crowdsource · Image · Evaluation Rating · Prompt Response Writing SFT
This project involves evaluating and comparing outputs generated from similar prompts that reference different entities. The goal is to assess the accuracy, consistency, and respectfulness in how these entities are portrayed, whether in generated images or text responses. The task ensures that the model's outputs align with the required quality and sensitivity standards.

2024
CrowdSource

Clock Evaluation

Crowdsource · Text · Evaluation Rating · Prompt Response Writing SFT
This project focuses on verifying and correcting translated content related to prompts involving clocks, timers, stopwatches, and other time-related functions. The task involves reviewing an English prompt alongside its translation and, at times, additional fields like "thoughts," "existing timer/alarm query," "label," and "final response," each with translations. An empty "issues/assumptions" field is provided for reporting issues or noting assumptions made during translation. The objective is to ensure accurate and natural translations across all provided fields.

2024
CrowdSource

Bardkick Workspace

Crowdsource · Text · Text Generation · Evaluation Rating
This project involves evaluating the quality of AI-generated responses to user prompts across various tasks. The evaluator is presented with a user prompt, sometimes with additional context, along with two AI-generated responses. The task is to assess each response based on specific quality dimensions, which may differ depending on the workstream. Examples of tasks include summarizing documents, generating formulas for data files, or composing emails. Each workstream has unique or common evaluation criteria to guide the quality assessment.
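A side-by-side evaluation of two responses, as described above, can be represented roughly as follows. The quality dimensions and 1-5 scoring scale are assumptions for illustration, since each workstream defined its own criteria.

```python
from dataclasses import dataclass, field

# Assumed dimension names; real workstreams defined their own.
DIMENSIONS = ("accuracy", "coherence", "helpfulness")

@dataclass
class SideBySideRating:
    """One evaluation unit: a prompt, two responses, per-dimension scores."""
    prompt: str
    response_a: str
    response_b: str
    scores_a: dict = field(default_factory=dict)  # dimension -> 1..5
    scores_b: dict = field(default_factory=dict)

    def preferred(self) -> str:
        """Return 'A', 'B', or 'tie' based on total dimension scores."""
        total_a = sum(self.scores_a.get(d, 0) for d in DIMENSIONS)
        total_b = sum(self.scores_b.get(d, 0) for d in DIMENSIONS)
        if total_a == total_b:
            return "tie"
        return "A" if total_a > total_b else "B"

rating = SideBySideRating(
    prompt="Summarize this quarterly report in two sentences.",
    response_a="...", response_b="...",
    scores_a={"accuracy": 5, "coherence": 4, "helpfulness": 4},
    scores_b={"accuracy": 3, "coherence": 4, "helpfulness": 4},
)
print(rating.preferred())  # "A"
```

Summing scores is one simple aggregation choice; platforms often instead ask the rater for a direct preference on a comparative scale.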

2024
CrowdSource

Cultural Relevance

Crowdsource · Text · Text Generation · Translation Localization
In this project, I received a user prompt requesting an image generation and two AI-generated images. The task was to evaluate each image individually and provide a Side-by-Side rating to determine which image better meets the criteria. When assessing the images, the focus was on how effectively they capture and respectfully represent the cultural aspects of the specified country or region, ensuring that they accurately reflect the unique characteristics and cultural elements of the location.

2024

Education

EM Lyon Business School

Bachelor of Business Administration, Business Administration - English Track
2020 - 2024
EM Lyon Business School

Bachelor in Finance, Finance
2020 - 2024

Work History

BANK ABC

Permanent Controller

Paris
2023 - Present
Permanent Controller

Paris
2023 - 2024