Cagla Esiyok - Skilled annotation expert with 4+ years in linguistic, image, localization

Key Skills

Software

Appen

Clickworker

CrowdFlower

CrowdSource

Labelbox

Label Studio

Lionbridge

OneForma

Scale AI

Telus

Other

Toloka

Top Subject Matter

Video and image labelling

LLM evaluation in English and Turkish

Search result image classification

Top Data Types

Image

Text

Video

Top Task Types

Action Recognition

Audio Recording

Evaluation Rating

Prompt Response Writing SFT

Translation Localization

Freelancer Overview

I have participated in several data labeling and AI training data projects, gaining hands-on experience in annotation, labeling, and transcription. My key responsibilities included ensuring data accuracy and linguistic precision so that clients received high-quality datasets. I have worked extensively on large language model (LLM) projects, as well as on image annotation, speech recognition, tag labeling, and video categorization. I have also contributed to search result optimization, improving the relevance and accuracy of AI outputs. My proficiency with a range of annotation tools and close attention to detail allow me to manage and improve the quality of training data across multiple domains.

Expert: English, Spanish, Turkish, Chinese Mandarin

Labeling Experience

SEO

Toloka | Document | Evaluation Rating

This SEO project involved evaluating search queries and categorizing result types according to specific guidelines, with a focus on producing Yandex result-match data. Spanning approximately 600 hours, the project was carried out by three teams that collaborated to ensure thorough analysis and accurate categorization, building a comprehensive picture of query performance and relevance. The goal of this structured approach was to optimize search engine results and improve user experience through better data matching.

2024 - 2024

Quality Evaluation

Labelbox | Image | Point / Key Point

The project focused on quality evaluation of labeled data and involved a total of 420 hours of work. Two teams collaborated to assess and improve the accuracy of labeled data across various datasets. The scope included systematic evaluation of annotations, identification of inconsistencies, and feedback to improve labeling quality. Regular reviews and quality checks were used to keep standards consistent throughout the project. By combining the expertise of both teams, we aimed to deliver a reliable, high-quality dataset that met client expectations and supported effective machine learning applications.

2022 - 2022

AI training

Appen | Text | Prompt Response Writing SFT

The project focused on building a Turkish-language prompt-response dataset for training a large language model (LLM). The aim was to create a high-quality dataset that helps the model generate relevant and coherent responses in Turkish. We worked with approximately 260 hours of Turkish-language dialogue covering contexts such as customer service, education, and casual conversation, in order to capture diverse language use. Key activities included gathering video and audio sources featuring Turkish speakers, transcribing the dialogues, and labeling them as prompts and responses. To maintain quality, we conducted regular checks and peer reviews. Our quality standards targeted 95% accuracy in transcriptions and annotations, required diverse examples across contexts and styles, and incorporated feedback from native speakers to continually improve the process. This structured approach supported the creation of a robust dataset for training an effective Turkish LLM.

2021 - 2021

AI trainer

Other | Text | Prompt Response Writing SFT

The project focused on evaluating prompts and AI responses for TCS, with a total workload of 2,000 hours. The work was divided into 20 batches, each managed by a different team; the teams collaborated to ensure data accuracy and meet weekly deadlines. The evaluation process involved assessing AI answers against established metrics and refining the responses or selected suggestions based on the results. Regular communication and feedback loops were used to maintain quality and consistency across batches. This structured approach aimed to improve the performance of AI models while ensuring timely delivery of high-quality outputs.

2019 - 2020

Data labelling

Other | Audio | Text Generation, Evaluation Rating

This project focused on labeling video content to support text generation and evaluation tasks. The goal was to create a high-quality annotated dataset for developing and training AI models for automatic transcription, summarization, and context understanding. Project size: the project involved annotating approximately 1,200 hours of video footage. Annotation accuracy: the target was a minimum of 95% accuracy in text transcription and labeling. Consistency checks: regular reviews and cross-validation among annotators kept labeling conventions uniform. Feedback loops: feedback from subject matter experts was used to refine the annotation guidelines and improve quality over time.

2019 - 2019

Education

Harold Washington College

Associate Degree, Business Administration

2013 - 2015

Work History

Trinity Med

Executive Assistant

Hong Kong

2022 - 2024

iSoftStone

Project Coordinator

Chicago

2021 - 2022