Sisi Wang - AI Data Analyst - Large Language Models

Key Skills

Software

Internal/Proprietary Tooling

Top Subject Matter

No subject matter listed

Top Data Types

Text

Top Task Types

Text Generation

RLHF

Computer Programming Coding

Data Collection

Prompt Response Writing SFT

Freelancer Overview

I am an AI data analyst with hands-on experience in building and optimizing high-quality datasets for training and fine-tuning large language models across domains such as finance, technology, and automotive safety. My work includes designing and implementing end-to-end data collection, cleaning, and annotation pipelines, as well as developing strict annotation guidelines that improved inter-annotator agreement and overall data consistency. I have led the construction of domain-specific datasets that significantly reduced model error rates, and I am skilled in prompt engineering, LLM evaluation, and automated metric analysis for NLP tasks. My technical toolkit includes Python, SQL, R, and platforms like AWS and Google Cloud, along with visualization tools such as Tableau and Power BI. I am passionate about using data-driven approaches to enhance AI model performance, ensure fairness, and deliver actionable insights for product and algorithm development.

Intermediate

Korean

English

Chinese Mandarin

Labeling Experience

Turing / Talents AI / Labelness

Internal Proprietary Tooling

Text

Text Generation

RLHF

Led the development of specialized datasets for training and fine-tuning large language models (LLMs) across various domains, including finance and technology. Designed and implemented data collection, cleaning, and annotation pipelines to ensure data quality and consistency. Collaborated with cross-functional teams to define data requirements and quality standards. Developed and enforced strict annotation guidelines, resulting in a 30% improvement in inter-annotator agreement. Designed and executed comprehensive evaluation protocols to assess the performance, safety, and fairness of various LLMs. Developed a suite of automated evaluation scripts to measure metrics such as accuracy, coherence, and bias. Engineered and optimized prompts to enhance model performance on a wide range of NLP tasks, including text summarization, question answering, and code generation. Curated a library of over 500 effective prompt templates for internal use.

2025

Education

University of Missouri-Kansas City

Doctor of Philosophy, Computer Science

2026 - 2031

Monash University

Master of Artificial Intelligence, Artificial Intelligence

2023 - 2025

Work History

XPeng Motors

Data Analyst Intern

Shanghai

2025 - 2025

NIO

Data Analyst

Shanghai

2021 - 2023