Ma Xu

LLM Evaluation and Text Generation Specialist in English & Chinese

Singapore, Singapore
$20.00/hr, Intermediate, Scale AI

Key Skills

Software

Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Image
Text

Top Task Types

Evaluation Rating
Prompt Response Writing SFT
RLHF
Text Generation
Text Summarization

Freelancer Overview

I have extensive experience in AI training and data labeling, with a focus on large language models (LLMs) and machine learning quality evaluation. As an LLM Training Specialist at Scale AI, I reviewed coding tasks across multiple programming languages, conducted prompt generation, evaluated AI-generated responses, created rubrics, and corrected code snippets. I performed detailed assessments of AI outputs for instruction adherence, factual accuracy, clarity, completeness, grammar, contextual appropriateness, and overall coherence. Additionally, I conducted side-by-side comparative analysis and provided written justifications to identify factors impacting response quality. Prior to that, as a Queue Manager, I oversaw contributor experience and retention, facilitated communication between contributors and project admins, and organized webinars to foster a collaborative community across SFT and RLHF projects. These experiences have equipped me with strong analytical skills, attention to detail, and a deep understanding of AI data workflows, ensuring high-quality training data and effective project delivery.

Intermediate, English, Chinese (Mandarin)

Labeling Experience

Scale AI

LLM Training Specialist

Scale AI, Audio, Evaluation Rating, Audio Recording
In this project, I participated in a multi-person role-play AI training initiative. Tasks included conducting and recording dialogues across different topics, analyzing audio for sentiment, noise, clarity, and speech characteristics, and providing detailed feedback to ensure high-quality training data. The project involved collaboration with multiple contributors to produce consistent and reliable multimodal datasets for improving AI speech and conversation models.

2025 - 2025
Scale AI

LLM Training Specialist

Scale AI, Image, Evaluation Rating, Prompt Response Writing SFT
In this project, I worked on multimodal AI training using images. Tasks included selecting appropriate images from galleries, writing image recognition prompts, performing sentiment analysis, generating natural-language prompts, comparing AI-generated responses, and analyzing outputs for accuracy, clarity, completeness, and contextual appropriateness. I provided detailed feedback and ensured high-quality instruction-following data to improve model performance.

2025 - 2025
Scale AI

LLM Training Specialist

Scale AI, Computer Code Programming, Fine Tuning, Evaluation Rating
In this project, I evaluated AI-generated code across multiple programming languages, including Python, Java, and JavaScript. My tasks included reviewing code correctness, creating rubrics, generating prompts, correcting code snippets, and comparing multiple AI outputs side by side. I provided detailed written feedback to identify errors, improve coding performance, and ensure high-quality data for training AI models.

2024 - 2025
Scale AI

LLM Training Specialist

Scale AI, Text, RLHF, Evaluation Rating
This project focused on evaluating AI-generated responses across multiple tasks, including OpenQA, ClosedQA, Rewriting, Classification, Extraction, Chatbot, Summarization, and Brainstorming. I reviewed model outputs for accuracy, completeness, clarity, grammar, and contextual appropriateness. I conducted side-by-side comparisons, provided detailed feedback and written justifications to identify errors or areas for improvement, and ensured high-quality annotations to enhance AI model performance.

2024 - 2025

Education

Coventry University

Bachelor of Science, Computer Science

2023 - 2024

Work History

Scale AI

Queue Manager

Singapore
2025 - 2025

Scale AI

AI Trainer

Singapore
2024 - 2025