Andy Fransisko

LLM Code Quality, Evaluation, and Reasoning Specialist

Jakarta, Indonesia
$30.00/hr | Intermediate | Labelbox, Mindrift, Toloka

Key Skills

Software

Labelbox
Mindrift
Toloka

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Computer Code Programming

Top Label Types

Audio Recording
Computer Programming Coding
Evaluation Rating
Function Calling
RLHF

Freelancer Overview

I specialize in LLM code evaluation and AI training data, with a strong background in assessing code correctness, logical reasoning, edge-case handling, and adherence to specifications across multiple programming languages. My work focuses on improving model reliability: evaluating generated code, identifying flaws in reasoning, validating outputs against expected behavior, and providing high-quality feedback signals for model training and reinforcement learning workflows.

I have hands-on experience evaluating backend and AI-driven systems, including LLM-powered applications such as chatbots, recommendation systems, and retrieval-augmented generation (RAG) pipelines. I review algorithmic solutions, test coverage, performance considerations, and adherence to best practices so that training data reflects real-world engineering standards. My strength lies in producing precise, consistent evaluation judgments at scale that improve model quality, safety, and usability.

Intermediate | Indonesian | English

Labeling Experience

Labelbox

Code Evaluation

Labelbox | Computer Code Programming | Computer Programming Coding
I performed comparative assessments of Response A vs. Response B for LLM-generated code solutions. Each comparison evaluated functional correctness, logical reasoning, edge-case handling, performance considerations, and adherence to problem specifications. I delivered structured judgments (better / worse / tie), identified failure modes, and wrote concise rationales explaining why one response outperformed the other, producing clear preference signals for model training and ranking.

2025

Labelbox

Function Call

Labelbox | Computer Code Programming | Function Calling
I evaluated model outputs that required accurate tool selection and schema-compliant argument construction. This included validating function choice, parameter accuracy, data types, and execution readiness for API-style workflows to ensure reliable model behavior in agent and automation scenarios.

2025

Mindrift

Code Evaluation

Mindrift | Computer Code Programming | Computer Programming Coding
I performed comparative assessments of Response A vs. Response B for LLM-generated code solutions. Each comparison evaluated functional correctness, logical reasoning, edge-case handling, performance considerations, and adherence to problem specifications. I delivered structured judgments (better / worse / tie), identified failure modes, and wrote concise rationales explaining why one response outperformed the other, producing clear preference signals for model training and ranking.

2025

Labelbox

VAD Project

Labelbox | Audio | Transcription
I transcribed and validated conversational and instructional audio, strictly following guidelines for formatting, timestamps, speaker attribution, and noise handling. The work emphasized accuracy, consistency, and alignment with annotation standards to support speech-to-text model training and evaluation.

2025

Education

Universitas Pelita Harapan

Bachelor of Computer Science, Information Systems

2017 - 2021

Work History

GoTo Group

Software Engineer

Jakarta
2025 - Present

ByteDance

Software Engineer

2022 - 2025