Andy Fransisko - LLM Code Quality, Evaluation, and Reasoning Specialist

Key Skills

Software

Labelbox

Mindrift

Toloka

Top Subject Matter

No subject matter listed

Top Data Types

Audio

Computer Code Programming

Top Task Types

Audio Recording

Computer Programming/Coding

Evaluation/Rating

Function Calling

RLHF

Freelancer Overview

I specialize in LLM code evaluation and AI training data, with a strong background in assessing code correctness, logical reasoning, edge cases, and adherence to specifications across multiple programming languages. My work focuses on improving model reliability by evaluating generated code, identifying reasoning flaws, validating outputs against expected behavior, and providing high-quality feedback signals for model training and reinforcement learning workflows. I have hands-on experience evaluating backend and AI-driven systems, including LLM-powered applications such as chatbots, recommendation systems, and RAG pipelines. I am proficient in reviewing algorithmic solutions, test coverage, performance considerations, and best practices, ensuring training data reflects real-world engineering standards. My strength lies in producing precise, consistent, and scalable evaluation judgments that directly improve model quality, safety, and usability.

Intermediate

Indonesian

English

Labeling Experience

Code Evaluation

Labelbox, Computer Code Programming, Computer Programming/Coding

I performed comparative assessments of Response A vs Response B for LLM-generated code solutions. Each comparison evaluated functional correctness, logical reasoning, edge-case handling, performance considerations, and adherence to problem specifications. I delivered structured judgments (better / worse / tie), identified failure modes, and provided concise rationales explaining why one response outperformed the other, generating strong preference signals for model training and ranking.

2025

Function Call

Labelbox, Computer Code Programming, Function Calling

I evaluated model outputs requiring accurate tool selection and schema-compliant argument construction. This included validating function choice, parameter accuracy, data types, and execution readiness for API-style workflows, ensuring reliable model behavior in agent and automation scenarios.

2025 - 2025

Code Evaluation

Mindrift, Computer Code Programming, Computer Programming/Coding

I performed comparative assessments of Response A vs Response B for LLM-generated code solutions. Each comparison evaluated functional correctness, logical reasoning, edge-case handling, performance considerations, and adherence to problem specifications. I delivered structured judgments (better / worse / tie), identified failure modes, and provided concise rationales explaining why one response outperformed the other, generating strong preference signals for model training and ranking.

2025 - 2025

VAD Project

Labelbox, Audio, Transcription

I transcribed and validated conversational and instructional audio with strict adherence to formatting, timestamp, speaker-attribution, and noise-handling guidelines. I prioritized accuracy, consistency, and alignment with annotation standards to support speech-to-text model training and evaluation.

2025 - 2025

Education

Universitas Pelita Harapan

Bachelor of Computer Science, Information Systems

2017 - 2021

Work History

GoTo Group

Software Engineer

Jakarta

2025 - Present

ByteDance

Software Engineer

N/A

2022 - 2025