Dwiki Prakasa

Software Engineer - Web Frontend Development and AI Training

Jakarta, Indonesia
$25.00/hr · Intermediate · Mindrift · Toloka

Key Skills

Software

Mindrift
Toloka

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Image
Text

Top Label Types

Question Answering
RLHF
Evaluation Rating
Computer Programming Coding
Prompt Response Writing SFT
Text Generation

Freelancer Overview

I am an experienced software engineer and freelance AI trainer with a strong background in data annotation, coding evaluation, and AI training data creation. My work involves designing and delivering complex programming and STEM tasks, developing coding prompts, and executing detailed side-by-side LLM comparisons across quality dimensions such as truthfulness, instruction following, and emotional intelligence. I have hands-on experience with Python, NumPy, and Pandas for technical validation and logical analysis of AI-generated outputs, ensuring accuracy and consistency in data labeling and response assessment. I am skilled at collaborating in distributed reviewer environments, adhering to strict annotation guidelines, and applying rigorous evaluation criteria to improve AI model reliability and performance. My expertise spans software development, prompt engineering, and reinforcement learning from human feedback (RLHF), and I am passionate about advancing high-quality AI systems through precise data annotation and evaluation.

Languages

Indonesian (Intermediate)

Labeling Experience

Mindrift

AI Model Evaluation & Task Generation STEM Domains for Computer Science (Python)

Mindrift · Computer Code Programming · Question Answering · RLHF
Acting as a Subject Matter Expert (SME) to train and evaluate Large Language Models (LLMs) in the domains of Computer Science, Mathematics, Physics, and Python programming. Key responsibilities:
* Task Generation: Creating computationally intensive STEM problems that require multi-step reasoning and Python coding to solve, designed specifically to challenge and identify reasoning failures in AI models.
* Model Evaluation (RLHF): Evaluating AI-generated code for correctness, efficiency, and adherence to constraints; analyzing failure modes such as logic errors, hallucinations, and suboptimal algorithms.
* Golden Solution Creation: Developing deterministic, reproducible, and efficient Python solutions (using libraries like pandas, numpy, and scipy) alongside clear, human-readable explanations to serve as ground truth for model training (a sketch of the idea follows below).
* Vibe Coding / Rapid Prototyping: Executing rapid coding tasks to correct AI responses according to programming standards and clean-code principles.
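To make "golden solution" concrete, here is a minimal sketch pairing a deterministic pandas/numpy computation with a fixed, verifiable test case. The task (a rolling z-score) is a hypothetical example of mine, not one from the actual project; it only illustrates the determinism and ground-truth requirements.

    import numpy as np
    import pandas as pd

    def rolling_zscore(values, window=5):
        # Deterministic by construction: no randomness, a fixed window,
        # and explicit NaN behavior during the warm-up period.
        s = pd.Series(values, dtype="float64")
        mean = s.rolling(window).mean()
        std = s.rolling(window).std(ddof=0)
        return (s - mean) / std

    # A fixed test case serves as the ground truth against which
    # AI-generated solutions can be checked.
    out = rolling_zscore([1, 2, 3, 4, 5, 100], window=5)
    assert np.isnan(out.iloc[3])                  # warm-up: window not yet full
    assert abs(out.iloc[4] - np.sqrt(2)) < 1e-9   # (5 - 3) / sqrt(2)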

2025
Mindrift

Side-by-Side (SxS) LLM Evaluation & Conversational Analysis (Apricot Project)

Mindrift · Text · RLHF · Evaluation Rating
Executing expert-level Side-by-Side (SxS) comparisons of Large Language Model (LLM) responses. The role involves evaluating model performance across 10 quality dimensions, including Instruction Following, Truthfulness, and Harmlessness. Specialized in "Conversationality" tasks, assessing models on advanced metrics such as:
* Natural Dialogue: Evaluating whether the response mirrors human speech patterns and flow.
* User Intent & Understanding: Measuring the model's emotional intelligence (EQ) and ability to grasp implicit user needs.
* Conversation Continuation: Assessing how effectively the model drives the dialogue forward.
Responsibilities include assigning 1-5 ratings for each dimension and writing detailed, evidence-based rationales to justify preference decisions (a minimal record sketch follows below).
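A minimal sketch of how one such SxS judgment could be recorded. The field names and the abbreviated dimension list are illustrative assumptions, not the project's actual schema:

    from dataclasses import dataclass

    DIMENSIONS = ("instruction_following", "truthfulness", "harmlessness",
                  "natural_dialogue", "user_intent", "conversation_continuation")

    @dataclass
    class SxSJudgment:
        ratings_a: dict   # dimension -> 1..5 rating for response A
        ratings_b: dict   # dimension -> 1..5 rating for response B
        preference: str   # "A", "B", or "tie"
        rationale: str    # evidence-based justification for the preference

        def validate(self):
            # Every rated dimension must be known and scored on the 1-5
            # scale, and no preference stands without a written rationale.
            for ratings in (self.ratings_a, self.ratings_b):
                assert set(ratings) <= set(DIMENSIONS)
                assert all(1 <= r <= 5 for r in ratings.values())
            assert self.preference in ("A", "B", "tie")
            assert self.rationale.strip()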

2025
Toloka

High-Integrity Prompt Engineering & Model Evaluation (MEI/Toloka)

Toloka · Computer Code Programming · Text Generation · RLHF
Executing high-complexity prompt engineering and model evaluation under the "High-Integrity Standard" framework. The role requires adhering to a strict "3-Gate Quality Framework" to generate domain-specific tasks (specifically in Computer Science/Python) that are rigorous, realistic, and solvable. Key responsibilities include:
* Engineered Prompt Creation: Designing prompts that pass strict criteria for "Objective Truth" and "Step Validity", ensuring tasks rely on verifiable reasoning chains rather than subjective requests.
* Multi-Dimensional Evaluation: Assessing AI model responses across five distinct dimensions: Harmlessness, Correctness, Step Validity, Completeness, and Clarity.
* Defensible Scoring: Applying the "Zero-Assumption Rule" to verify every line of code or reasoning step, providing evidence-based justifications for every score (sketched below).
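As a rough illustration of the five-dimension rubric and the "every score needs evidence" discipline, here is a sketch of a submission check. The function name, the dictionary layout, and the 1-5 scale (borrowed from the SxS work above) are my own assumptions, not the framework's real tooling:

    EVAL_DIMENSIONS = ("harmlessness", "correctness", "step_validity",
                       "completeness", "clarity")

    def check_score_sheet(scores, evidence):
        # Reject a submission unless all five dimensions are scored
        # and each score carries an evidence-based justification.
        missing = set(EVAL_DIMENSIONS) - set(scores)
        if missing:
            raise ValueError(f"unscored dimensions: {sorted(missing)}")
        for dim in EVAL_DIMENSIONS:
            if not 1 <= scores[dim] <= 5:
                raise ValueError(f"{dim}: score must be between 1 and 5")
            if not evidence.get(dim, "").strip():
                raise ValueError(f"{dim}: justification required for this score")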

2025
Mindrift

AI Agent Evaluation & Tool Use Benchmarking (TAU Framework)

Mindrift · Text · Evaluation Rating
Evaluating AI agents within the Tool-Agent-User (TAU) framework to benchmark reliability in realistic scenarios. The role focuses on "Trajectory Evaluation": assessing whether agents correctly solve user problems by utilizing specific tools (functions) and strictly adhering to domain policies. Key responsibilities include:
* Trajectory Correctness: Analyzing full user-agent conversations to distinguish "Agent Faults" (policy violations, wrong tool usage) from "User Faults," ensuring the agent's reasoning process is sound even when the final outcome is numerically correct.
* Golden Set Verification: Defining and editing the "Golden Set" (the required sequence of tool calls, read-only vs. DB-modifying, needed to correctly fulfill a request); a toy version appears below.
* Database Logic: Verifying how agents interact with structured JSON databases and ensuring that parameters in tool calls match the database fields.
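A toy sketch of a golden-set trajectory check. The tool names, the rule that only DB-modifying calls are order-sensitive, and the record layout are all hypothetical choices of mine to illustrate the idea:

    # Golden set for one hypothetical request: the agent may read freely,
    # but DB-modifying calls must happen exactly as specified.
    GOLDEN_SET = [
        {"tool": "get_order", "modifies_db": False},
        {"tool": "update_shipping_address", "modifies_db": True},
    ]

    def trajectory_matches(agent_calls, golden=GOLDEN_SET):
        # Compare only the DB-modifying calls: read-only lookups may vary,
        # but writes must match the golden sequence in name and order.
        golden_writes = [c["tool"] for c in golden if c["modifies_db"]]
        agent_writes = [c["tool"] for c in agent_calls if c.get("modifies_db")]
        return agent_writes == golden_writes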

2025

Education

University of Lampung

Bachelor of Computer Science, Computer Science

2016 - 2021

Work History

Mindrift by Toloka

Freelance AI Trainer / AI Coding Evaluator

Jakarta
2025 - Present

Prima Vista Solusi

Software Engineer (Frontend)

South Jakarta
2021 - Present