Prompt Engineering & LLM Output Evaluation (English & Bahasa Indonesia)
Contributed to a fine-tuning and evaluation project for a large language model (LLM). Responsibilities included crafting high-quality prompts across a range of task types (instruction following, dialogue, summarization, etc.) and evaluating model outputs for clarity, accuracy, helpfulness, and alignment. Worked with both English and Bahasa Indonesia data, with attention to linguistic fluency, cultural appropriateness, and relevance to task instructions. The project required strict adherence to formatting guidelines, detailed documentation, and peer feedback cycles. Average daily output: 50–100 prompt-eval pairs.
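A prompt-eval pair of the kind described above can be sketched as a simple record. This is a hypothetical illustration only; the field names, score scale, and aggregation are assumptions, not the project's actual schema or tooling.

```python
from dataclasses import dataclass, field

# Hypothetical record for one prompt-eval pair. Field names and the
# 1-5 rubric scale are illustrative assumptions, not the project's schema.
@dataclass
class PromptEval:
    prompt: str
    model_output: str
    language: str               # "en" or "id" (Bahasa Indonesia)
    task_type: str              # e.g. "instruction-following", "summarization"
    scores: dict = field(default_factory=dict)  # rubric dimensions -> 1-5

    def overall(self) -> float:
        # Unweighted mean across rubric dimensions (assumed aggregation).
        return sum(self.scores.values()) / len(self.scores)

pair = PromptEval(
    prompt="Summarize the article in two sentences.",
    model_output="...",
    language="en",
    task_type="summarization",
    scores={"clarity": 5, "accuracy": 4, "helpfulness": 5, "alignment": 5},
)
print(pair.overall())  # mean of the four rubric scores
```

Keeping each pair as a structured record like this makes the documentation and peer-feedback cycles mentioned above easier to audit, since every score is tied to a named rubric dimension.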