Bilingual LLM Evaluation Rater
Across multiple AI evaluation projects, I assessed chatbot and AI-generated responses on five dimensions: correctness/truthfulness, verbosity, instruction following, localization, and harmfulness. Tasks included evaluating single-turn and multi-turn interactions, crafting prompts to trigger specific errors or behaviors, and verifying cultural and linguistic appropriateness in Indonesian, all while adhering to strict quality guidelines.