LLM Evaluation and Prompt Response Rater
I evaluate AI-generated responses in Indonesian and English, comparing outputs for clarity, helpfulness, accuracy, and safety. My work helps ensure that large language models (LLMs) give responses that are informative, not misleading, and free of harmful or inappropriate content.