LLM Conversational AI Evaluation & Response Assessment
Conducted Japanese-language conversational AI evaluation tasks with persona-based LLM chatbots. For each prompt, two model responses were generated; I selected the preferred response and continued the conversation with it for 12 turns. At each turn, I compared the candidate responses, chose the better one, and provided a brief rationale in English. Errors encountered during conversations were flagged and reported. Tasks required detailed reasoning, accuracy assessment, and consistency across multiple conversation turns. All work was performed on the Outlier Platform.