Queue Master (QM)
This project systematically evaluated large language models (LLMs) using a prompt-response framework. Each test case presented a single prompt and two candidate responses, and each response was assessed across multiple dimensions, including localization, instruction following, truthfulness, coherence, and overall quality.
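
To make the shape of a test case concrete, here is a minimal sketch of how one might be represented. The dimension names come from the description above; everything else is an assumption made for illustration, including the `EvalCase` class, the `preferred` helper, the 1-5 rating scale, and the rule of picking a winner by overall quality. It is not the project's actual tooling.

```python
from dataclasses import dataclass, field

# Dimensions named in the project description.
DIMENSIONS = (
    "localization",
    "instruction_following",
    "truthfulness",
    "coherence",
    "overall_quality",
)


@dataclass
class EvalCase:
    """One test case: a single prompt and two candidate responses (hypothetical layout)."""
    prompt: str
    response_a: str
    response_b: str
    # Per-dimension ratings; the 1-5 scale is an assumption, not from the source.
    scores_a: dict = field(default_factory=dict)
    scores_b: dict = field(default_factory=dict)

    def missing_dimensions(self) -> list:
        """List dimensions that have not been rated for both responses."""
        return [d for d in DIMENSIONS
                if d not in self.scores_a or d not in self.scores_b]


def preferred(case: EvalCase) -> str:
    """Return 'A', 'B', or 'tie' by comparing overall_quality (an assumed rule)."""
    a = case.scores_a["overall_quality"]
    b = case.scores_b["overall_quality"]
    if a == b:
        return "tie"
    return "A" if a > b else "B"


# Example with made-up ratings.
case = EvalCase(
    prompt="Translate the greeting into French.",
    response_a="Bonjour !",
    response_b="Hello!",
    scores_a={d: 4 for d in DIMENSIONS},
    scores_b={d: 3 for d in DIMENSIONS},
)
print(preferred(case))  # prints "A"
```

Keeping the per-dimension ratings separate from the overall preference lets a reviewer see why one response was chosen, not only which one.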