LLM Evaluation & Prompt Quality Assessment
Evaluated AI-generated Japanese and English responses for accuracy, naturalness, and cultural appropriateness. Compared model outputs, identified inconsistencies, and rated linguistic quality to inform model refinement. Annotated prompts and responses for supervised fine-tuning (SFT) to improve reasoning and fluency in LLM systems.