Multilingual LLM Output Evaluation and Prompt Writing
Freelance AI training tasks focused on evaluating and improving large language model outputs. I rate model responses for correctness, clarity, tone and safety, compare alternative answers, and sometimes rewrite or extend them to provide higher-quality targets. In multilingual tasks, I check translations between Spanish, Galician and English, flagging literal or unnatural phrasing and proposing more natural alternatives. I follow detailed written guidelines and use structured rating scales, aiming for high inter-annotator agreement and consistent application of the criteria.
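As an illustration of what inter-annotator agreement means on a categorical rating scale, it is commonly measured with Cohen's kappa, which corrects raw agreement for chance. A minimal sketch, with made-up ratings from two hypothetical annotators (the data and function name are illustrative, not from any specific project):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's label distribution.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical: two annotators scoring five responses on a 1-3 scale.
a = [3, 2, 3, 1, 2]
b = [3, 2, 2, 1, 2]
print(round(cohens_kappa(a, b), 2))  # → 0.69
```

Values near 1 indicate near-perfect agreement, while values near 0 mean the raters agree no more often than chance, which is why annotation guidelines aim to push kappa up through precise criteria.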