Multilingual LLM Prompt Evaluation & Response Ranking
Evaluated and rated AI-generated responses to prompts in English and Swahili, assessing grammar, coherence, factual accuracy, and relevance. Ranked multiple model outputs by quality and provided feedback used to fine-tune large language models. Maintained annotation quality above 97% across weekly review audits.