LLM Output Evaluator
As an LLM Evaluator, I assessed language model outputs for accuracy, safety, and alignment with guidelines, rating responses from generative AI systems against structured rubrics. This work was critical to the responsible deployment of AI models.
• Evaluated LLM outputs for factual correctness.
• Applied internal tools and guidelines for structured assessment.
• Provided feedback to improve model safety and quality.
• Specialized in multilingual (Swahili and English) evaluation tasks.