AI Sociolinguist & Evaluation Framework Specialist
I developed and applied a five-dimension scoring tool to evaluate AI outputs in multilingual African contexts, creating annotation frameworks and guidelines for linguistic edge cases. My work included designing adversarial (red-teaming) scenarios targeting failure modes such as Swahili proverb misinterpretation, Sheng code-switching errors, and high-context meaning distortion. I also conducted RLHF preference-ranking evaluations across factual, creative, advisory, and sensitive prompts, and classified AI outputs for bias, misinformation, and cultural safety.

• Authored annotation guidelines and cultural evaluation rubrics for Swahili and other East African languages
• Applied adversarial testing to surface model weaknesses that automated tools cannot detect
• Executed RLHF evaluations with detailed sociolinguistic rationales to inform model training
• Translated qualitative insights into quantitative benchmarks for engineering teams