AI Annotator
Conducted high-precision safety annotation for Project Azalea (Centific), evaluating real-world conversation transcripts between users and AI assistants. Systematically assessed AI responses for specific harmful behaviors, including Delusion-Sycophancy, Political Bias, Self-Preservation, Self-Preferential Bias, and Instructed Long-Horizon Sabotage. Delivered accurate Yes/No/Partially judgments with detailed, evidence-based reasoning, incorporating relevant cultural context for Ugandan users. This work directly contributed to improving AI safety, reducing risky model behaviors, and training more reliable large language models.