Content Moderation & Fact-Checking for Digital Safety
I have worked on analyzing and categorizing online content, including misinformation, disinformation, and hate speech, as part of digital safety and fact-checking initiatives. My responsibilities included reviewing large volumes of text-based content, classifying it by accuracy, tone, and potential harm, and flagging harmful or misleading narratives. I contributed to the structured evaluation of content against platform guidelines, supporting safer digital environments. These tasks align closely with AI training workflows such as sentiment classification, safety labeling, and reinforcement learning from human feedback (RLHF), in which human judgment is used to improve model outputs. I was also involved in moderating online communities and creating safe spaces, ensuring user-generated content adhered to ethical and community standards.
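The classification workflow described above can be sketched as a minimal labeling schema. This is an illustrative sketch only: the field names, label sets, and flagging threshold are hypothetical assumptions, not taken from any specific platform's guidelines.

```python
from dataclasses import dataclass

# Hypothetical label vocabularies; real guidelines would define these precisely.
ACCURACY = {"accurate", "misleading", "false"}
TONE = {"neutral", "inflammatory", "hateful"}

@dataclass
class ContentLabel:
    """One reviewer's judgment of a single piece of content."""
    item_id: str
    accuracy: str      # one of ACCURACY
    tone: str          # one of TONE
    harm_score: int    # 0 (harmless) .. 3 (severe) -- assumed scale

    def __post_init__(self) -> None:
        # Reject labels outside the agreed vocabulary, keeping data consistent.
        if self.accuracy not in ACCURACY:
            raise ValueError(f"unknown accuracy label: {self.accuracy}")
        if self.tone not in TONE:
            raise ValueError(f"unknown tone label: {self.tone}")

    def should_flag(self) -> bool:
        # Flag anything false, hateful, or at/above an assumed harm threshold.
        return (
            self.accuracy == "false"
            or self.tone == "hateful"
            or self.harm_score >= 2
        )

label = ContentLabel("post-001", accuracy="misleading", tone="inflammatory", harm_score=2)
print(label.should_flag())  # True: harm_score meets the flagging threshold
```

Structured labels like these are what make human judgment usable downstream, whether for enforcing guidelines or as training signal in safety-labeling and RLHF pipelines.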