AI Safety
AI Safety training is based on evaluating model responses and providing feedback on how they handle controversial topics, with the aim of preventing AI from providing information that could be harmful (hate speech, CP, professional advice, etc.).
I have extensive experience in data labeling and AI training, focusing on delivering high-quality training data across diverse project types, including natural language processing (NLP), safety alignment, reinforcement learning from human feedback (RLHF), sentiment analysis, and image classification. I specialize in prompt engineering, data annotation, and model evaluation, where I have crafted precise datasets to enhance the performance of AI systems. My background in journalism and mathematics has allowed me to work across conversational AI, text summarization, safety-critical tasks, and AI-driven decision-making models, ensuring adherence to ethical guidelines and safety protocols. In RLHF projects, I have played a critical role in providing feedback loops that fine-tune models to align with human values, especially in sensitive areas like content moderation, user safety, and ethical decision-making. My key skills include critical thinking, attention to detail, and adherence to system instructions. Collaborating with AI contributors and reviewers, I have successfully refined AI models for tasks like recommendation systems, customer support bots, and safety alignment models, ensuring they meet the highest standards of safety and reliability. My ability to manage multiple projects efficiently has consistently contributed to the success of AI training efforts.
The Gray Wolf PIF (Precise Instruction Following) project focuses on refining AI models by teaching them to follow highly specific and complex instructions. The project scope involves the creation of prompts with detailed constraints, including at least five constraints in the first prompt and three in subsequent turns. These constraints dictate not only the content but also the format and specific details of the AI's responses, ensuring a precise outcome. The specific data labeling tasks include writing prompts with multiple constraints, reviewing model responses for deviations (failures in instruction following or truthfulness), rating responses based on defined rubrics, selecting preferred responses, and performing minor or major rewrites where necessary. The project involves large-scale data processing, with each task generating two responses from the AI, both of which are reviewed and rated. Participants assess instruction following, truthfulness, and writing style.
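The constraint minimums described above (at least five in the first prompt, at least three in each later turn) could be checked with a short script. This is only a minimal sketch: the `Turn` class, the `turns_missing_constraints` function, and the sample constraints are hypothetical names made up for illustration, not part of the project's actual tooling.

```python
from dataclasses import dataclass

# Minimums taken from the project description: five constraints in the
# first prompt, three in each subsequent turn.
MIN_FIRST_TURN = 5
MIN_LATER_TURN = 3


@dataclass
class Turn:
    # Hypothetical structure: the prompt text plus one short label per constraint.
    prompt: str
    constraints: list[str]


def turns_missing_constraints(turns: list[Turn]) -> list[int]:
    """Return the zero-based indices of turns that have too few constraints."""
    short = []
    for index, turn in enumerate(turns):
        required = MIN_FIRST_TURN if index == 0 else MIN_LATER_TURN
        if len(turn.constraints) < required:
            short.append(index)
    return short


# Illustrative conversation: the first turn meets the minimum, the second does not.
conversation = [
    Turn(
        "Write a product description.",
        ["under 100 words", "three paragraphs", "formal tone",
         "mentions the price", "ends with a question"],
    ),
    Turn("Now shorten it.", ["under 50 words", "one paragraph"]),
]

print(turns_missing_constraints(conversation))  # [1]
```

A check like this only counts constraints. Judging whether a response actually satisfies each one still falls to the human reviewer.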
This project's goal was to solve complex mathematics prompts for users and evaluate the model's response.
This project aims to produce high-quality prompts based on provided audio clips. The goal is to write unique prompts that elicit deviations in the model response, which are then evaluated and edited if needed.
The White Wolf AI Training project has a broad scope aimed at enhancing conversational AI by simulating real-life interactions between users and the AI model. Participants are tasked with generating prompts that are realistic, complex, and multifaceted, designed to challenge the AI’s ability to respond appropriately. The primary data labeling tasks performed include evaluating the AI-generated responses for quality based on various dimensions, such as instruction following, truthfulness, content completeness, and writing style. Participants also detect errors or failures in responses, such as factual inaccuracies or inappropriate tone, and provide feedback to the model. Additionally, they rank responses based on preference, selecting the one that most closely aligns with the prompt's requirements. The final stage involves rewriting responses to correct errors and ensure the AI’s output is high-quality.
Master of Science, Physics
Bachelor's in Professional Physics, Physics
Senior Content Strategist