Safety AI Data Trainer
A large-scale project focused on improving the safety, quality, and accuracy of leading large language models (LLMs) such as GPT-3.5/4 and Google Gemini. My data labeling tasks included:

1) Red-teaming: creating adversarial prompts to identify and classify safety policy violations.
2) Prompt refinement (SFT): writing and rewriting prompts to improve the models' instruction-following and response quality.
3) Response evaluation: scoring model outputs against detailed rubrics for coherence, factual accuracy, and helpfulness.

Quality was maintained through a multi-step review process and by sustaining high inter-annotator agreement (IAA) with the rest of the team.
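A common way to quantify inter-annotator agreement on categorical labels is Cohen's kappa, which corrects raw agreement for chance. A minimal sketch in Python follows; the two annotator label lists are illustrative placeholders, not data from the project:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items both annotators labeled the same.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: probability the labels match if each annotator
    # assigned labels independently at their observed rates.
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[label] / n) * (cb[label] / n) for label in set(ca) | set(cb))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two annotators rating 8 model responses.
r1 = ["safe", "safe", "unsafe", "safe", "unsafe", "safe", "safe", "unsafe"]
r2 = ["safe", "safe", "unsafe", "unsafe", "unsafe", "safe", "safe", "safe"]
print(round(cohens_kappa(r1, r2), 3))  # 0.467
```

Kappa near 1 indicates strong agreement beyond chance, while values near 0 mean the annotators agree no more often than random labeling would predict.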