Juliet Schive - LLM Evaluation and Text Generation Specialist in Mathematics and English

Key Skills

Software

CVAT

Data Annotation Tech

Dataloop

Google Cloud Vertex AI

Labelbox

Mindrift

Remotasks

Scale AI

SuperAnnotate

Top Subject Matter

LLM Prompt Writing & Evaluation in English

LLM Evaluation & Expertise in Mathematics

Physics Subject Matter

AI Safety

LLM Evaluation & Expertise in Mathematics and STEM

Video and Audio LLM

Top Data Types

Image

Text

Video

Top Task Types

Data Collection

Emotion Recognition

Evaluation Rating

Prompt Response Writing SFT

RLHF

Freelancer Overview

I have extensive experience in data labeling and AI training, focusing on delivering high-quality training data across diverse project types, including natural language processing (NLP), safety alignment, reinforcement learning from human feedback (RLHF), sentiment analysis, and image classification. I specialize in prompt engineering, data annotation, and model evaluation, where I have crafted precise datasets to enhance the performance of AI systems. My background in journalism and mathematics has allowed me to work across conversational AI, text summarization, safety-critical tasks, and AI-driven decision-making models, ensuring adherence to ethical guidelines and safety protocols. In RLHF projects, I have played a critical role in providing feedback loops that fine-tune models to align with human values, especially in sensitive areas like content moderation, user safety, and ethical decision-making. My key skills include critical thinking, attention to detail, and adherence to system instructions. Collaborating with AI contributors and reviewers, I have successfully refined AI models for tasks like recommendation systems, customer support bots, and safety alignment models, ensuring they meet the highest standards of safety and reliability. My ability to manage multiple projects efficiently has consistently contributed to the success of AI training efforts.

Expert

German

English

Spanish

Labeling Experience

AI Safety

Don't Disclose

Text

AI Safety training involves evaluating model responses on controversial topics and providing feedback on how to prevent the AI from giving information that could be harmful (hate speech, CP, professional advice, etc.).

2024

Gray Wolf PIF - RLHF

Scale AI

Text

The Gray Wolf PIF (Precise Instruction Following) project focuses on refining AI models by teaching them to follow highly specific and complex instructions. The project scope involves the creation of prompts with detailed constraints, including at least five constraints in the first prompt and three in subsequent turns. These constraints dictate not only the content but also the format and specific details of the AI's responses, ensuring a precise outcome. The specific data labeling tasks include writing prompts with multiple constraints, reviewing model responses for deviations (failures in instruction following or truthfulness), rating responses based on defined rubrics, selecting preferred responses, and performing minor or major rewrites where necessary. The project involves large-scale data processing, with each task generating two responses from the AI, both of which are reviewed and rated. Participants assess instruction following, truthfulness, and writing style.

2024

Bee SFT

Scale AI

Text

The goal of this project was to solve complex mathematics prompts from users and evaluate the model's responses.

2024

Dolphin ATT

Scale AI

Audio

This project aims to produce high-quality prompts based on provided audio clips. The goal is to write unique prompts that induce deviations in the model response, which are then evaluated and edited if needed.

2024

White Wolf - RLHF

Don't Disclose

Text

The White Wolf AI Training project has a broad scope aimed at enhancing conversational AI by simulating real-life interactions between users and the AI model. Participants are tasked with generating prompts that are realistic, complex, and multifaceted, designed to challenge the AI’s ability to respond appropriately. The primary data labeling tasks performed include evaluating the AI-generated responses for quality based on various dimensions, such as instruction following, truthfulness, content completeness, and writing style. Participants also detect errors or failures in responses, such as factual inaccuracies or inappropriate tone, and provide feedback to the model. Additionally, they rank responses based on preference, selecting the one that most closely aligns with the prompt's requirements. The final stage involves rewriting responses to correct errors and ensure the AI’s output is high-quality.

2023 - 2024

Education

Rutgers University

Master of Science, Physics

2021 - 2022

Rutgers University

Bachelor's in Professional Physics, Physics

2018 - 2021

Work History

Socrates Global

Senior Content Strategist

Remote

2023 - Present