AI Data Specialist
The project involves creating a high-quality, realistic prompt (20,000–48,000 characters) that combines reusable instructions with domain-specific input data and is designed to push AI models into partial failure. Two AI model responses are generated from your prompt; you then act as a domain expert to evaluate and rank them, justifying which response is better based on objective rules, subjective quality, and scope adherence. The evaluation requires detailed, natural-sounding justifications that compare both responses' strengths and weaknesses, along with a preference strength rating ranging from "much better" to "no preference."