AI Response Evaluator & Prompt Engineer | Outlier | 2024 – Present
Remote | Multiple concurrent projects | Task management: Multimango | Time tracking: Hubstaff
• Evaluate pairs of AI-generated responses side by side across five structured quality dimensions, writing detailed preference justifications that identify specific strengths and weaknesses in each response and cite concrete examples rather than general observations
• Review and correct model-generated code for logical errors, runtime failures, and edge-case handling across varied programming languages and difficulty levels, tracing code line by line to confirm correct output
• Write high-level computer science prompts across multiple subdomains and difficulty tiers for the Millennium Leaf project, calibrating ambiguity, constraints, domain-knowledge requirements, and program scope to task specifications
• Evaluate text-to-image model outputs on the Aether project for prompt alignment, visual quality, and comparative ranking, assessing multiple generation metrics per task
• Apply specialist knowledge of software engineering, ML/AI systems, and computer vision to assess technically complex responses that require domain expertise beyond general annotation capability
Stack: Python, JavaScript, TypeScript | Domains: software engineering, ML/AI systems, computer vision, data structures and algorithms | Tools: Outlier, Multimango, Hubstaff