We're seeking senior-minded Python engineers who can own the infrastructure underpinning our LLM-training workflow. You should bring 5+ years of hands-on Python, deep Linux and Docker fluency, and proven experience designing CI/CD pipelines (GitHub Actions preferred). The role centers on building secure sandboxes, task frameworks, and scoring pipelines for agent evaluation, so familiarity with FastAPI/Flask back-ends, pytest-driven testing, and modern dev-environment tooling (devcontainers, Makefiles) is essential.

Clear English communication, a collaborative mindset, and the ability to guide AI researchers through these tools are must-haves; prior work on coding tasks for platforms like Remotasks, Outlier, or DataAnnotation is a strong bonus. You'll deliver reusable repos, automated evaluation pipelines, and developer environments that let experts iterate quickly on agent tasks.

Positions are fully remote for candidates located in the United States or Canada. Hourly compensation is tiered by experience: Junior $34/hr, Middle $37/hr, and Senior $42/hr (USD). Short-listed candidates will complete a timed HackerRank assessment and a platform coding test before progressing to recruiter interviews.
Total Budget: $7,400
Pay per Label: $37/hr
Time Requirement: 20+ hrs/week
Duration: 3-6 months
Python LLM‑agent tasks and evaluation outputs.
Software
Hiring Type
Required Location
Workload / Schedule: Flexible schedule; must be able to start immediately
Software
Data Type
Label Types
Subject Matter / Industry
Language
Job Type