Labeling Overview: We need an ML engineer to build a supervised fine-tuning (SFT) dataset for a JUCE/C++ audio DSP coding model. The work involves: (1) extracting DSP-relevant C++ functions from 40+ open-source GitHub repos (Surge, ChowDSP, Airwindows, Vital, JUCE framework, etc.), (2) generating high-quality instruction-response pairs using LLM-assisted pipelines (Bespoke Curator or Distilabel with Claude/GPT-4), (3) converting blog posts, tutorials, forum Q&A, and free textbook content into clean ChatML-formatted training examples, (4) quality filtering with a second LLM pass and deduplication, and (5) running QLoRA fine-tuning on Qwen3-Coder using Unsloth. Target: 3,000–5,000 examples covering processBlock, AudioBuffer, juce_dsp filters, oscillators, delay lines, reverb, virtual analog modeling, plugin architecture, and real-time DSP best practices. A comprehensive resource document with all repo URLs, blog links, textbook references, and a clone script will be provided to the hired candidate.
Total Budget
$1,000
Pay per Label
-
Time Requirement
20+ hrs/week
Duration
1-3 months
3,000–5,000 instruction-response pairs for fine-tuning a code LLM on JUCE C++ audio plugin development and DSP (digital signal processing). Sourced from 40+ open-source GitHub repos, free DSP textbooks, technical blogs, official JUCE tutorials/API docs, and forum Q&A. Includes function-level code extraction, natural-language explanations, and synthetic educational examples covering topics like processBlock, AudioBuffer, filters, oscillators, reverb, virtual analog modeling, and plugin architecture.
Software
Hiring Type
Required Location
Workload / Schedule
This is a 2–4 week core project with flexibility for iteration. Phase 1 (dataset curation) is the most time-intensive at roughly 40–60 hours, followed by Phase 2 (QLoRA training and hyperparameter tuning) at 15–25 hours, and Phase 3 (evaluation and documentation) at 10–15 hours. Work is async — no fixed schedule required. Milestones and deliverables will be reviewed as each phase completes. Cloud GPU costs can be discussed and reimbursed as needed.
Software
Data Type
Label Types
Subject Matter / Industry
Language
Job Type
Share link