**Dataset Description (5–8 words):** Structured data from complex websites

**Data Type (select one):** Text

**Subject Matter/Industry (5–8 words):** Web data extraction and automation

**Pre-labeled Data (Yes/No):** No

**Labeling Software:** Other

**Label Types (select at least 1):**
* Data Collection
* Computer Programming/Coding
* Fine-tuning
* Evaluation/Rating

**Labeling Overview:** You should have hands-on experience with Python-based web scraping and data extraction from complex sites, including dynamic/JavaScript-rendered pages. You’ll be comfortable troubleshooting scraping failures, validating outputs, and delivering clean structured data. Upper-intermediate English (B2) or higher is required.

In this role, you’ll own end-to-end scraping workflows: extracting data across multi-level site structures, using a mix of internal tools (Apify, OpenRouter) and your own scripts/workflows. You’ll validate and normalize data, enforce formatting requirements, and deliver accurate structured datasets (e.g., CSV/JSON/Sheets). You’ll collaborate in a hybrid AI + human setup where AI agents handle repetitive steps and you provide quality control and critical thinking.
**Required Locations:** Global - Any Location

**Required English Level:** Fluent

**Other Qualifications & Requirements (5–10 bullets):**
* 1+ year of experience in at least one of the following: web scraping, data engineering, software development, automation, or data analysis
* Strong Python web scraping skills (e.g., BeautifulSoup + Selenium/Playwright or equivalents)
* Proven experience scraping dynamic/JS-heavy sites (infinite scroll, AJAX, JS-rendered content)
* Experience extracting from multi-level/hierarchical site structures (e.g., category → entity → details)
* Ability to handle changing site structures and implement resilient scraping strategies (selectors, fallbacks, retries)
* Ability to clean, normalize, and validate scraped data and deliver it in structured formats (CSV, JSON, Google Sheets)
* Experience with batching/parallelization for scaling large scraping jobs (or equivalent performance approaches)
* Familiarity with using LLMs/AI tools to accelerate workflows (prompting, automation, extraction assistance)
* English level B2+ with the ability to follow detailed specs and document edge cases clearly
Total Budget
$10,000
Pay per Label
$20/hr
Time Requirement
20+ hrs/week
Duration
3-6 months
Workload / Schedule
Flexible, can start immediately