Introduction
Netleaf is a free, open-source web data platform. One command gets you a fully functional scraping and extraction API — running on your hardware, with no rate limits and no cloud bill.
What is Netleaf?
Netleaf exposes ten REST endpoints that cover the full web data pipeline: scraping individual pages, recursive crawling, URL discovery, structured AI extraction, web search, cron scheduling, change detection, and multi-format export. A health probe and an API key management endpoint sit alongside them. Everything runs in a single Docker Compose stack.
Think of it as a self-hosted alternative to Firecrawl or Apify — with multi-LLM support (Claude, GPT-4o-mini, and fully offline Ollama), built-in scheduling, and cryptographic change detection.
Why self-host?
- No rate limits — You control the hardware. Crawl 10 pages or 100,000.
- Zero cost — No subscription, no credit card, no usage fees. Ever.
- Data stays local — Pages never leave your server unless you explicitly export them.
- Offline LLM extraction — Ollama lets you extract structured data with zero API calls.
- MIT licensed — Read the source, fork it, or extend it — no restrictions.
Endpoints
All endpoints live under the base URL http://localhost:3000 (or your server's address).
| Method | Path | Description |
|---|---|---|
| POST | /v1/scrape | Scrape a page |
| POST | /v1/crawl | Start a crawl |
| GET | /v1/crawl/:id | Poll crawl status |
| POST | /v1/crawl/:id/webhook | Attach a webhook |
| POST | /v1/map | Discover URLs |
| POST | /v1/extract | Structured extraction |
| POST | /v1/search | Web search |
| GET | /v1/diff | Diff two crawls |
| POST | /v1/schedule | Create a schedule |
| GET | /v1/crawl/:id/export | Export crawl results |
| GET | /health | Health probe |
| POST | /v1/keys | Manage API keys |
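As a rough illustration of how the table maps onto HTTP calls, here is a minimal Python sketch using only the standard library. The helper names (`endpoint_url`, `json_request`, `send`, `poll`) are hypothetical and not part of Netleaf, and this page does not document request or response bodies, so any payload or field names you pass in are assumptions to check against the endpoint reference.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Default base URL from this page; replace with your server address.
BASE_URL = "http://localhost:3000"


def endpoint_url(path: str, base_url: str = BASE_URL, **params: str) -> str:
    """Join the base URL and a path, filling ':name' placeholders such as ':id'."""
    for name, value in params.items():
        path = path.replace(f":{name}", value)
    return base_url.rstrip("/") + path


def json_request(method: str, url: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request with an optional JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request and decode the body as JSON (assumes the server replies with JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def poll(fetch: Callable[[], dict], is_done: Callable[[dict], bool],
         interval: float = 2.0, max_attempts: int = 30) -> dict:
    """Call fetch() until is_done(result) is true or the attempt budget runs out."""
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError("no finished result within the polling budget")
```

With a running instance, `send(json_request("GET", endpoint_url("/health")))` would probe the server, and `json_request("POST", endpoint_url("/v1/scrape"), {"url": "https://example.com"})` builds a scrape request, where the `url` body field is an assumed shape. For crawls, `poll` can wrap a `GET` on `endpoint_url("/v1/crawl/:id", id=crawl_id)`; the completion check is left to the caller because the status format is not specified here.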
Next steps
- Quick Start: Clone, run, and make your first request in under two minutes.
- Authentication: Local mode is the default and requires no auth. Learn when to add keys.
- Scrape a page: Turn any URL into clean Markdown, HTML, or plain text.
- Structured extraction: Extract typed JSON from any page using Claude, GPT, or Ollama.