
Introduction

Netleaf is a free, open-source web data platform. One command gets you a fully functional scraping and extraction API — running on your hardware, with no rate limits and no cloud bill.

What is Netleaf?

Netleaf exposes ten REST endpoints that cover the full web data pipeline: scraping individual pages, recursive crawling, URL discovery, structured AI extraction, web search, cron scheduling, change detection, and multi-format export. Everything runs in a single Docker Compose stack.

Think of it as a self-hosted alternative to Firecrawl or Apify — with multi-LLM support (Claude, GPT-4o-mini, and fully offline Ollama), built-in scheduling, and cryptographic change detection.

Why self-host?

  • No rate limits: You control the hardware. Crawl 10 pages or 100,000.
  • Zero cost: No subscription, no credit card, no usage fees. Ever.
  • Data stays local: Pages never leave your server unless you explicitly export them.
  • Offline LLM extraction: Ollama lets you extract structured data with zero API calls.
  • MIT licensed: Read the source, fork it, or extend it, with no restrictions.

Endpoints

All endpoints live under base URL http://localhost:3000 (or your server address).

Method  Path                    Description
POST    /v1/scrape              Scrape a page
POST    /v1/crawl               Start a crawl
GET     /v1/crawl/:id           Poll crawl status
POST    /v1/crawl/:id/webhook   Attach a webhook
POST    /v1/map                 Discover URLs
POST    /v1/extract             Structured extraction
POST    /v1/search              Web search
GET     /v1/diff                Diff two crawls
POST    /v1/schedule            Create a schedule
GET     /v1/crawl/:id/export    Export crawl results
GET     /health                 Health probe
POST    /v1/keys                Manage API keys
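
As a rough illustration of how the methods and paths above map onto HTTP calls, here is a minimal Python sketch using only the standard library. The base URL and paths come from the table; the JSON body shape (a single "url" field), the JSON response format, and the helper names are assumptions made for illustration and are not confirmed by this page, so check each endpoint's reference before relying on them.

```python
import json
import urllib.request

# Base URL from the docs; replace with your server address if needed.
BASE_URL = "http://localhost:3000"


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/scrape request.

    ASSUMPTION: the request body is JSON with a "url" field.
    """
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_crawl_status_request(crawl_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET /v1/crawl/:id request, substituting the crawl ID into the path."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/crawl/{crawl_id}",
        method="GET",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the response.

    ASSUMPTION: the server replies with a JSON object.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a Netleaf instance running, something like send(build_scrape_request("https://example.com")) would issue the scrape call. Keeping request construction separate from sending makes it easy to inspect the URL and payload before anything touches the network.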

Next steps