
Crawl a docs site

Index an entire documentation site and export every page as Markdown — ready for offline search or LLM ingestion.

Overview

Three steps: start a recursive crawl, poll until it completes, then export all pages as a ZIP of Markdown files.

Start the crawl

```bash
curl -X POST http://localhost:3000/v1/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "maxPages": 500,
    "maxDepth": 6,
    "formats": ["markdown"]
  }'
```

Save the jobId from the response.
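One way to capture the jobId in a shell variable is sketched below. It assumes the response is JSON with a top-level string field named "jobId"; the exact response shape is not shown in this guide. The helper names json_field and start_crawl are hypothetical, and a JSON-aware tool such as jq would be more robust than the sed pattern used here.

```bash
# Print the value of a string field from a JSON document read on stdin.
# Assumption: the field appears as "name": "value" on a single line.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Start a crawl for the given site URL and print the jobId from the response.
# The limits mirror the request shown above.
start_crawl() {
  curl -s -X POST http://localhost:3000/v1/crawl \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"$1\", \"maxPages\": 500, \"maxDepth\": 6, \"formats\": [\"markdown\"]}" \
    | json_field jobId
}
```

With the API running, `JOB_ID=$(start_crawl https://docs.example.com)` would store the ID for the polling and export steps.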

Poll for completion

```bash
curl http://localhost:3000/v1/crawl/YOUR_JOB_ID
```

The status field progresses pending → running → completed. The pages array grows in real time — you can read partial results before the crawl finishes.
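The polling step can be automated with a small loop. The sketch below assumes the job-level "status" is the first string-valued "status" field in the response, and it treats any value other than pending, running, or completed as a failure; this guide does not list failure states, so that handling is an assumption. The function name wait_for_crawl and its interval and attempt limits are illustrative.

```bash
# Poll a crawl job until it reports "completed".
# Usage: wait_for_crawl JOB_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Returns 0 on completion, 1 on an unexpected status, 2 if attempts run out.
wait_for_crawl() {
  local job_id=$1 interval=${2:-5} max_attempts=${3:-120}
  local attempt=0 status
  while [ "$attempt" -lt "$max_attempts" ]; do
    # Take the first "status": "..." pair in the response body.
    status=$(curl -s "http://localhost:3000/v1/crawl/$job_id" \
      | grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
      | head -n 1 \
      | sed 's/.*:[[:space:]]*"\(.*\)"/\1/')
    echo "attempt $((attempt + 1)): status=${status:-unknown}" >&2
    case "$status" in
      completed) return 0 ;;
      pending|running|"") ;;   # still in progress, or no status parsed: retry
      *) return 1 ;;           # assumption: any other value means failure
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 2
}
```

For example, `wait_for_crawl "$JOB_ID" 5 120` would check every 5 seconds for up to 10 minutes before giving up.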

Tip: Add "webhookUrl" to the crawl request to receive a POST on completion instead of polling.
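As a rough illustration of the tip, the helper below builds the same crawl request body with a "webhookUrl" field added. The receiver address used in the usage comment is a placeholder, the function name crawl_payload is hypothetical, and the shape of the POST that the webhook receives is not described in this guide.

```bash
# Print a crawl request body that includes a webhookUrl.
# Usage: crawl_payload SITE_URL WEBHOOK_URL
crawl_payload() {
  cat <<EOF
{
  "url": "$1",
  "maxPages": 500,
  "maxDepth": 6,
  "formats": ["markdown"],
  "webhookUrl": "$2"
}
EOF
}

# Example (requires the API to be running; the webhook host is a placeholder):
# crawl_payload https://docs.example.com https://hooks.example.com/crawl-done \
#   | curl -X POST http://localhost:3000/v1/crawl \
#       -H "Content-Type: application/json" -d @-
```

Passing `-d @-` tells curl to read the request body from standard input, which keeps the JSON out of the command line.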

Export results

```bash
curl "http://localhost:3000/v1/crawl/YOUR_JOB_ID/export?format=zip" \
  -o docs-site.zip
```

The ZIP contains one .md file per crawled page, named by URL slug. The files are ready to load into a vector database, a local RAG pipeline, or an LLM context window.
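For the LLM context window case, one possible approach is to concatenate the extracted files into a single document. The sketch below assumes the archive has already been unpacked into a directory (for example with `unzip -q docs-site.zip -d docs-site`); the function name bundle_markdown and the HTML comment used as a per-file separator are illustrative choices, not part of the export format.

```bash
# Concatenate every .md file under SRC_DIR into OUT_FILE, in sorted path order,
# with a comment line naming each source file.
# Keep OUT_FILE outside SRC_DIR so the bundle does not include itself.
bundle_markdown() {
  local src_dir=$1 out_file=$2
  : > "$out_file"   # create or truncate the output file
  find "$src_dir" -type f -name '*.md' | sort | while IFS= read -r f; do
    printf '\n<!-- source: %s -->\n\n' "${f#"$src_dir"/}" >> "$out_file"
    cat "$f" >> "$out_file"
  done
}
```

Running `bundle_markdown docs-site context.md` would then produce one file that can be pasted or uploaded as context.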
