Crawl a docs site
Index an entire documentation site and export every page as Markdown — ready for offline search or LLM ingestion.
Overview
Three steps: start a recursive crawl, poll until it completes, then export all pages as a ZIP of Markdown files.
Start the crawl
bash
curl -X POST http://localhost:3000/v1/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "maxPages": 500,
    "maxDepth": 6,
    "formats": ["markdown"]
  }'
Save the jobId from the response.
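Capturing the jobId can also be scripted. The sketch below assumes the response is a JSON object with a top-level string field named jobId; the exact response shape is not shown on this page, so adjust the field path if yours differs. The helper names start_crawl and extract_job_id are illustrative, not part of the API. It uses jq when installed and otherwise falls back to a simple sed pattern.

```bash
# Assumption: the crawl response is JSON with a top-level "jobId" string.
API_BASE="${API_BASE:-http://localhost:3000}"

# Print the jobId found in a JSON response read from stdin.
extract_job_id() {
  if command -v jq >/dev/null 2>&1; then
    jq -r '.jobId'
  else
    # Fallback for systems without jq; only handles a plain string value.
    sed -n 's/.*"jobId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
  fi
}

# Start a crawl for the given URL and print the resulting jobId.
start_crawl() {
  curl -fsS -X POST "$API_BASE/v1/crawl" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"$1\", \"maxPages\": 500, \"maxDepth\": 6, \"formats\": [\"markdown\"]}" |
    extract_job_id
}

# Usage: JOB_ID="$(start_crawl https://docs.example.com)"
```

Keeping the ID in a shell variable lets the polling and export commands reuse it in place of YOUR_JOB_ID.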
Poll for completion
bash
curl http://localhost:3000/v1/crawl/YOUR_JOB_ID
The status field progresses pending → running → completed. The pages array grows in real time — you can read partial results before the crawl finishes.
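A polling loop around that request might look like the following sketch. It assumes jq is installed and that the job document exposes status as a top-level field. The function names, the 5-second interval, and the attempt limit are illustrative choices, and any status other than the three listed above is treated as an error because no other values are documented here.

```bash
# Assumptions: jq is installed, and the job document has a top-level "status" field.
API_BASE="${API_BASE:-http://localhost:3000}"
POLL_INTERVAL="${POLL_INTERVAL:-5}"   # seconds between polls

# Print the current status of a crawl job.
fetch_status() {
  curl -fsS "$API_BASE/v1/crawl/$1" | jq -r '.status'
}

# Poll until the job completes. Returns 0 on completed, 1 on an
# unexpected or missing status, 2 if max_attempts is exhausted.
wait_for_crawl() {
  job_id="$1"
  max_attempts="${2:-120}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state="$(fetch_status "$job_id")" || return 1
    case "$state" in
      completed) return 0 ;;
      pending|running) sleep "$POLL_INTERVAL" ;;
      *) echo "unexpected status: $state" >&2; return 1 ;;
    esac
    attempt=$((attempt + 1))
  done
  echo "gave up after $max_attempts attempts" >&2
  return 2
}

# Usage: wait_for_crawl "$JOB_ID" && echo "crawl finished"
```

A fixed interval keeps the sketch short; for very large sites a longer interval or a backoff would reduce load on the server.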
Tip: Add "webhookUrl" to the crawl request to receive a POST on completion instead of polling.
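As a sketch of the webhook variant, the helper below prints the same request body as the crawl request above with a webhookUrl field added. The function name build_crawl_payload and the receiver address hooks.example.com are placeholders, and the shape of the POST your endpoint receives is not described here, so inspect a real delivery before parsing it.

```bash
# Print a crawl request body that includes a completion webhook.
# Both arguments are inserted verbatim, so they must not contain
# double quotes or backslashes.
build_crawl_payload() {
  cat <<EOF
{
  "url": "$1",
  "maxPages": 500,
  "maxDepth": 6,
  "formats": ["markdown"],
  "webhookUrl": "$2"
}
EOF
}

# Usage (the receiver URL is a placeholder for your own endpoint):
#   build_crawl_payload https://docs.example.com https://hooks.example.com/crawl-done |
#     curl -fsS -X POST http://localhost:3000/v1/crawl \
#       -H "Content-Type: application/json" -d @-
```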
Export results
bash
curl "http://localhost:3000/v1/crawl/YOUR_JOB_ID/export?format=zip" \
  -o docs-site.zip
The ZIP contains one .md file per crawled page, named by URL slug. The files are ready to load into a vector database, a local RAG pipeline, or an LLM context window.
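To sanity-check an export before loading it elsewhere, you can unpack the archive and count the Markdown files. This is a minimal sketch: the helper names and the docs-site target directory are illustrative, and the search is recursive because the archive's internal folder layout is not specified here.

```bash
# Unpack the export into a directory (requires the unzip utility).
extract_docs() {
  mkdir -p "$2" && unzip -q -o "$1" -d "$2"
}

# Print how many Markdown files a directory tree contains.
count_markdown_files() {
  find "$1" -type f -name '*.md' | wc -l | tr -d '[:space:]'
}

# Usage:
#   extract_docs docs-site.zip docs-site
#   echo "Extracted $(count_markdown_files docs-site) pages"
```

Comparing the count against the number of entries in the pages array is a quick way to spot an incomplete export.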