DocsAPI ReferenceStart a crawl
POST/v1/crawl

Start a crawl

Recursively crawls a website starting from the given URL. Runs asynchronously via a BullMQ job queue. Each page is scraped with Playwright and stored to PostgreSQL. Returns a jobId to poll for progress.

Request body

NameTypeRequiredDescription
urlstringrequiredStarting URL for the crawl
maxPagesnumberoptionaldefault: 50Stop after this many pages
maxDepthnumberoptionaldefault: 3Maximum link depth from the start URL
formatsstring[]optionaldefault: ["markdown"]Content formats per page: "markdown", "html", "text"
excludePatternsstring[]optionalURL substrings/patterns to skip (max 50 entries, 500 chars each)
waitForSelectorstringoptionalCSS selector to wait for on each page before extracting
webhookUrlstringoptionalURL to POST the completed results to

Example

bash
curl -X POST http://localhost:3000/v1/crawl \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","maxPages":100}'

Response

bash
{
  "success": true,
  "data": { "jobId": "c3d4e5f6-..." }
}

Try it

Live preview only works when the API is running on your machine.

Run docker compose up and the Try-It panel will become interactive. Until then, copy the curl example below.