POST/v1/crawl

Start a crawl

Recursively crawls a website starting from the given URL. Runs asynchronously via a BullMQ job queue. Each page is scraped with Playwright and stored to PostgreSQL. Returns a jobId to poll for progress.

Request body

Name	Type	Required	Description
url	string	required	Starting URL for the crawl
maxPages	number	optionaldefault: 50	Stop after this many pages
maxDepth	number	optionaldefault: 3	Maximum link depth from the start URL
formats	string[]	optionaldefault: ["markdown"]	Content formats per page: "markdown", "html", "text"
excludePatterns	string[]	optional	URL substrings/patterns to skip (max 50 entries, 500 chars each)
waitForSelector	string	optional	CSS selector to wait for on each page before extracting
webhookUrl	string	optional	URL to POST the completed results to

Example

bash

curl -X POST http://localhost:3000/v1/crawl \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","maxPages":100}'

Response

bash

{
  "success": true,
  "data": { "jobId": "c3d4e5f6-..." }
}

Try it

Live preview only works when the API is running on your machine.

Run docker compose up and the Try-It panel will become interactive. Until then, copy the curl example below.

Scrape a page Poll crawl status