Recurring crawls

Schedule a crawl to run automatically on any interval using standard cron expressions.

Cron syntax

Schedules use the standard 5-field cron format: minute, hour, day of month, month, day of week.

Expression      Meaning
0 2 * * *       Every day at 02:00
0 */6 * * *     Every 6 hours
0 9 * * 1       Every Monday at 09:00
*/30 * * * *    Every 30 minutes
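
To make the field semantics concrete, here is a minimal Python sketch that checks whether a 5-field expression matches a given time. It is an illustration only, not Netleaf's own parser: the names expand and cron_matches are hypothetical, and the sketch handles only *, */n, ranges, lists, and plain numbers (no month or weekday names, and no 7 for Sunday).

```python
from datetime import datetime

# (min, max) for: minute, hour, day of month, month, day of week (0 = Sunday)
BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand(field: str, lo: int, hi: int) -> set:
    """Expand one cron field (e.g. '*/6', '1-5', '0,30') into the values it allows."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            a, b = base.split("-")
            start, end = int(a), int(b)
        else:
            start = int(base)
            # 'n/step' means "from n to the field maximum"; a bare 'n' means just n
            end = hi if step else start
        values.update(range(start, end + 1, int(step or 1)))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) satisfies the 5-field expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    minute, hour, dom, month, dow = (
        expand(f, lo, hi) for f, (lo, hi) in zip(fields, BOUNDS)
    )
    weekday = when.isoweekday() % 7  # convert Python's Monday=1..Sunday=7 to Sunday=0
    # Classic cron rule: if both day fields are restricted, either one may match.
    if fields[2] != "*" and fields[4] != "*":
        day_ok = when.day in dom or weekday in dow
    else:
        day_ok = when.day in dom and weekday in dow
    return (
        when.minute in minute and when.hour in hour and when.month in month and day_ok
    )
```

For example, cron_matches("0 9 * * 1", datetime(2024, 1, 1, 9, 0)) is true because 1 January 2024 is a Monday, while the same time one day later does not match.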

Create a schedule

bash
curl -X POST http://localhost:3000/v1/schedule \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Nightly docs crawl",
    "cronExpression": "0 2 * * *",
    "url": "https://docs.example.com",
    "options": { "maxPages": 200, "formats": ["markdown"] }
  }'

The in-process scheduler polls every 60 s and enqueues a crawl job whenever a schedule's nextRunAt has passed, so a run can start up to about a minute after its cron time.
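
The polling behavior can be pictured with a small Python sketch. This is an assumption about the general shape of such a loop, not the actual Netleaf implementation: Schedule, poll_once, and next_0200 are hypothetical names, and next_0200 is a stand-in for a real cron parser that only understands "daily at 02:00".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Schedule:
    id: str
    cron_expression: str
    url: str
    next_run_at: datetime


def poll_once(schedules, now, enqueue, compute_next):
    """One polling pass: enqueue every due schedule, then advance its next_run_at."""
    for s in schedules:
        if s.next_run_at <= now:
            enqueue(s)
            # Advance from `now`, so a long pause yields one run rather than a backlog.
            s.next_run_at = compute_next(s.cron_expression, now)


def next_0200(_expr, after):
    """Stand-in for a cron parser: the next 02:00 strictly after `after`."""
    candidate = after.replace(hour=2, minute=0, second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate


queue = []
nightly = Schedule(
    "sched_1", "0 2 * * *", "https://docs.example.com", datetime(2024, 1, 1, 2, 0)
)

poll_once([nightly], datetime(2024, 1, 1, 1, 59, 30), queue.append, next_0200)  # not due yet
poll_once([nightly], datetime(2024, 1, 1, 2, 0, 30), queue.append, next_0200)   # due: enqueued
```

A long-running scheduler would call poll_once in a loop with a 60-second sleep between passes. Advancing from the current time rather than from the old next_run_at is one possible design choice; whether Netleaf catches up on missed runs is not stated here.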

Manage schedules

bash
# List all
curl http://localhost:3000/v1/schedule

# Delete
curl -X DELETE http://localhost:3000/v1/schedule/YOUR_SCHEDULE_ID

Combine with webhooks

Add webhookUrl inside options to receive a POST each time a scheduled crawl completes:

json
{
  "name":           "Nightly with webhook",
  "cronExpression": "0 2 * * *",
  "url":            "https://docs.example.com",
  "options": {
    "maxPages":   200,
    "webhookUrl": "https://your-app.com/hooks/netleaf"
  }
}

When WEBHOOK_SECRET is set, each payload is signed with HMAC-SHA256 and the signature is sent in the X-Netleaf-Signature header.
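
On the receiving side, the signature can be checked before the payload is trusted. The sketch below is hedged: it assumes the header carries a hex-encoded HMAC-SHA256 digest of the raw request body, which this page does not confirm, so adjust the encoding if your deployment differs. The function name verify_signature and the sample values are hypothetical.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw body and compare it to the header value.

    Assumption: the header holds a plain hex digest with no prefix.
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking match position
    return hmac.compare_digest(expected.encode(), signature_header.encode())


# Demo with made-up values: sign a body the way the sender is assumed to
secret = "example-webhook-secret"
body = b'{"status": "completed"}'
header = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

valid = verify_signature(body, header, secret)
tampered = verify_signature(b'{"status": "failed"}', header, secret)
```

Verify against the exact bytes received, before any JSON parsing or re-serialization, since even a whitespace change alters the digest.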