Extract

curl --request POST \
  --url https://api.crawlkit.sh/v1/crawl/extract \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "<string>",
  "schema": {},
  "options": {
    "timeout": 300500,
    "headers": {},
    "waitFor": "<string>",
    "actions": [
      {
        "type": "wait",
        "milliseconds": 15050
      }
    ],
    "contentSelector": "<string>",
    "prompt": "<string>",
    "maxDepth": 0,
    "depthPrompt": "<string>",
    "excludePaths": [
      "<string>"
    ],
    "includeOnlyPaths": [
      "<string>"
    ],
    "ignoreSitemap": false,
    "stealth": false
  }
}
'

{
  "success": false,
  "error": {
    "code": "<string>",
    "message": "<string>"
  }
}

Crawl

Extract

Scrapes the given URL and extracts structured data using Gemini 2.5 Flash Lite.

Response includes:

markdown: Cleaned, readable markdown content
html: Cleaned HTML (main content only)
rawHtml: Original unprocessed HTML
metadata: Page metadata (title, description, og:*, author, dates, etc.)
links: Internal and external links found on the page
json: LLM-extracted structured data based on provided schema
stats: Processing time statistics
pages: Array of crawled pages (if maxDepth > 0)

Required:

schema: JSON Schema defining the structure for extraction

Options:

prompt: Custom extraction prompt (max 2000 chars)
onlyMainContent: Extract only main content (default: true)
contentSelector: Custom CSS selector for content extraction
maxDepth: Number of additional pages to crawl (0-10, +2 credits each)
depthPrompt: LLM prompt for selecting relevant links
excludePaths: URL paths to exclude (e.g., [“/login”, “/admin”])
includeOnlyPaths: Only crawl matching paths (e.g., [“/blog”])
ignoreSitemap: Skip sitemap URL discovery (default: false)

Browser Rendering: Use waitFor (selector or ms) or actions to force browser rendering.

Credits: 5 + (maxDepth * 2) credits per request.

POST

crawl

extract

Extract

curl --request POST \
  --url https://api.crawlkit.sh/v1/crawl/extract \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "<string>",
  "schema": {},
  "options": {
    "timeout": 300500,
    "headers": {},
    "waitFor": "<string>",
    "actions": [
      {
        "type": "wait",
        "milliseconds": 15050
      }
    ],
    "contentSelector": "<string>",
    "prompt": "<string>",
    "maxDepth": 0,
    "depthPrompt": "<string>",
    "excludePaths": [
      "<string>"
    ],
    "includeOnlyPaths": [
      "<string>"
    ],
    "ignoreSitemap": false,
    "stealth": false
  }
}
'

{
  "success": false,
  "error": {
    "code": "<string>",
    "message": "<string>"
  }
}

Authorizations

Authorization

string

header

required

API key with 'ApiKey ' prefix (e.g., ApiKey ck_...)

Body

application/json

url

string<uri>

required

URL to scrape and extract data from

schema

object

required

JSON Schema defining the structure for LLM extraction. Required.

Show child attributes

options

object

Show child attributes

Response

Default Response

success

enum<boolean>

required

Available options:

false

error

object

required

Show child attributes

Scrape Web Search

Crawl

LinkedIn

Instagram

PlayStore

AppStore

TikTok

Extract

Authorizations

Body

Response