Web

Arcade Optimized

Arcade.dev LLM tools for web scraping tasks

Author: Arcade
Version: 2.0.1
Auth: No authentication required
Tools: 6 (all 6 require secrets)

Arcade.dev provides a toolkit for web scraping that lets developers crawl websites and extract their data. The toolkit uses the Firecrawl API and supports both synchronous and asynchronous crawling.

Capabilities:

  • Initiate and manage web crawls, with options for both synchronous and asynchronous operations.
  • Retrieve crawl data and status updates for ongoing or recently completed tasks.
  • Map entire websites starting from a single URL.
  • Scrape specific URLs and receive data in various formats.

Secrets:

  • FIRECRAWL_API_KEY: the API key used to access Firecrawl.

Available tools (6)

Tool name | Description | Secrets
Web.CancelCrawl | Cancel an asynchronous crawl job that is in progress using the Firecrawl API. | 1
Web.CrawlWebsite | Crawl a website using Firecrawl. An asynchronous crawl returns the crawl ID; a synchronous crawl returns the crawl data. | 1
Web.GetCrawlData | Get the data of a Firecrawl 'crawl' that is either in progress or recently completed. | 1
Web.GetCrawlStatus | Get the status of a Firecrawl 'crawl' that is either in progress or recently completed. | 1
Web.MapWebsite | Map an entire website starting from a single URL. | 1
Web.ScrapeUrl | Scrape a URL using Firecrawl and return the data in specified formats. | 1

Web.CancelCrawl

Cancel an asynchronous crawl job that is in progress using the Firecrawl API.

Parameters

Parameter | Type | Req. | Description
crawl_id | string | Required | The ID of the asynchronous crawl job to cancel

Requirements

Secrets: FIRECRAWL_API_KEY

Output

Type: json. Cancellation status information.

Web.CrawlWebsite

Crawl a website using Firecrawl. An asynchronous crawl returns the crawl ID; a synchronous crawl returns the crawl data.

Parameters

Parameter | Type | Req. | Description
url | string | Required | URL to crawl
exclude_paths | array<string> | Optional | URL patterns to exclude from the crawl
include_paths | array<string> | Optional | URL patterns to include in the crawl
max_depth | integer | Optional | Maximum depth to crawl relative to the entered URL
ignore_sitemap | boolean | Optional | Ignore the website sitemap when crawling
limit | integer | Optional | Limit the number of pages to crawl
allow_backward_links | boolean | Optional | Enable navigation to previously linked pages and enable crawling sublinks that are not children of the 'url' input parameter.
allow_external_links | boolean | Optional | Allow following links to external websites
webhook | string | Optional | The URL to send a POST request to when the crawl is started, updated and completed.
async_crawl | boolean | Optional | Run the crawl asynchronously

Requirements

Secrets: FIRECRAWL_API_KEY

Output

Type: json. Crawl status and data.
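
As an illustration of the parameter table above, the following sketch assembles an input for Web.CrawlWebsite and leaves out any optional parameter that was not set. The helper name build_crawl_input and the example path pattern are hypothetical, and only a subset of the optional parameters is covered. How the resulting dictionary is sent to the tool depends on the client in use and is not shown here.

```python
from typing import Any, Optional, Sequence


def build_crawl_input(
    url: str,
    *,
    include_paths: Optional[Sequence[str]] = None,
    exclude_paths: Optional[Sequence[str]] = None,
    max_depth: Optional[int] = None,
    limit: Optional[int] = None,
    async_crawl: Optional[bool] = None,
    webhook: Optional[str] = None,
) -> dict[str, Any]:
    """Build a Web.CrawlWebsite input (hypothetical helper, not part of the toolkit)."""
    if not url:
        raise ValueError("url is required")

    # Parameter names come from the table above; None means "not set".
    optional: dict[str, Any] = {
        "include_paths": list(include_paths) if include_paths is not None else None,
        "exclude_paths": list(exclude_paths) if exclude_paths is not None else None,
        "max_depth": max_depth,
        "limit": limit,
        "async_crawl": async_crawl,
        "webhook": webhook,
    }

    payload: dict[str, Any] = {"url": url}
    # Keep explicit False values, drop only parameters that were never set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


if __name__ == "__main__":
    # Illustrative values only; the pattern syntax is an assumption.
    print(build_crawl_input("https://example.com", exclude_paths=["blog/.*"], limit=25, async_crawl=True))
```

With async_crawl set to true, the tool's result is the crawl ID, which the other crawl tools accept as crawl_id.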

Web.GetCrawlData

Get the data of a Firecrawl 'crawl' that is either in progress or recently completed.

Parameters

Parameter | Type | Req. | Description
crawl_id | string | Required | The ID of the crawl job

Requirements

Secrets: FIRECRAWL_API_KEY

Output

Type: json. Crawl data information.

Web.GetCrawlStatus

Get the status of a Firecrawl 'crawl' that is either in progress or recently completed.

Parameters

Parameter | Type | Req. | Description
crawl_id | string | Required | The ID of the crawl job

Requirements

Secrets: FIRECRAWL_API_KEY

Output

Type: json. Crawl status information.
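
An asynchronous crawl is typically followed by repeated status checks. The sketch below polls Web.GetCrawlStatus until the crawl finishes and calls Web.CancelCrawl if a deadline passes. It rests on several assumptions that this page does not confirm: call_tool is a hypothetical callable that executes a tool by name and returns its JSON output as a dictionary, the output carries a "status" field, and the terminal values are the ones listed in TERMINAL_STATUSES.

```python
import time
from typing import Any, Callable

# Assumed terminal values of the "status" field; adjust to the actual output.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def wait_for_crawl(
    crawl_id: str,
    call_tool: ToolCaller,
    *,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll the crawl status until it is terminal; cancel the crawl on timeout."""
    deadline = clock() + timeout_seconds
    while True:
        info = call_tool("Web.GetCrawlStatus", {"crawl_id": crawl_id})
        if info.get("status") in TERMINAL_STATUSES:
            return info
        if clock() >= deadline:
            # Stop the job rather than leaving it running unattended.
            call_tool("Web.CancelCrawl", {"crawl_id": crawl_id})
            raise TimeoutError(f"crawl {crawl_id} did not finish within {timeout_seconds} seconds")
        sleep(poll_seconds)
```

Once the returned status indicates completion, Web.GetCrawlData can be called with the same crawl_id. Passing sleep and clock as parameters keeps the loop testable without real waiting.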

Web.MapWebsite

Map an entire website starting from a single URL.

Parameters

Parameter | Type | Req. | Description
url | string | Required | The base URL to start crawling from
search | string | Optional | Search query to use for mapping
ignore_sitemap | boolean | Optional | Ignore the website sitemap when crawling
include_subdomains | boolean | Optional | Include subdomains of the website
limit | integer | Optional | Maximum number of links to return

Requirements

Secrets: FIRECRAWL_API_KEY

Output

Type: json. Website map data.
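
A site map is often easier to read once its links are grouped by top-level section. The sketch below does that for a list of absolute URL strings. The exact shape of the Web.MapWebsite output is not specified on this page, so extracting that list from the JSON result is left to the caller, and group_links_by_section is a hypothetical helper.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_links_by_section(links: list[str]) -> dict[str, list[str]]:
    """Group URLs by the first segment of their path ("" for the site root)."""
    sections: defaultdict[str, list[str]] = defaultdict(list)
    for link in links:
        path = urlparse(link).path.strip("/")
        section = path.split("/", 1)[0] if path else ""
        sections[section].append(link)
    return dict(sections)


if __name__ == "__main__":
    # Illustrative links only, not real tool output.
    example_links = [
        "https://example.com/",
        "https://example.com/docs/intro",
        "https://example.com/docs/api",
        "https://example.com/blog/launch",
    ]
    for section, urls in group_links_by_section(example_links).items():
        print(section or "(root)", len(urls))
```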

Web.ScrapeUrl

Scrape a URL using Firecrawl and return the data in specified formats.

Parameters

Parameter | Type | Req. | Description
url | string | Required | URL to scrape
formats | array<string> | Optional | Formats to retrieve. Defaults to ['markdown']. Allowed values: markdown, html, rawHtml, links, screenshot, screenshot@fullPage
only_main_content | boolean | Optional | Only return the main content of the page excluding headers, navs, footers, etc.
include_tags | array<string> | Optional | List of tags to include in the output
exclude_tags | array<string> | Optional | List of tags to exclude from the output
wait_for | integer | Optional | Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load.
timeout | integer | Optional | Timeout in milliseconds for the request

Requirements

Secrets: FIRECRAWL_API_KEY

Output

Type: json. Scraped data in specified formats.
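
The following sketch shows one way to call Web.ScrapeUrl from Python. build_scrape_input checks the requested formats against the allowed values listed in the parameter table before anything is sent. The scrape function assumes the arcadepy client package is installed, that an Arcade API key is available in the environment, and that FIRECRAWL_API_KEY has been configured as a secret on the Arcade side; both function names and the user ID are hypothetical.

```python
from typing import Any, Optional, Sequence

# Allowed values taken from the parameter table above.
ALLOWED_FORMATS = ("markdown", "html", "rawHtml", "links", "screenshot", "screenshot@fullPage")


def build_scrape_input(
    url: str,
    formats: Optional[Sequence[str]] = None,
    *,
    only_main_content: Optional[bool] = None,
    wait_for: Optional[int] = None,
    timeout: Optional[int] = None,
) -> dict[str, Any]:
    """Build a Web.ScrapeUrl input, rejecting formats the tool does not list."""
    if not url:
        raise ValueError("url is required")
    payload: dict[str, Any] = {"url": url}
    if formats is not None:
        unknown = [fmt for fmt in formats if fmt not in ALLOWED_FORMATS]
        if unknown:
            raise ValueError(f"unsupported formats: {unknown}")
        payload["formats"] = list(formats)
    optional = {"only_main_content": only_main_content, "wait_for": wait_for, "timeout": timeout}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


def scrape(url: str, user_id: str, **options: Any) -> Any:
    """Execute Web.ScrapeUrl through the Arcade client (assumes arcadepy is installed)."""
    from arcadepy import Arcade  # imported lazily so the helper above works without it

    client = Arcade()  # assumption: the Arcade API key is read from the environment
    response = client.tools.execute(
        tool_name="Web.ScrapeUrl",
        input=build_scrape_input(url, **options),
        user_id=user_id,
    )
    return response.output.value


if __name__ == "__main__":
    # Builds the payload only; calling scrape() needs network access and credentials.
    print(build_scrape_input("https://example.com", ["markdown", "links"], wait_for=1000))
```

Omitting formats leaves the tool's default of ['markdown'] in effect.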