
AI Web Scraper (Anthropic)

Extract structured data from any website by specifying the fields you want. Enter a website URL and a comma-separated list of field names; the tool scrapes the page and returns the extracted information as a structured array of objects.


Example inputs

  • Website URL: https://www.nba.com/players
  • Fields: Name,Team,Country

Read me

AI-Powered Web Data Extraction Workflow

Extract structured data from any website using AI-powered field recognition that intelligently parses web content into organized, usable information.

Overview

This workflow combines web scraping with Anthropic's Claude AI to automatically extract specific data fields from web pages. Instead of writing complex scraping rules, you simply specify the fields you want (like "Name, Team, Country") and the AI intelligently finds and organizes that information from the webpage content. It's well suited for gathering competitive intelligence or research data, or for monitoring information across multiple websites, without manual data entry.

How it works

  1. Input Collection: You provide a website URL and specify the data fields you want to extract (comma-separated list like "Name,Team,Country")
  2. Web Content Retrieval: The workflow uses BuildShip's Puppeteer service to fetch the webpage content, with options for HTML or text-only extraction
  3. AI-Powered Field Extraction: Claude 3 Haiku processes the web content using a JSON schema generated dynamically from your specified fields (see the sketch after these steps)
  4. Structured Data Output: Returns an organized array of entities with your requested field values extracted and formatted
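
As a rough sketch of step 3's schema generation, a comma-separated fields string can be turned into a JSON schema like this; the function name and exact schema shape are assumptions, not the template's actual code:

```typescript
// Hypothetical sketch: derive a JSON schema from the `fields` input.
// The workflow's real schema generation may differ in shape and naming.
function buildSchema(fields: string) {
  const names = fields.split(",").map((f) => f.trim()).filter(Boolean);
  const properties = Object.fromEntries(
    names.map((name) => [name, { type: "string" }]),
  );
  return {
    type: "object",
    properties: {
      entities: {
        type: "array",
        items: { type: "object", properties, required: names },
      },
    },
    required: ["entities"],
  };
}

// buildSchema("Name,Team,Country") describes an `entities` array whose
// objects each carry Name, Team, and Country string values.
```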

Prerequisites and Integrations Used

  • Anthropic API Key: Get your API key from console.anthropic.com (create an account, navigate to API Keys, and generate a new key)
  • BuildShip Puppeteer Service: Uses keyless integration (included with BuildShip credits) for web content extraction

How to customize

Change extraction fields: Modify the fields parameter to specify different data points you want to extract (e.g., "Title,Price,Description,Rating")

Switch extraction modes: Toggle between HTML and text extraction in the Puppeteer node:

  • HTML mode: Preserves page structure and formatting, better for complex layouts
  • Text mode: Returns clean, readable content without markup, ideal for simple text extraction
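
A minimal standalone sketch of what each mode returns, using the open-source Puppeteer API rather than BuildShip's managed node:

```typescript
import puppeteer from "puppeteer";

// Illustrative only: BuildShip's Puppeteer service is keyless and managed,
// so this script just demonstrates what the two modes yield.
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://www.nba.com/players", { waitUntil: "networkidle2" });

// HTML mode: full markup, structure and formatting preserved
const html = await page.content();

// Text mode: visible text only, no markup
const text = await page.evaluate(() => document.body.innerText);

await browser.close();
```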

Target specific page sections: Use CSS selectors to focus on particular page elements:

  • Default: body (entire page)
  • Specific sections: .main-content, #product-list, div.article-content
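
Continuing the Puppeteer sketch above, scoping extraction to one element might look like this (the selector value is illustrative):

```typescript
// Reuses `page` from the previous sketch. Falls back to "body" when no
// selector is configured.
const selector = ".main-content"; // e.g. "#product-list", "div.article-content"
const scopedText = await page.$eval(selector, (el) => el.textContent ?? "");
```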

Adjust AI model settings: Modify the Claude configuration for different performance needs:

  • Increase max tokens for longer content processing
  • Change temperature settings for more creative or precise extraction
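
For reference, here is a hedged sketch of the equivalent call with the official Anthropic TypeScript SDK, showing where both knobs live; the tool name, prompt wording, and hardcoded schema are illustrative, and BuildShip's node configures this internally:

```typescript
import Anthropic from "@anthropic-ai/sdk";

declare const pageText: string; // assumed to hold the Puppeteer output

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const response = await client.messages.create({
  model: "claude-3-haiku-20240307",
  max_tokens: 4096, // raise for longer page content
  temperature: 0,   // raise for looser, more "creative" extraction
  tools: [
    {
      name: "record_entities",
      description: "Record the structured entities found on the page",
      // In the workflow this schema is generated from the `fields` input
      // (see the buildSchema sketch above); hardcoded here for clarity.
      input_schema: {
        type: "object",
        properties: {
          entities: {
            type: "array",
            items: {
              type: "object",
              properties: {
                Name: { type: "string" },
                Team: { type: "string" },
                Country: { type: "string" },
              },
              required: ["Name", "Team", "Country"],
            },
          },
        },
        required: ["entities"],
      },
    },
  ],
  tool_choice: { type: "tool", name: "record_entities" },
  messages: [
    { role: "user", content: `Extract the requested fields:\n\n${pageText}` },
  ],
});

// The structured result arrives as the forced tool call's input.
const toolUse = response.content.find((block) => block.type === "tool_use");
```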

Who should use this template

Data Analysts and Researchers gathering competitive intelligence or market research from websites without APIs

Marketing Teams monitoring competitor pricing, product information, or content strategies across multiple sites

E-commerce Businesses tracking supplier pricing, product catalogs, or inventory information

Sales Teams building lead databases by extracting company details and contact information from business directories

Content Creators collecting article metadata, author information, or SEO metrics from competitor websites

Real Estate Professionals gathering property listings, pricing data, and market information for analysis

Next possible steps

  • Add data validation: Include nodes to verify extracted data quality and completeness
  • Set up automated scheduling: Configure triggers to run extraction on a regular schedule
  • Connect to databases: Store extracted data in Airtable, Google Sheets, or other data storage solutions
  • Add notification systems: Send alerts when specific data changes are detected
  • Implement batch processing: Extract data from multiple URLs in a single workflow run (see the sketch below)
  • Transform data: Add nodes to clean, format, or enrich the extracted data before output
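
As a sketch of the batch-processing idea, assuming a hypothetical single-URL helper that wraps the workflow's scrape-and-extract steps (the helper name and URLs are illustrative):

```typescript
// Hypothetical helper wrapping the scrape + Claude extraction for one URL.
declare function extractFromUrl(url: string, fields: string): Promise<object[]>;

const urls = [
  "https://www.nba.com/players",
  "https://www.nba.com/teams",
];

// Run the extractions concurrently and keep going if one page fails.
const results = await Promise.allSettled(
  urls.map((url) => extractFromUrl(url, "Name,Team,Country")),
);

results.forEach((result, i) => {
  if (result.status === "fulfilled") {
    console.log(urls[i], result.value);
  } else {
    console.warn(`Extraction failed for ${urls[i]}:`, result.reason);
  }
});
```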