BuildShip Community

Document_extarctor

This tool is an intelligent document parser that extracts structured data from invoice images using OCR and layout-aware analysis. It automatically identifies key fields such as invoice details, sender and recipient information, itemized charges, totals, and payment terms, without requiring a predefined schema. The output is clean, well-structured JSON with dates in ISO format and numerical values as numbers. It filters out irrelevant or decorative text, represents missing fields as null, and uses the visual and contextual layout of the invoice to improve accuracy.

Link to template

https://templates.buildship.com/template/ZY8adTiQz-zV/

Invoice Data Extractor

Invoice Data Extractor is an AI-powered tool designed to convert invoice images into structured, machine-readable data. Developed using the BuildShip AI agent platform and powered by GPT-4 Vision, this solution intelligently parses visual content from diverse invoice formats and returns key information as a structured, stringified JSON object. Additionally, it dynamically generates a custom JSON schema tailored to each unique invoice structure.

Tool Name: DocumentExtarctor

Tool Trigger API-a17fbe75e76b54a052e31f3489e3e588cebd286889d47829877c7691e04e7d8b

Flow

Key Features

  • Image-Based Input: Supports invoices in image formats (JPG, PNG) or scanned PDFs.
  • Smart Field Extraction: Automatically identifies and extracts essential fields, including:
    • Invoice number
    • Dates (invoice, due)
    • Vendor and recipient information
    • Line items and total amount
    • Tax and payment details
  • Dynamic JSON Schema Generation: Creates a customized JSON schema based on the structure and content of each invoice, enabling flexibility across various invoice formats and layouts.
  • Structured Output: Returns results as well-formatted, stringified JSON, ready for downstream automation and integration using JavaScript.
  • No Predefined Templates Required: Adapts to diverse invoice styles without needing prior schema definition or template configuration.
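To make the idea of a per-invoice schema concrete, here is a minimal Python sketch that derives a simplified, JSON-Schema-like description from an already-parsed invoice. It is an illustration only: the tool itself produces its schema with the vision model, whereas this sketch infers one from the data. The `infer_schema` helper and the sample field names are hypothetical.

```python
import json
from typing import Any


def infer_schema(value: Any) -> dict:
    """Derive a minimal JSON-Schema-like description from a parsed JSON value."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element describes all items.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
        }
    raise TypeError(f"not a JSON value: {type(value).__name__}")


# Hypothetical extraction result, shaped like the fields listed above.
invoice = {
    "invoice_number": "INV-001",
    "invoice_date": "2025-01-15",
    "due_date": None,
    "line_items": [{"description": "Consulting", "amount": 1200.0}],
    "total_amount": 1200.0,
}

schema = infer_schema(invoice)
print(json.dumps(schema, indent=2))
```

Because the schema follows the data, two invoices with different layouts yield two different schemas, which is the flexibility the feature list describes.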

Technology Stack

  • Large Language Model (LLM): GPT-4 Vision
  • Agent Platform: BuildShip
  • Input: Invoice image (JPG, PNG, JPEG)
  • Output:
    • Structured stringified JSON
    • Custom JSON schema tailored to the invoice

How It Works

  1. Upload: The user uploads an invoice image through the interface or the Tool Trigger API.
  2. Processing: GPT-4 Vision interprets the document’s visual layout and content.
  3. Schema Generation & Extraction: The system first generates a tailored JSON schema based on the detected fields, then populates it with the corresponding values from the document.
  4. Output: The final output includes:
    • A stringified JSON object containing the invoice data
    • A JSON schema outlining the structure of the extracted fields
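A consumer of the two outputs might parse the stringified JSON and check it against the accompanying schema before passing it on. The sketch below assumes a response shaped as a dictionary with `data` and `schema` keys. That shape, the key names, and the `check` helper are all hypothetical, and the checker handles only a small subset of JSON Schema (`type`, `properties`, `required`, `items`).

```python
from __future__ import annotations

import json
from typing import Any

# JSON Schema type names mapped to the Python types json.loads produces.
_TYPES = {"object": dict, "array": list, "string": str, "boolean": bool, "null": type(None)}


def _matches(value: Any, type_name: str) -> bool:
    if type_name in ("number", "integer"):
        numeric = int if type_name == "integer" else (int, float)
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, numeric) and not isinstance(value, bool)
    return isinstance(value, _TYPES[type_name])


def check(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return a list of mismatches between a value and a simplified schema."""
    expected = schema.get("type")
    if expected is not None:
        names = expected if isinstance(expected, list) else [expected]
        if not any(_matches(value, name) for name in names):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors: list[str] = []
    if isinstance(value, dict):
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], sub_schema, f"{path}.{key}")
            elif key in schema.get("required", []):
                errors.append(f"{path}.{key}: missing required field")
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors += check(item, schema["items"], f"{path}[{index}]")
    return errors


# Hypothetical response: the real field names may differ.
response = {
    "data": '{"invoice_number": "INV-001", "due_date": null, "total_amount": "1200"}',
    "schema": {
        "type": "object",
        "required": ["invoice_number", "total_amount"],
        "properties": {
            "invoice_number": {"type": "string"},
            "due_date": {"type": ["string", "null"]},
            "total_amount": {"type": "number"},
        },
    },
}

invoice = json.loads(response["data"])  # stringified JSON -> dict
problems = check(invoice, response["schema"])
print(problems)
```

Here the total arrives as a string rather than a number, so the check reports one mismatch for `$.total_amount`, while the null `due_date` passes because the schema allows it.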

Sample Input

Sample Output

{  \"invoice_details\": {    \"payment_for\": \"RISHIT RASTOGI\",    \"division\": \"ME\",    \"standard_course\": \"B Tech\",    \"registration_code\": \"ME22B2017\",    \"academic_year_start\": \"2023-04\",    \"academic_year_end\": \"2027-03\",    \"fee_description\": \"Jan-May 2025\",    \"payment_date\": \"2024-12-30T10:03\",    \"qfix_reference_number\": \"J9GCUHNK22B2017\",    \"fee_amount\": 105140.00,    \"late_payment_charges\": 0.00,    \"other_charges\": 0.00,    \"discount_amount\": 0.00,    \"remaining_amount\": 0.00,    \"paid_amount\": 105140.00,    \"mode_of_payment\": \"DEBIT CARD\"  },  \"itemized_charges\": [    {      \"description\": \"TUITION FEE\",      \"amount\": 66000.00,      \"paid\": 66000.00    },    {      \"description\": \"SEMESTER FEE\",      \"amount\": 4800.00,      \"paid\": 4800.00    },    {      \"description\": \"HOSTEL FEES\",      \"amount\": 34340.00"
}
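As a downstream example, the itemized charges can be reconciled against the invoice total once the stringified JSON is parsed. The Python sketch below uses only values visible in the sample output above, trimmed to the fields it needs. The excerpt string and variable names are illustrative, not part of the tool.

```python
import json
from decimal import Decimal

# Excerpt of the sample output, restricted to fields that appear in it.
stringified = (
    '{"invoice_details": {"fee_amount": 105140.00, "paid_amount": 105140.00},'
    ' "itemized_charges": ['
    '{"description": "TUITION FEE", "amount": 66000.00},'
    '{"description": "SEMESTER FEE", "amount": 4800.00},'
    '{"description": "HOSTEL FEES", "amount": 34340.00}]}'
)

# parse_float=Decimal keeps monetary values exact instead of binary floats.
invoice = json.loads(stringified, parse_float=Decimal)

items_total = sum(item["amount"] for item in invoice["itemized_charges"])
fee_amount = invoice["invoice_details"]["fee_amount"]
reconciled = items_total == fee_amount

print(f"items total: {items_total}, fee amount: {fee_amount}, reconciled: {reconciled}")
```

For this sample the three charges add up to the stated fee amount of 105140.00, so the check passes. A mismatch would be a reasonable signal to route the invoice for manual review.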

Use Cases

  • Automated invoice entry and reconciliation
  • Financial and tax document processing
  • Integration with ERP/CRM systems
  • AI-powered document parsing for enterprise workflows