How to Automate Invoice Data Extraction with Make, OpenAI, and Airtable

Stop paying humans to read PDFs. Here is a step-by-step pipeline that extracts the vendor, invoice ID, and total from emailed invoices using OpenAI and Make, and logs them directly into Airtable.

Kyto, AI & Automation Firm · March 17, 2026 · 3 min read

Key Takeaways

  1. Configure a Gmail watcher in Make.com to catch incoming emails with PDF attachments.
  2. Convert PDFs to images dynamically so OpenAI Vision can process them.
  3. Use OpenAI Structured Outputs to force the model to return predictable JSON.
  4. Map the parsed JSON values into Airtable fields automatically.
  5. Build a fallback error route to alert humans when an extraction fails.

Typing data from PDFs into a database is a colossal waste of human intelligence. A standard operations team easily loses 15 hours a week just keying invoice amounts into a ledger.

Template-based OCR tools are brittle. They break the second a vendor moves a logo or changes a font. Models like GPT-4o cope with messy layouts far better, but only if you force them to output structured data.

By the end of this guide, you will build an automated pipeline that catches emailed invoices, extracts the exact vendor and pricing data, and logs it into Airtable without a single manual keystroke.

Phase 1: Configure the Database

Your table needs one field for every key in the JSON schema you request from OpenAI, with a compatible type. If a field is missing or a number lands in a field that cannot hold it, the automation breaks at the final step.

  Step 1. Create the Airtable Base: Build a new table named "Invoices". Add exactly three fields: "Invoice ID" (Single line text), "Vendor Name" (Single line text), and "Total Amount" (Currency).
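The relationship between the table and the schema can be sketched as a small lookup. This is an illustration only: `FIELD_MAP` and `to_airtable_fields` are hypothetical names, the sample values are made up, and in Make the same mapping is done by hand in the Airtable module.

```python
# Hypothetical mapping from the JSON keys requested from OpenAI
# to the Airtable field names created in Step 1.
FIELD_MAP = {
    "invoice_id": "Invoice ID",
    "vendor_name": "Vendor Name",
    "total_amount": "Total Amount",
}


def to_airtable_fields(extracted: dict) -> dict:
    """Rename extracted keys to Airtable field names; fail loudly on a gap."""
    missing = [key for key in FIELD_MAP if key not in extracted]
    if missing:
        raise KeyError(f"missing keys in extraction: {missing}")
    return {FIELD_MAP[key]: extracted[key] for key in FIELD_MAP}


# Made-up sample values, for illustration only.
record = to_airtable_fields(
    {"vendor_name": "Acme Corp", "invoice_id": "INV-001", "total_amount": 1250.5}
)
print(record)
```

Keeping the mapping in one place makes it obvious which side to update when a field is renamed.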

Phase 2: Build the Trigger and File Conversion

We start the pipeline in Make.com. The goal is to filter out the noise, isolate the invoice emails, and convert the PDFs into a format the AI can actually read.

  Step 2. Set up the Gmail Watcher: Add the Gmail "Watch Emails" module. Select the folder "INBOX". In the "Filter" field, enter exactly: `has:attachment filename:pdf`. Set "Mark as read" to Yes.
  Step 3. Iterate Attachments: Connect the Gmail "Iterate Attachments" module to the watcher. Map the "Attachments[]" array from Step 2 into this module. This isolates the PDF file data.
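One thing to watch: the Gmail filter selects messages, not individual files, so an email that carries a PDF plus a signature image will likely hand both to the iterator. A minimal sketch of the per-attachment check, assuming each attachment exposes a filename and its raw bytes (the `Attachment` class and `pdf_attachments` function are hypothetical; in Make this would be a filter between the iterator and the next module):

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    """Hypothetical stand-in for one item emitted by the iterator."""
    filename: str
    data: bytes


def pdf_attachments(attachments):
    """Keep attachments that are named *.pdf and start with the PDF header."""
    return [
        a for a in attachments
        if a.filename.lower().endswith(".pdf") and a.data.startswith(b"%PDF-")
    ]


# Made-up sample message with an invoice and an inline logo.
message = [
    Attachment("Invoice-0042.PDF", b"%PDF-1.7 ..."),
    Attachment("logo.png", b"\x89PNG\r\n\x1a\n"),
]
kept = pdf_attachments(message)
print([a.filename for a in kept])
```

Checking the `%PDF-` header as well as the extension guards against files that are merely named like PDFs.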

Raw PDFs break the Vision API

The `image_url` input used in this pipeline accepts images, not PDFs. Send a raw PDF buffer in its place and the request is rejected with a 400 Bad Request error. Convert it first.

  Step 4. Convert PDF to PNG: Add a CloudConvert "Convert a File" module. Map the file buffer from the Iterator. Set the Input Format to `pdf` and the Output Format to `png`.
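The converted PNG still has to be embedded in the request as a base64 data URL. A minimal sketch of that encoding step, assuming the PNG is available as raw bytes (`png_to_data_url` is a hypothetical helper, and the sample bytes are only the PNG file signature, not a real image):

```python
import base64


def png_to_data_url(png_bytes: bytes) -> str:
    """Wrap raw PNG bytes in a data URL suitable for an image_url field."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


# Placeholder input: the 8-byte PNG signature stands in for a real page image.
sample_png = b"\x89PNG\r\n\x1a\n"
data_url = png_to_data_url(sample_png)
print(data_url)
```

If the value arriving in the request is binary rather than base64 text, the API will reject the image, so this is the first thing to check when the call fails.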

Phase 3: The OpenAI Structured Extraction

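This is the brain of the operation. Do not use pre-built Make modules for OpenAI. We call the API directly to use Structured Outputs, which guarantees that the response matches the JSON schema you supply. It does not guarantee that the extracted values are correct, so keep a human review path.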

  Step 5. Configure the HTTP Request: Add the Make "HTTP - Make a request" module. Set the URL to `https://api.openai.com/v1/chat/completions`. Set Method to `POST`. Add an Authorization header with `Bearer YOUR_API_KEY`.
  Step 6. Build the Request Body: Change the Body type to "Raw" and Content type to "JSON (application/json)". Paste the configuration below, then replace the `{{1.data}}` placeholder in the image_url with the mapped CloudConvert output, base64-encoded. The module number in the placeholder must match the CloudConvert module in your own scenario.
openai-payload.json
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Extract the invoice data."},
        {
          "type": "image_url",
          "image_url": {"url": "data:image/png;base64,{{1.data}}"}
        }
      ]
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "invoice_extraction",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "vendor_name": {"type": "string"},
          "invoice_id": {"type": "string"},
          "total_amount": {"type": "number"}
        },
        "additionalProperties": false,
        "required": ["vendor_name", "invoice_id", "total_amount"]
      }
    }
  }
}
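For testing the payload outside Make, the same request can be assembled in a few lines. This is a sketch, not a definitive client: `build_request` and `INVOICE_SCHEMA` are hypothetical names, the image data is a placeholder, and the request is only constructed here, not sent.

```python
import json
import urllib.request

OPENAI_URL = "https://api.openai.com/v1/chat/completions"

# Same schema as the payload above.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        "invoice_id": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "additionalProperties": False,
    "required": ["vendor_name", "invoice_id", "total_amount"],
}


def build_request(image_data_url: str, api_key: str) -> urllib.request.Request:
    """Assemble the POST request; the caller decides when to send it."""
    payload = {
        "model": "gpt-4o-2024-08-06",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the invoice data."},
                {"type": "image_url", "image_url": {"url": image_data_url}},
            ],
        }],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "invoice_extraction",
                "strict": True,
                "schema": INVOICE_SCHEMA,
            },
        },
    }
    return urllib.request.Request(
        OPENAI_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder image data and key; substitute real values before sending.
request = build_request("data:image/png;base64,AAAA", "YOUR_API_KEY")
body = json.loads(request.data)
print(body["response_format"]["json_schema"]["name"])
```

Passing the returned object to `urllib.request.urlopen` would send it; that call is left out so the sketch runs without a key or network access.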

Strict Schema Requirement

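Setting `"strict": true` is your insurance policy. It forces the model to return exactly the properties you requested, with the types you requested. No missing keys. No broken downstream mapping. In return, the schema itself must list every property under `required` and set `additionalProperties` to false.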
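Strict mode places conditions on the schema itself: every property has to appear in `required`, and `additionalProperties` has to be false. A rough pre-flight check for a flat schema like the one in this guide might look like this (`strict_schema_problems` is a hypothetical helper and only inspects the top level, not nested objects):

```python
def strict_schema_problems(schema: dict) -> list[str]:
    """Return human-readable problems for a flat, top-level object schema."""
    problems = []
    if schema.get("additionalProperties") is not False:
        problems.append("additionalProperties must be false")
    properties = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    for name in sorted(properties - required):
        problems.append(f"property '{name}' must be listed in required")
    return problems


good_schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        "invoice_id": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "additionalProperties": False,
    "required": ["vendor_name", "invoice_id", "total_amount"],
}

# Deliberately broken variant: one property is optional and extras are allowed.
bad_schema = {
    "type": "object",
    "properties": {"vendor_name": {"type": "string"}, "po_number": {"type": "string"}},
    "required": ["vendor_name"],
}

print(strict_schema_problems(good_schema))
print(strict_schema_problems(bad_schema))
```

Running a check like this before deploying catches schema mistakes that would otherwise surface as a rejected API call.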

Phase 4: Parse and Sync

The API returns a stringified JSON object buried inside the choices array. You have to parse it back into variables Airtable can actually read.

  Step 7. Parse the Output: Add a "Parse JSON" module. Map the `choices[1].message.content` string from the HTTP module into the JSON string field. Make counts array items from 1, so this is the first choice (`choices[0]` in the raw API response).
  Step 8. Create the Airtable Record: Add the Airtable "Create a Record" module. Select your "Invoices" base and table. Map the parsed `vendor_name`, `invoice_id`, and `total_amount` variables directly into the matching database fields.
  Step 9. Configure Error Handling: Machines fail. Right-click the Airtable module and select "Add error route". Connect a Slack "Create a Message" module to your operations channel, and map the Gmail message link for manual review. Extraction failures surface earlier in the chain, so give the HTTP and Parse JSON modules the same route.
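The parse, sync, and fallback steps above can be sketched as one small function. Treat it as an outline under assumptions: `parse_extraction`, `process`, and `ExtractionError` are hypothetical names, the `create_record` and `alert` callbacks stand in for the Airtable and Slack modules, and the sample response is made up.

```python
import json


class ExtractionError(Exception):
    """Raised when the API response cannot be turned into invoice fields."""


def parse_extraction(response: dict) -> dict:
    """Pull the stringified JSON out of the first choice and decode it."""
    try:
        message = response["choices"][0]["message"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ExtractionError("unexpected response shape") from exc
    if message.get("refusal"):
        raise ExtractionError(f"model refused: {message['refusal']}")
    try:
        return json.loads(message["content"])
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        raise ExtractionError("content is not valid JSON") from exc


def process(response: dict, create_record, alert) -> bool:
    """Create a record on success; otherwise alert a human and report failure."""
    try:
        create_record(parse_extraction(response))
        return True
    except Exception as exc:  # broad on purpose: any failure goes to review
        alert(f"Invoice needs manual review: {exc}")
        return False


# Made-up responses: one well-formed, one with an empty choices array.
ok_response = {"choices": [{"message": {
    "content": '{"vendor_name": "Acme Corp", "invoice_id": "INV-001", "total_amount": 1250.5}'
}}]}
created, alerts = [], []
print(process(ok_response, created.append, alerts.append))
print(process({"choices": []}, created.append, alerts.append))
```

Routing every failure to the same alert keeps the scenario simple; splitting by error type only pays off once the volume of manual reviews justifies it.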

Frequently Asked Questions

Does OpenAI support raw PDF files directly for vision tasks?

Not through the image input this pipeline uses. An `image_url` content part must contain an image (such as PNG or JPEG), so you must convert PDFs to images in your pipeline before passing them to the model.

What happens if OpenAI hallucinates a field name?

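We prevent this by using OpenAI's Structured Outputs feature. Setting "strict": true in the API call forces the model to adhere to the exact schema you define, so field names cannot drift. The values themselves can still be misread, which is why the error route and manual review matter.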
