
How to Build a Receipt Parser with Claude: A Tax Audit Assistant That Actually Works

promptyze · Editor · Promptowy · 07.03.2026 · 10 min read

Expense reconciliation is the kind of work that makes accountants stare into the middle distance. A stack of 400 receipts from a single client’s Q4 travel budget, all in different formats, some photographed sideways, some faded to near-invisibility — and someone has to categorize every single one before the audit. That someone is usually the most expensive person in the room.

Claude’s 200,000-token context window and native image processing make a meaningful dent in this problem. You can feed it batches of receipt images, get structured expense data back, and flag anomalies before a human ever opens the spreadsheet. This tutorial walks through building exactly that pipeline using the Claude API and Claude Code — from a single test receipt to a working audit assistant. No unverified benchmarks, no magic numbers. Just the actual build.

What You’ll Actually Build

By the end of this guide, you’ll have a Python-based pipeline that accepts receipt images or PDFs, sends them to Claude via the API, extracts structured expense data (vendor, amount, date, category, VAT/tax details), flags anything unusual, and outputs a clean CSV or JSON file ready for your accounting software. You’ll also get a prompt-driven audit summary that a real accountant can review in minutes instead of hours. The architecture is straightforward enough to adapt for any document volume your API tier supports.

Requirements

Before writing a single line of code, make sure you have the following in place. You’ll need an Anthropic API key with access to Claude Sonnet 4.6 — it’s the sweet spot between capability and cost for high-volume document tasks. Claude Opus 4.6 will give you sharper reasoning on ambiguous receipts but at significantly higher per-token cost, so save it for the anomaly-review stage. You’ll need Python 3.10 or later, the anthropic Python SDK, and the Pillow library for image preprocessing. For PDF handling, pdf2image and poppler are your friends. A basic understanding of the Claude API’s messages format helps, but this tutorial covers the key patterns from scratch.

Note 💡

Claude Sonnet 4.6 is the right default for this pipeline. Opus 4.6 is worth switching to only when you need Claude to reason through genuinely ambiguous documents — think handwritten receipts, partially torn invoices, or receipts in languages outside your primary locale.

Step 1: Set Up Your Environment

Install your dependencies first. In a clean virtual environment, run pip install anthropic pillow pdf2image python-dotenv. Store your API key in a .env file rather than hardcoding it — a habit that pays off when you hand this codebase to a colleague or push it to a client’s infrastructure.
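
One way to wire that up, as a minimal sketch assuming python-dotenv: the Anthropic SDK picks up ANTHROPIC_API_KEY from the environment on its own once the file is loaded.

# .env  (keep this file out of version control)
# ANTHROPIC_API_KEY=sk-ant-...

from dotenv import load_dotenv

load_dotenv()  # loads .env into os.environ; anthropic.Anthropic() then finds ANTHROPIC_API_KEY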

Your project structure should look like this: a receipts/ folder for input images, an output/ folder for parsed results, a prompts/ folder where you store your system and user prompt templates, and a parser.py as your main script. Keeping prompts in separate files makes iteration much faster — you’ll be tuning them more than the code.
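
Concretely, a layout like this works (flags.txt and report.txt are suggested names for the Step 5 and Step 6 prompts, not required ones):

receipt-parser/
├── receipts/          # input images (.jpg, .png, .webp)
├── output/            # parsed CSV / JSON results
├── prompts/
│   ├── extract.txt    # extraction prompt (Step 2)
│   ├── flags.txt      # anomaly prompt (Step 5)
│   └── report.txt     # audit summary prompt (Step 6)
├── parser.py
└── .env               # ANTHROPIC_API_KEY lives here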

Step 2: Build the Receipt Extraction Prompt

The prompt is where most of the work lives. Claude needs clear instructions about what to extract, how to handle ambiguity, and what format to return. Here’s the core extraction prompt to drop into prompts/extract.txt:

You are a professional expense auditor extracting structured data from receipt images.

For each receipt, extract the following fields and return ONLY valid JSON:

{
  "vendor_name": "string or null",
  "transaction_date": "YYYY-MM-DD or null",
  "total_amount": number or null,
  "currency": "3-letter ISO code or null",
  "tax_amount": number or null,
  "tax_rate_percent": number or null,
  "expense_category": "one of: Travel, Meals, Accommodation, Office Supplies, Software, Equipment, Professional Services, Utilities, Other",
  "payment_method": "string or null",
  "line_items": [{"description": "string", "amount": number}],
  "confidence_score": 0.0-1.0,
  "extraction_notes": "any issues, ambiguities, or missing data"
}

Rules:
- If a field is not visible or legible, return null — do not guess.
- For expense_category, choose the closest match based on vendor type and line items.
- confidence_score reflects overall extraction quality: 1.0 = all fields clear, 0.5 = significant missing/ambiguous data.
- extraction_notes must mention any partial text, poor image quality, or unusual line items.
- Return ONLY the JSON object. No preamble, no explanation.

The confidence_score field does a lot of heavy lifting here. It gives your pipeline a machine-readable signal to route low-confidence receipts straight to a human review queue without you having to write complex heuristics. Anything below 0.7 probably needs eyes on it.
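
In code, that routing is a simple split. A minimal sketch, assuming the list of parsed dicts this pipeline produces:

# Sketch: route parsed receipts by confidence (the threshold is a judgment call, not a magic number)
REVIEW_THRESHOLD = 0.7

def split_by_confidence(results: list[dict]) -> tuple[list, list]:
    auto_approved = [r for r in results if r.get("confidence_score", 0) >= REVIEW_THRESHOLD]
    needs_review = [r for r in results if r.get("confidence_score", 0) < REVIEW_THRESHOLD]
    return auto_approved, needs_review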

Pro tip ✅

Define your expense categories upfront and lock them in the prompt. If you let Claude invent category names, you’ll end up with “Restaurant” in one record and “Business Dining” in another. Controlled vocabularies are your friend in any data pipeline.

Step 3: Write the API Call with Image Input

Claude’s API accepts images as base64-encoded strings in the content array of a message. Here’s the pattern for a single receipt:

import anthropic
import base64
import json
from pathlib import Path

def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

def parse_receipt(image_path: str, system_prompt: str) -> dict:
    client = anthropic.Anthropic()
    image_data = encode_image(image_path)
    
    # Detect media type from extension
    ext = Path(image_path).suffix.lower()
    media_type_map = {".jpg": "image/jpeg", ".jpeg": "image/jpeg",
                      ".png": "image/png", ".webp": "image/webp"}
    media_type = media_type_map.get(ext, "image/jpeg")
    
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system_prompt,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": image_data,
                        },
                    },
                    {
                        "type": "text",
                        "text": "Extract all expense data from this receipt."
                    }
                ],
            }
        ],
    )
    
    return json.loads(message.content[0].text)

This is your atomic unit. Everything else in the pipeline wraps around this function — batching, error handling, retry logic, output formatting. Keep the core clean and build outward from here.

Warning ⚠️

PDF receipts need to be converted to images before you can pass them to Claude’s vision API. Use pdf2image.convert_from_path() to turn each PDF page into a PNG. For multi-page PDF invoices, parse each page separately and merge the results — don’t try to stitch pages into a single giant image.
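
A conversion sketch, assuming poppler is installed and on your PATH (pdf_to_pngs is an illustrative helper name, not part of the pdf2image API):

from pathlib import Path

from pdf2image import convert_from_path

def pdf_to_pngs(pdf_path: str, out_dir: str = "receipts") -> list[str]:
    """Render each PDF page as a PNG that parse_receipt() can consume."""
    pages = convert_from_path(pdf_path, dpi=200)  # one PIL Image per page
    png_paths = []
    for i, page in enumerate(pages, start=1):
        png_path = f"{out_dir}/{Path(pdf_path).stem}_page{i}.png"
        page.save(png_path, "PNG")
        png_paths.append(png_path)
    return png_paths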

Step 4: Process Receipts in Batches

For a folder of receipts, you want a loop that handles errors gracefully and doesn’t die on a single bad image file. Here’s a minimal batch processor:

import json
import time
from pathlib import Path

def process_receipt_folder(folder_path: str, system_prompt: str) -> list:
    results = []
    image_extensions = {".jpg", ".jpeg", ".png", ".webp"}
    receipt_files = [
        f for f in Path(folder_path).iterdir()
        if f.suffix.lower() in image_extensions
    ]
    
    print(f"Processing {len(receipt_files)} receipts...")
    
    for i, receipt_path in enumerate(receipt_files):
        try:
            parsed = parse_receipt(str(receipt_path), system_prompt)
            parsed["source_file"] = receipt_path.name
            results.append(parsed)
            print(f"[{i+1}/{len(receipt_files)}] {receipt_path.name} — confidence: {parsed.get('confidence_score', 'N/A')}")
        except json.JSONDecodeError:
            print(f"[ERROR] JSON parse failed for {receipt_path.name}")
            results.append({"source_file": receipt_path.name, "error": "json_parse_failed"})
        except Exception as e:
            print(f"[ERROR] {receipt_path.name}: {str(e)}")
            results.append({"source_file": receipt_path.name, "error": str(e)})
        
        # Respect API rate limits — adjust based on your tier
        time.sleep(0.5)
    
    return results

The time.sleep(0.5) is there to stay under API rate limits. Your actual throughput depends entirely on which Anthropic API tier you’re on. Check your rate limits in the Anthropic console before running this against 500 receipts and wondering why it stopped at receipt 47.
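
If you want to recover from rate-limit errors instead of only pacing requests, a small backoff wrapper around parse_receipt() does the job. A sketch using the SDK's RateLimitError:

import time

import anthropic

def parse_with_retry(image_path: str, system_prompt: str, max_retries: int = 3) -> dict:
    """Back off exponentially on rate-limit errors; let anything else raise."""
    for attempt in range(max_retries):
        try:
            return parse_receipt(image_path, system_prompt)
        except anthropic.RateLimitError:
            wait = 2 ** attempt  # 1s, 2s, 4s
            print(f"Rate limited, retrying in {wait}s...")
            time.sleep(wait)
    return parse_receipt(image_path, system_prompt)  # final try; errors surface to the caller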

Pro tip ✅

Run your first batch test on 10-20 receipts that represent the full range of quality you expect — crisp digital receipts, blurry phone photos, old faded paper. This tells you your real-world accuracy before you commit to the pipeline on actual client data.

Step 5: Flag Anomalies

Raw extracted data is useful. Data with anomaly flags is what accountants actually want. Pass your batch results back to Claude with a second prompt designed specifically to spot problems:

You are a tax audit specialist reviewing parsed expense data for anomalies.

Given the following JSON array of expense records, identify any items that warrant human review.

Flag an expense if ANY of the following apply:
- Amount exceeds $500 for Meals category
- Duplicate vendor + date + amount combinations
- Missing date or vendor name (null values)
- confidence_score below 0.70
- Round-number amounts over $200 (potential estimate, not actual receipt)
- Expense category is "Other" with no explanation in extraction_notes
- Transaction date is a weekend for categories typically business-only (Travel, Professional Services)

Return a JSON array of flagged items with this structure:
{
  "source_file": "original filename",
  "flag_reasons": ["list of specific reasons"],
  "priority": "high | medium | low",
  "recommended_action": "brief instruction for the reviewer"
}

Only include items that have at least one flag. Return an empty array [] if nothing is flagged.

The weekend transaction check is worth noting — it’s a simple heuristic that catches a surprising number of edge cases without requiring any complex business logic on your end. You can extend this list with any firm-specific rules.
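
Wiring that in is one more API call. A sketch, assuming the prompt above is saved as prompts/flags.txt:

import json
from pathlib import Path

import anthropic

def flag_anomalies(results: list) -> list:
    """Send the parsed batch to Claude with the anomaly prompt; return flagged items."""
    client = anthropic.Anthropic()
    flags_prompt = Path("prompts/flags.txt").read_text()
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=flags_prompt,
        messages=[{"role": "user", "content": json.dumps(results, ensure_ascii=False)}],
    )
    return json.loads(message.content[0].text)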

Step 6: Generate the Audit Report

Once you have clean data and a flagged list, generating a human-readable summary is a single additional Claude call:

You are an experienced accountant preparing a concise audit summary for a senior partner.

Given the expense data and flagged items below, write a professional summary that covers:

1. OVERVIEW: Total expenses by category (table format), date range covered, total receipt count
2. DATA QUALITY: How many receipts had confidence scores below 0.70, common extraction issues
3. FLAGGED ITEMS: Summary of anomalies, grouped by flag type, with counts
4. RECOMMENDATIONS: Top 3 action items for the reviewing accountant, specific and prioritized

Tone: direct, professional, no filler. This summary goes into a client audit file.
Length: 300-400 words maximum.
Format: Use markdown headers and a table for the category breakdown.

Expense data:
{expense_json}

Flagged items:
{flags_json}

That report prompt deliberately caps the output at 400 words. Longer doesn’t mean better here — an audit summary that takes 10 minutes to read defeats the purpose.
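
A sketch of the call, assuming the template above lives in prompts/report.txt with the {expense_json} and {flags_json} placeholders intact:

import json
from pathlib import Path

import anthropic

def generate_report(results: list, flags: list) -> str:
    """Fill the report template and ask Claude for the markdown summary."""
    client = anthropic.Anthropic()
    template = Path("prompts/report.txt").read_text()
    prompt = template.format(
        expense_json=json.dumps(results, ensure_ascii=False),
        flags_json=json.dumps(flags, ensure_ascii=False),
    )
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text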

Pro tip ✅

Save every prompt template in version control alongside your code. When you tune a prompt and the output quality shifts, you want to be able to diff exactly what changed. Prompts are logic, not config — treat them accordingly.

Step 7: Export to CSV and Connect to Your Workflow

Most accounting workflows end in a spreadsheet or accounting software import. Here’s a quick CSV export to wrap the pipeline:

import csv

def export_to_csv(results: list, output_path: str):
    if not results:
        return
    
    fieldnames = [
        "source_file", "vendor_name", "transaction_date", "total_amount",
        "currency", "tax_amount", "tax_rate_percent", "expense_category",
        "payment_method", "confidence_score", "extraction_notes", "error"
    ]
    
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(results)
    
    print(f"Exported {len(results)} records to {output_path}")

The extrasaction="ignore" parameter quietly drops any extra fields (like line_items) that don’t map to your flat CSV columns. If you need line items preserved, export to JSON instead and handle the import on the accounting software side.
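
That JSON alternative is only a few lines; a sketch writing to output/results.json:

import json

def export_to_json(results: list, output_path: str = "output/results.json"):
    """Preserve nested fields like line_items that the flat CSV drops."""
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)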

Warning ⚠️

This pipeline produces structured data, not verified accounting records. Before feeding its output into any official audit or tax filing, a qualified accountant needs to review flagged items and spot-check the rest. The tool reduces the time spent on mechanical extraction — it doesn’t replace professional judgment.

Calibrating Your Expectations

The unverified claims floating around about processing 1,000 receipts in 3 minutes and saving 20+ hours per client per year make good headlines, but they depend heavily on variables this tutorial can’t control: your API tier and its rate limits, the image quality of your receipt stock, how well your expense categories match reality, and whether your clients hand you JPEGs or mystery PDFs photographed under fluorescent lighting at 11 PM.

What’s verified: Claude’s 200K context window handles large document batches. The API processes images. Structured JSON output from the prompts above is consistent and reliable on clean inputs. Where things get interesting — and slower — is on poor-quality scans, handwritten receipts, and documents in non-Latin scripts. Test on a representative sample of your actual document quality before building any time-savings projection into a client proposal.

Avoid 🚫

Don’t send personally identifiable financial data to the API without confirming your data handling agreements with Anthropic meet your jurisdiction’s requirements and your client’s contractual expectations. This is not optional if you’re in a regulated industry.

Where to Take This Next

The pipeline above is a working foundation, not a finished product. The natural next step is a simple web interface — a Flask or FastAPI endpoint that accepts uploaded receipt files and returns structured data, so non-technical staff can run it without touching Python. After that, connecting the output directly to an accounting platform’s API (most major platforms publish REST APIs for transaction import) closes the loop entirely.
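
A minimal sketch of that upload endpoint with FastAPI (the /parse route name and temp-file handling are illustrative assumptions, not prescriptions):

import tempfile
from pathlib import Path

from fastapi import FastAPI, UploadFile

app = FastAPI()
EXTRACT_PROMPT = Path("prompts/extract.txt").read_text()

@app.post("/parse")
async def parse_upload(file: UploadFile) -> dict:
    """Accept a single receipt image and return Claude's structured extraction."""
    suffix = Path(file.filename or "receipt.jpg").suffix
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name
    return parse_receipt(tmp_path, EXTRACT_PROMPT)  # parse_receipt() from Step 3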

The more interesting evolution is fine-tuning the anomaly detection rules over time. Every time a human reviewer overrides a flag or catches something the pipeline missed, that’s a signal. Capture those cases, refine your prompts, and the system gets sharper with each client engagement. That’s where the real efficiency gain lives — not in the first run, but in the tenth.

promptyze
Founder · Editor · Promptowy

I've been writing about AI and automation for 3 years. I run promptowy.com.