How to Build a Legal Document Summarizer with Claude 3.5 Haiku and the Batch API

Build a legal document summarizer using Claude 3.5 Haiku and Anthropic’s Batch API. Real pricing, copy-paste prompts, and Python code included.

Legal documents are long, dense, and written by people who apparently get paid by the semicolon. Running them through an LLM one by one is fine for the occasional NDA, but if you’re processing dozens of contracts a week, the latency and cost add up fast. Claude 3.5 Haiku, the small, fast model Anthropic announced in October 2024, is built for exactly this kind of high-volume, repetitive workload. Pair it with Anthropic’s Batch API, which cuts pricing by 50%, and you have a document pipeline that’s genuinely cheap to run.

This tutorial walks through the full setup: a Python-based pipeline that takes raw legal documents, sends them to Claude 3.5 Haiku via the Batch API, and returns structured summaries with key clauses, obligations, and risk flags. We’ll use real pricing numbers (not the fantasy figures that sometimes circulate), and by the end you’ll have something production-ready — or at least close enough that your legal team might actually trust it.

What You’ll Build

A Python script that accepts a folder of plain-text or PDF legal documents, packages them into batch API requests, submits the batch to Anthropic, polls for completion, and writes structured JSON summaries to disk. Each summary will include: a plain-English overview, key parties and obligations, important dates and deadlines, unusual or high-risk clauses, and a risk score from 1 to 5. The whole thing runs asynchronously, so you can submit 200 contracts at 9 AM and collect the results after lunch.

Requirements

You need Python 3.9 or later, an Anthropic API key with Batch API access (standard on all paid tiers), and a recent release of the anthropic Python SDK. The Message Batches endpoints arrived in late 2024, so run pip install --upgrade anthropic if your install is older. For PDF support, install pypdf (the maintained successor to PyPDF2) or pdfplumber; the latter handles complex layouts better. Install everything with pip install anthropic pdfplumber python-dotenv. Store your API key in a .env file as ANTHROPIC_API_KEY=your_key_here and never hardcode it.

Note 💡

The Batch API is designed for async workloads. Batches typically complete within a few minutes for small jobs and up to 24 hours for large ones. If you need a response in under 10 seconds, use the standard Messages API instead. Batch processing trades speed for cost savings — 50% off is significant, but it’s not the right tool for a live chatbot.

Step 1: Extract Text from Your Documents

Before you can summarize anything, you need clean text. PDFs from legal firms range from beautifully structured to completely unreadable, so use pdfplumber for maximum compatibility. Here’s a minimal extraction function that handles both plain text files and PDFs:

import pdfplumber
import os

def extract_text(file_path: str) -> str:
    if file_path.endswith(".pdf"):
        with pdfplumber.open(file_path) as pdf:
            pages = [page.extract_text() or "" for page in pdf.pages]
            return "\n\n".join(pages).strip()
    else:
        with open(file_path, "r", encoding="utf-8") as f:
            return f.read().strip()

def load_documents(folder: str) -> dict:
    docs = {}
    for fname in os.listdir(folder):
        if fname.endswith((".pdf", ".txt")):
            path = os.path.join(folder, fname)
            docs[fname] = extract_text(path)
    return docs

This gives you a dictionary mapping filenames to raw text strings. The or "" fallback covers pages where page.extract_text() returns None, which is what happens on scanned pages with no extractable text layer. You’ll want to add OCR (for example via pytesseract) if your document set includes scans.

Warning ⚠️

Scanned PDFs without a text layer will return empty strings. If your extraction returns fewer than 100 characters for a document that’s clearly several pages long, it’s a scanned image. Flag these separately and route them through an OCR step before feeding them to the API — otherwise Claude will summarize nothing and you’ll pay for it anyway.
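
The check described in the warning can be sketched as a small filter that runs between extraction and batch building. This is a minimal sketch: split_scanned and its min_chars parameter are hypothetical names, and it only looks at text length, so a genuinely short text file gets flagged too and needs a manual look.

```python
def split_scanned(documents: dict, min_chars: int = 100) -> tuple:
    """Separate documents with usable text from ones that look like scans.

    `documents` maps filenames to extracted text, as returned by load_documents().
    Returns (usable_documents, names_needing_ocr).
    """
    usable = {}
    needs_ocr = []
    for name, text in documents.items():
        # Very little extracted text usually means there was no text layer to read
        if len(text.strip()) < min_chars:
            needs_ocr.append(name)
        else:
            usable[name] = text
    return usable, needs_ocr
```

Pass only the usable dictionary to the batch builder and log needs_ocr so those files can go through an OCR step first.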

Step 2: Write Your Summarization Prompt

The prompt is where most people either over-engineer or under-specify. For legal documents, you want a structured output format that’s consistent across hundreds of documents — which means telling Claude exactly what fields to return and in what format. Here’s the core system prompt:

You are a legal document analyst. Your job is to read legal documents and produce structured summaries for non-lawyers. Be precise, use plain English, and flag anything unusual or high-risk.

Always respond with valid JSON in exactly this format:
{
  "document_type": "string (e.g. NDA, Service Agreement, Employment Contract)",
  "parties": ["list of party names and roles"],
  "summary": "2-3 sentence plain-English overview of what this document does",
  "key_obligations": ["list of main things each party must do"],
  "important_dates": ["list of key dates, deadlines, or durations"],
  "termination_conditions": "how and when this agreement can end",
  "unusual_clauses": ["list any clauses that are non-standard, aggressive, or unusual"],
  "risk_score": "integer from 1 (low risk) to 5 (high risk) for the receiving party",
  "risk_reasoning": "one sentence explaining the risk score"
}

And the user-turn prompt that wraps each document:

Please analyze the following legal document and return a structured JSON summary.

DOCUMENT:
{{DOCUMENT_TEXT}}

Remember: respond only with the JSON object, no additional text before or after.

The instruction to return only JSON — with no preamble — is important for batch processing. When you’re parsing 200 responses programmatically, a single “Sure, here’s the summary!” prefix will break your JSON parser. Claude 3.5 Haiku follows this instruction reliably, but it’s worth validating outputs anyway (more on that in Step 4).

Pro tip ✅

Add a concrete example of the JSON structure you want inside the system prompt. Claude mirrors examples very faithfully. If your risk score should always be an integer (not a string like “3/5”), show that in the example. One sample output in the system prompt is worth a paragraph of instructions.

Step 3: Build and Submit the Batch

The Batch API accepts a list of request objects, each with a unique custom_id you assign for tracking. Here’s how to package your documents and submit:

import anthropic
import json
from dotenv import load_dotenv
import os

load_dotenv()
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

SYSTEM_PROMPT = """You are a legal document analyst. Your job is to read legal documents and produce structured summaries for non-lawyers. Be precise, use plain English, and flag anything unusual or high-risk.

Always respond with valid JSON in exactly this format:
{
  "document_type": "string",
  "parties": ["list of party names and roles"],
  "summary": "2-3 sentence plain-English overview",
  "key_obligations": ["list of main obligations"],
  "important_dates": ["list of key dates and deadlines"],
  "termination_conditions": "how and when the agreement can end",
  "unusual_clauses": ["non-standard or aggressive clauses"],
  "risk_score": 1,
  "risk_reasoning": "one sentence explaining the score"
}"""

def build_batch_requests(documents: dict) -> list:
    requests = []
    for doc_id, text in documents.items():
        # Truncate if over ~180k tokens to stay safe (rough estimate: 1 token ≈ 4 chars)
        max_chars = 720000
        truncated = text[:max_chars] if len(text) > max_chars else text

        requests.append({
            "custom_id": doc_id,
            "params": {
                "model": "claude-3-5-haiku-20241022",
                "max_tokens": 1024,
                "system": SYSTEM_PROMPT,
                "messages": [
                    {
                        "role": "user",
                        # Adjacent string literals are concatenated; \n escapes keep this valid Python
                        "content": (
                            "Please analyze the following legal document and return a structured JSON summary.\n\n"
                            f"DOCUMENT:\n{truncated}\n\n"
                            "Respond only with the JSON object."
                        ),
                    }
                ]
            }
        })
    return requests

def submit_batch(requests: list) -> str:
    batch = client.beta.messages.batches.create(requests=requests)
    print(f"Batch submitted: {batch.id}")
    print(f"Status: {batch.processing_status}")
    return batch.id

Note the model identifier: claude-3-5-haiku-20241022, with the family name first and the date suffix last. Check Anthropic’s API documentation for the exact current string, because a mistyped identifier makes the request fail.
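
One thing to watch when using filenames as custom_id values: the Batch API documentation describes custom_id as limited to letters, digits, hyphens, and underscores, up to 64 characters, so a name like "Acme NDA (final).pdf" would likely be rejected. Treat that exact constraint as an assumption to verify against the current docs. The sketch below, with hypothetical helpers make_custom_id and build_id_map, derives a safe ID from each filename and keeps a map back to the original name.

```python
import hashlib
import re

def make_custom_id(filename: str, max_len: int = 64) -> str:
    """Turn a filename into an ID made only of letters, digits, '_' and '-'."""
    cleaned = re.sub(r"[^a-zA-Z0-9_-]", "_", filename)
    # A short hash of the original name keeps IDs distinct when two
    # filenames clean to the same string (e.g. "a b.pdf" and "a_b.pdf")
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()[:8]
    # Reserve 9 characters for the "-" separator plus the 8-character hash
    return f"{cleaned[:max_len - 9]}-{digest}"

def build_id_map(filenames) -> dict:
    """Map each generated custom_id back to its original filename."""
    return {make_custom_id(name): name for name in filenames}
```

Use make_custom_id(doc_id) when building requests, then look results up through the map when writing summaries to disk.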

Pro tip ✅

Set max_tokens to 1024 for structured JSON summaries. Legal summaries don’t need to be exhaustive — they need to be scannable. Keeping output tokens low also keeps your costs down, since output tokens cost 5x more than input tokens with Claude 3.5 Haiku.

Step 4: Poll for Results and Parse Outputs

Batch jobs don’t return immediately. You need to poll the API until the batch status is ended, then retrieve and parse the results:

import time

def wait_for_batch(batch_id: str, poll_interval: int = 60) -> None:
    print(f"Waiting for batch {batch_id} to complete...")
    while True:
        batch = client.beta.messages.batches.retrieve(batch_id)
        status = batch.processing_status
        print(f"Status: {status} | "
              f"Succeeded: {batch.request_counts.succeeded} | "
              f"Errored: {batch.request_counts.errored} | "
              f"Processing: {batch.request_counts.processing}")

        if status == "ended":
            print("Batch complete.")
            break
        time.sleep(poll_interval)

def collect_results(batch_id: str) -> dict:
    results = {}
    for result in client.beta.messages.batches.results(batch_id):
        doc_id = result.custom_id
        if result.result.type == "succeeded":
            raw_text = result.result.message.content[0].text
            try:
                parsed = json.loads(raw_text)
                results[doc_id] = {"status": "ok", "summary": parsed}
            except json.JSONDecodeError as e:
                results[doc_id] = {"status": "parse_error", "raw": raw_text, "error": str(e)}
        else:
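            # Non-succeeded results are "errored", "canceled", or "expired"; only "errored" carries an error object
            detail = getattr(result.result, "error", None)
            results[doc_id] = {"status": "api_error", "error": result.result.type, "detail": str(detail) if detail else None}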
    return results

The try/except around JSON parsing is non-negotiable. Even with explicit JSON-only instructions, edge cases happen: very short documents, documents in unexpected languages, or documents that trigger safety filters will all produce non-JSON responses. Log the raw output and move on — don’t let one bad parse kill your whole pipeline.
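
Some parse failures are recoverable, because the JSON is there but wrapped in a preamble or a code fence. Here is a minimal sketch of a more forgiving parser, assuming the response contains at most one top-level JSON object. The name extract_json is hypothetical and not part of the SDK.

```python
import json

def extract_json(raw_text: str):
    """Parse a JSON object from model output, tolerating text around it.

    Returns the parsed dict, or None if no JSON object could be recovered.
    """
    candidates = [raw_text]
    # Fall back to the span between the first "{" and the last "}"
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw_text[start:end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None
```

Calling this in place of json.loads turns a "Sure, here’s the summary!" prefix into a non-event. A None result should still be logged with the raw text.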

Pro tip ✅

After collecting results, run a quick validation pass: check that risk_score is an integer between 1 and 5, that parties is a non-empty list, and that summary is at least 50 characters. Automated validation catches format drift before it hits whatever system is consuming these summaries.
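
That validation pass might look like the sketch below. validate_summary is a hypothetical helper, and the three checks are exactly the ones listed in the tip; extend them to whatever your downstream system requires.

```python
def validate_summary(summary: dict) -> list:
    """Return a list of problems found; an empty list means the summary passed."""
    problems = []

    score = summary.get("risk_score")
    # bool is a subclass of int in Python, so rule it out explicitly
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 5:
        problems.append("risk_score must be an integer from 1 to 5")

    parties = summary.get("parties")
    if not isinstance(parties, list) or not parties:
        problems.append("parties must be a non-empty list")

    text = summary.get("summary")
    if not isinstance(text, str) or len(text.strip()) < 50:
        problems.append("summary must be at least 50 characters")

    return problems
```

Run it over every result with status "ok" and route anything that returns a non-empty list to manual review.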

Step 5: Specialized Prompts for Different Document Types

A generic summarization prompt works, but you’ll get sharper results by adapting the prompt to the document type. Here are three ready-to-use variants:

For NDAs:

You are analyzing a Non-Disclosure Agreement. Pay special attention to:
- The definition of "Confidential Information" — is it broad or narrow?
- The duration of confidentiality obligations (does it survive termination?)
- Permitted disclosures and carve-outs
- Mutual vs. one-sided obligations
- Any non-compete or non-solicitation clauses hidden in the NDA

Return JSON with the standard fields, plus an "nda_flags" array listing any clauses that favor one party disproportionately.

For SaaS/Service Agreements:

You are analyzing a SaaS or Service Agreement. Focus on:
- SLA commitments and what happens when they're missed (credits? termination rights?)
- Data ownership and data processing terms
- Limitation of liability caps — are they reasonable relative to contract value?
- Auto-renewal clauses and notice periods required to cancel
- Acceptable use policy restrictions that could affect the customer's business

Flag any terms that are significantly more aggressive than industry standard.

For Employment Contracts:

You are analyzing an Employment Contract from the perspective of the employee. Identify:
- Compensation structure (base, variable, equity — vesting schedule if present)
- Non-compete scope: geography, duration, and industry breadth
- IP assignment clauses — do they extend to work done outside company time?
- Termination conditions and severance entitlements
- Any unusual restrictions on future employment

Risk score the contract from 1 (employee-friendly) to 5 (heavily employer-favoring).

Pro tip ✅

Auto-detect document type before routing to the right prompt. A simple classifier prompt — “Is this document an NDA, Service Agreement, Employment Contract, or Other? Respond with one word.” — costs almost nothing and makes your pipeline meaningfully smarter. Run it as a separate, cheap API call before building the batch.
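
The routing half of that idea needs no API call and can be sketched as below. normalize_label and pick_system_prompt are hypothetical names. The prompts dictionary is assumed to hold the full prompt texts from this step, keyed by label, and default is the generic prompt from Step 2.

```python
import re

def normalize_label(classifier_output: str) -> str:
    """Map a short free-form classifier answer to one of four known labels."""
    # Match whole words so "standard" is not mistaken for "nda"
    words = set(re.findall(r"[a-z]+", classifier_output.lower()))
    if words & {"nda", "disclosure"}:
        return "nda"
    if words & {"service", "saas"}:
        return "service_agreement"
    if "employment" in words:
        return "employment_contract"
    return "other"

def pick_system_prompt(classifier_output: str, prompts: dict, default: str) -> str:
    """Choose the specialized prompt for a label, falling back to the generic one."""
    return prompts.get(normalize_label(classifier_output), default)
```

Normalizing first means stray punctuation or casing in the classifier’s answer doesn’t send an NDA down the generic path.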

Step 6: Putting It All Together

Here’s the main script that ties all the pieces together:

def main(docs_folder: str, output_file: str = "summaries.json"):
    print("Loading documents...")
    documents = load_documents(docs_folder)
    print(f"Found {len(documents)} documents.")

    print("Building batch requests...")
    requests = build_batch_requests(documents)

    print("Submitting batch...")
    batch_id = submit_batch(requests)

    wait_for_batch(batch_id, poll_interval=60)

    print("Collecting results...")
    results = collect_results(batch_id)

    ok_count = sum(1 for r in results.values() if r["status"] == "ok")
    error_count = len(results) - ok_count
    print(f"Done. Successful: {ok_count} | Errors: {error_count}")

    with open(output_file, "w") as f:
        json.dump(results, f, indent=2)
    print(f"Results saved to {output_file}")

if __name__ == "__main__":
    main("./contracts", "summaries.json")

Save everything as summarizer.py, put your contracts in ./contracts, and run python summarizer.py. For 50 short legal documents (averaging around 2,000 tokens each), you’re looking at roughly 100,000 input tokens, plus the system prompt repeated on every request, and at most about 50,000 output tokens if every response runs close to the 1,024-token cap. With Batch API pricing at $0.40/million input and $2.00/million output, that’s about $0.04 for input and $0.10 for output, so $0.14 total, or under $0.003 per document. That’s not the $0.001/page figure that sometimes gets quoted, but it’s still remarkably cheap for what you’re getting.

Cost Breakdown: What You’ll Actually Pay

Let’s be concrete. Claude 3.5 Haiku standard pricing is $0.80 per million input tokens and $4.00 per million output tokens. The Batch API cuts both figures by 50%, bringing you to $0.40 input and $2.00 output per million tokens. A typical 10-page legal contract runs roughly 3,000-4,000 tokens. A summary response with full JSON fields runs roughly 400-600 tokens. So per document: approximately $0.0016 in input costs and $0.001 in output costs, totaling around $0.0026 per contract through the Batch API. Scale that to 1,000 contracts: about $2.60. For 10,000 contracts: about $26. A paralegal reviewing those manually would bill more than that per hour for a handful of documents — the math isn’t subtle.
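
The arithmetic above is easy to wrap in a helper for budgeting other batch sizes. This is a sketch using the batch prices quoted in this article; the constant names and estimate_batch_cost are hypothetical, and the prices should be re-checked against Anthropic’s pricing page before you rely on them.

```python
# Batch prices quoted in this article, in USD per million tokens
BATCH_INPUT_PER_MTOK = 0.40
BATCH_OUTPUT_PER_MTOK = 2.00

def estimate_batch_cost(n_docs: int, input_tokens_per_doc: int, output_tokens_per_doc: int) -> float:
    """Estimate the total batch cost in USD for n_docs similar documents."""
    input_cost = n_docs * input_tokens_per_doc * BATCH_INPUT_PER_MTOK / 1_000_000
    output_cost = n_docs * output_tokens_per_doc * BATCH_OUTPUT_PER_MTOK / 1_000_000
    return input_cost + output_cost
```

For example, estimate_batch_cost(1000, 4000, 500) works out to about 2.60, matching the 1,000-contract figure above.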

Note 💡

These estimates assume average-length contracts. Very long agreements (50+ pages, 20,000+ tokens) will cost proportionally more. If your document set includes long-form agreements, calculate token counts before submitting and adjust your budget accordingly. For exact figures, Anthropic’s API has a token-counting endpoint, exposed as client.messages.count_tokens in recent SDK versions. For a quick offline estimate, assume roughly 1 token per 4 characters of English text.
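
The 4-characters-per-token rule of thumb is enough to spot the long agreements before you submit. Here is a rough sketch with hypothetical helper names; the estimate is approximate and will be off for non-English text or heavily formatted documents.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate from character count (English text only)."""
    return len(text) // chars_per_token

def find_long_documents(documents: dict, token_threshold: int = 20_000) -> dict:
    """Return {filename: estimated_tokens} for documents at or above the threshold."""
    estimates = {name: estimate_tokens(text) for name, text in documents.items()}
    return {name: count for name, count in estimates.items() if count >= token_threshold}
```

Run find_long_documents on the output of load_documents() to see which files will dominate the bill.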

Avoid 🚫

Don’t send entire document bundles as a single input. If someone emails you a 300-page contract package as one PDF, split it into individual agreements before processing. Summarizing a monolithic document produces worse outputs and costs more — Claude’s attention is finite, and a 150,000-token input will yield a less precise summary than three separate 50,000-token inputs.

What This Means for Your Workflow

The pipeline described here is a workable foundation. The script handles batch submission, polling, and parse-error capture, and the validation pass from Step 4 and the document-type prompts from Step 5 slot in with a few more lines. What it doesn’t handle, and what you’d need to add for a serious deployment, is authentication and access control around the summaries, a UI for non-technical users to upload documents and view results, and a human-review step for anything that scores 4 or 5 on risk. That last point matters: Claude 3.5 Haiku is fast and cheap, but it’s not a lawyer. Treat its output as a first-pass triage tool that tells you which documents need human attention, not as legal advice you act on directly. Used that way, it’s genuinely useful. Used as a replacement for legal review, it’s a liability waiting to happen. The goal is to stop your actual lawyers from wasting time on boilerplate NDAs so they can focus on the agreements that need their expertise. At roughly $0.003 per document, you can afford to run everything through the pipeline and escalate only what gets flagged.

promptyze