Two Approaches to PDF Processing

Enterprises processing PDFs with Claude's API have two primary technical approaches available, each with distinct trade-offs that determine which is right for your use case.

Approach 1: Image Conversion (Most Common)

Convert each PDF page to a PNG image at 150–200 DPI and send them to Claude via the Vision API. This is the most widely deployed approach and gives you maximum control over document pre-processing. You can deskew scanned documents, enhance contrast on poor-quality images, and apply custom pre-processing before sending to Claude. The limitation is the 20-image-per-request cap, which requires chunking strategies for long documents.

Approach 2: Native PDF Support (Files API)

Claude's Files API (in beta) supports direct PDF upload. You upload the PDF once and receive a file_id that you reference in subsequent API requests — Claude processes the document natively without requiring image conversion. This is more efficient for large documents, supports up to 100 pages natively, and reduces the preprocessing pipeline complexity. It works best for machine-generated PDFs; scanned documents may still benefit from image-based pre-processing.

For most enterprise workflows, the image conversion approach remains the default choice due to its stability, flexibility for scanned documents, and well-understood cost model. Use the Files API when you have large, machine-generated PDFs that don't require pre-processing, or when API call efficiency is a priority.

Deploying PDF processing workflows in your enterprise? Our technical team has architected document intelligence for 40+ organisations. Get a free architecture review.

Free Technical Assessment →

Top Enterprise PDF Use Cases

PDF processing is deployed across every department in our client base. The highest-value applications:

Legal: Contract Review & Extraction

Legal teams use Claude to perform first-pass review of incoming contracts, supplier agreements, and NDAs. A well-designed prompt extracts: parties, effective date, term length, payment terms, key obligations, termination rights, governing law, dispute resolution, and any non-standard clauses. For a typical 20-page commercial agreement, Claude processes in 8–15 seconds, producing a structured summary and risk flags that attorneys review in 5 minutes rather than 45. See our Claude for Legal department guide for detailed deployment patterns.

Finance: Annual Report & Financial Statement Analysis

Finance analysts process quarterly earnings, annual reports, and investor presentations — dense, complex PDFs combining narrative text, tables, and charts. Claude reads all elements together, extracting key metrics, identifying period-over-period changes, and generating plain-English summaries. Our finance clients process 20–30 competitor or portfolio company reports in a session that previously took a full working day.

Compliance: Regulatory Document Analysis

Compliance teams use Claude to analyse new regulatory guidance, extract compliance obligations, and map requirements to existing policies. For regulated industries — financial services, healthcare, pharmaceuticals — this is a major time-saver. Claude can process a 200-page regulatory document and produce an executive summary of new obligations in under 2 minutes.

Procurement: RFP & Vendor Document Review

Procurement teams receive RFP responses, vendor proposals, and supplier agreements as PDFs. Claude can evaluate proposals against defined criteria, extract pricing structures, compare terms across multiple vendor documents, and generate comparison matrices — compressing days of procurement analysis into hours.

📑

Free White Paper: Enterprise Claude Implementation Playbook

Covers document processing architectures, prompt engineering for extraction, security considerations, and ROI measurement across 200+ deployments.

Download Free →

Chunking Strategy for Long Documents

Documents exceeding 20 pages require a chunking strategy. Here is the approach we deploy across our client base:

Overlapping Chunk Strategy

Divide the document into chunks of 15–18 pages with a 2–3 page overlap at boundaries. The overlap ensures that concepts or provisions that span a page boundary aren't missed when processing a chunk. For a 60-page contract, you'd process pages 1–18, then 16–33, then 31–48, then 46–60. Each chunk is processed independently, then the results are merged in your application layer.

Hierarchical Strategy (Long Documents)

For very long documents (100+ pages), use a two-pass approach. Pass 1 generates a summary of each 15–20 page chunk. Pass 2 sends all chunk summaries to Claude in a single request for synthesis and final output. This reduces token consumption significantly on very long documents and keeps each API call within comfortable context limits.

Section-Aware Strategy (Structured Documents)

For documents with clear section structure (contracts, annual reports, standards documents), split along section boundaries rather than fixed page counts. Use Claude to first identify the table of contents or section structure, then process each section as an independent chunk. This produces the most coherent results because each chunk contains semantically complete content rather than arbitrary page cuts.

The two most common high-value PDF processing workflows we deploy deserve deeper coverage:

Contract Review Pipeline

The complete enterprise contract review pipeline has five stages: (1) Document ingestion from email or file system → (2) Type classification (what kind of contract is this?) → (3) Full text extraction with structured output schema → (4) Risk flag identification against your playbook → (5) Summary generation for attorney review. Each stage can use a separate Claude call optimised for its specific task, or you can combine stages 3–5 in a single comprehensive prompt for simpler contracts. Our clients typically see 60–70% reduction in attorney time on standard contracts.

Financial Report Analysis Pipeline

Financial report analysis follows a similar staged approach: (1) Document type identification → (2) Financial statement extraction (income statement, balance sheet, cash flow in structured JSON) → (3) Key metrics identification and period comparison → (4) Narrative section summarisation → (5) Executive brief generation. For investment research workflows, Claude can then compare the brief against a standardised analysis template, flag deviations, and generate follow-up questions for management calls.

Performance & Cost Optimisation

PDF processing costs are primarily driven by token consumption. Each image consumes tokens proportional to its content density. To optimise:

  • Use 150 DPI, not higher: 150 DPI is sufficient for virtually all document types. 300 DPI does not meaningfully improve accuracy but roughly doubles image size and token consumption.
  • Crop pages to content area: Many documents have significant white space margins. Crop to the content area before conversion to reduce image dimensions and token consumption by 20–40%.
  • Use Batch API for async workloads: Any PDF processing that doesn't need real-time results should use the Batch API — 50% cost reduction with 24-hour SLA. AP processing, overnight document ingestion, and compliance review queues are ideal candidates.
  • Cache common documents: For frequently referenced documents (standard agreements, regulatory texts, policy documents), use prompt caching. If the same document appears in many requests, caching can reduce costs 60–80% for that content.
  • Pre-screen document type: A fast, cheap Claude Haiku call to classify document type allows you to route complex contracts to Sonnet and simple forms to Haiku — significant cost optimisation without accuracy degradation.

Processing high volumes of PDFs? Our team can design a cost-optimised architecture with accuracy benchmarks for your specific document types. Free 90-minute session.

Book Architecture Review →

Frequently Asked Questions

Can Claude process PDFs directly through the API?
Claude supports native PDF processing via the Files API (beta) — you upload the PDF once and reference it by file ID. For standard integration, the most common approach is converting PDF pages to PNG images and sending them via the Vision API. Both work well; native PDF support is more efficient for large machine-generated documents while image conversion gives more control over pre-processing for scanned documents.
How many pages can Claude process in one API call?
With the image approach, you can send up to 20 images per request within Claude's 200K token context window. A typical 150 DPI PDF page converts to approximately 1,000–2,000 tokens. For a standard contract of 15–30 pages, a single API call can process the entire document. For longer documents, use overlapping chunks of 15–18 pages with 2–3 page overlap at boundaries.
What types of enterprise PDFs work best with Claude?
Claude performs excellently on contracts and legal agreements, financial reports, technical specifications, research papers, compliance documents, and RFPs. Performance is strong on machine-generated PDFs and good on high-quality scans. Handwritten documents and very low-resolution scans (below 100 DPI) show reduced accuracy. Pre-process scanned documents with deskewing and contrast enhancement for best results.
How do we handle confidential PDFs securely with Claude API?
Enterprise Claude API does not use your documents for model training under the DPA. Use server-side API calls only, implement strict access logging, consider redacting known PII for non-essential use cases, and use short-lived file IDs rather than persistent storage. For HIPAA-covered documents, ensure your enterprise agreement includes a BAA with Anthropic.

Related Articles