Error Taxonomy: What Can Go Wrong
Building production-grade Claude integrations requires understanding the full taxonomy of errors you'll encounter, because different error types demand fundamentally different handling strategies. The most dangerous mistake in Claude error handling is treating all errors the same way — either retrying everything (wasting resources and confusing users when permanent errors loop endlessly) or giving up immediately on transient errors that would succeed on the next attempt.
Here is the complete error taxonomy from our production deployments:
| HTTP Code | Error Type | Retryable? | Action |
|---|---|---|---|
| 400 | Invalid request / Content policy | Permanent | Fix request or content; do not retry |
| 401 | Authentication error | Permanent | Check API key; alert ops immediately |
| 403 | Permission denied | Permanent | Check account permissions / contact Anthropic |
| 404 | Model not found | Permanent | Check model string; update your code |
| 413 | Request too large | Permanent | Reduce context size; implement chunking |
| 429 | Rate limit exceeded | Retryable | Exponential backoff + Retry-After header |
| 500 | Server error | Retryable | Exponential backoff; alert if persistent |
| 529 | Service overloaded | Retryable | Exponential backoff with longer delays |
| N/A | Request timeout | Retryable | Retry once; investigate if recurring |
The Anthropic Python SDK exposes these as typed exceptions: anthropic.RateLimitError, anthropic.APIStatusError, anthropic.APITimeoutError, and anthropic.APIConnectionError. Catch specifically rather than with a blanket except Exception — you need to distinguish between retryable and permanent errors in your handling logic.
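One way to make the retryable/permanent distinction concrete is a small classification helper. This is a sketch, not the SDK's own API: in a real handler the status code would come from an anthropic.APIStatusError's status_code attribute, while anthropic.APITimeoutError and anthropic.APIConnectionError carry no status code and are treated as retryable here.

```python
from typing import Optional

# Status codes from the taxonomy table above.
RETRYABLE_STATUS = {429, 500, 529}
PERMANENT_STATUS = {400, 401, 403, 404, 413}

def is_retryable(status_code: Optional[int], is_timeout: bool = False) -> bool:
    """Classify an API failure as retryable or permanent.

    status_code is None for timeouts and connection errors
    (anthropic.APITimeoutError / anthropic.APIConnectionError),
    which we treat as retryable.
    """
    if is_timeout or status_code is None:
        return True
    return status_code in RETRYABLE_STATUS
```

Your exception handlers then branch on this one predicate instead of scattering status-code checks through the codebase.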
Retry Strategies by Error Type
The SDK has built-in retry logic for 429 and 5xx errors, but production systems require additional application-level retry logic and careful configuration:
Rate Limit Errors (429)
Always use the Retry-After response header when present — this tells you exactly how long to wait, which is more accurate than a calculated backoff. If the header is absent, use exponential backoff: wait = min(base_delay × 2^attempt + jitter, max_delay), where base_delay = 1.0 second, jitter = random(0, 0.5), and max_delay = 60 seconds. After 5 retries, add the request to a priority retry queue rather than failing immediately — it will likely succeed once the rate limit window resets.
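The delay calculation above fits in a few lines. A minimal sketch, assuming the caller has already parsed the Retry-After header into seconds (the function name and defaults are illustrative, not from any SDK):

```python
import random

def backoff_delay(attempt: int, retry_after: float = None,
                  base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-indexed).

    Honours a parsed Retry-After value when the server provided one;
    otherwise falls back to capped exponential backoff with jitter.
    """
    if retry_after is not None:
        return min(retry_after, max_delay)
    jitter = random.uniform(0, 0.5)
    return min(base_delay * 2 ** attempt + jitter, max_delay)
```

The jitter term prevents a fleet of clients that were rate-limited at the same moment from retrying in lockstep and re-triggering the limit.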
Server Errors (500, 529)
Server errors are typically transient — Anthropic's infrastructure is recovering from a temporary issue. Implement exponential backoff with slightly longer initial delays (2–3 seconds base) to give the server time to recover. If you're consistently seeing 529 (overloaded) errors during business hours, this is a signal to request rate limit increases or shift non-urgent work to the Batch API.
Timeouts
A timeout typically means Claude is generating a very long response, your network has an issue, or the API is under high load. Retry once with the same timeout setting — most timeout retries succeed. If timeouts are recurring for specific request types, investigate whether the expected output is too long for your timeout setting, and increase the timeout for those specific request patterns.
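The retry-once rule can be captured in a generic wrapper. A sketch under stated assumptions: `call` is any zero-argument function wrapping your API request, and TimeoutError stands in for anthropic.APITimeoutError; the `on_timeout_retry` hook is a hypothetical place to log recurring timeouts for investigation.

```python
def call_with_timeout_retry(call, on_timeout_retry=None):
    """Invoke `call`; on a timeout, retry exactly once with the same
    settings. A second timeout propagates to the caller."""
    try:
        return call()
    except TimeoutError:
        if on_timeout_retry is not None:
            on_timeout_retry()  # e.g. increment a metric to spot recurring timeouts
        return call()
```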
Content Policy (400)
Never retry content policy errors. Log the full request details (without PII if possible) for review. For applications with user-generated input, implement upstream content screening. A useful pattern: on content policy rejection, attempt a reformulated version of the request with safety instructions added — this succeeds for many borderline cases that weren't malicious, just ambiguously phrased.
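The reformulation pattern looks roughly like this. Everything here is illustrative: `send` is your own API call wrapper, ValueError stands in for a 400 content-policy rejection, and the safety preamble text is an example, not a prescribed prompt.

```python
SAFETY_PREAMBLE = (
    "Respond helpfully and safely. If any part of the request below is "
    "ambiguous, interpret it in its most benign sense.\n\n"
)

def send_with_reformulation(send, prompt: str):
    """Call `send(prompt)`; on a content-policy rejection, try once more
    with safety framing prepended. A second rejection propagates —
    never loop on policy errors."""
    try:
        return send(prompt)
    except ValueError:
        # Log the rejected prompt here (scrubbed of PII) before re-attempting.
        return send(SAFETY_PREAMBLE + prompt)
```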
Circuit Breakers & Fallbacks
For applications where Claude is in the critical path, a circuit breaker prevents a degraded API from cascading into full application failure. The pattern monitors recent API call outcomes and temporarily short-circuits calls when failure rates exceed a threshold:
- Closed state (normal operation): Calls go through to the API. Failures are counted within a rolling window (e.g., 60 seconds).
- Open state (tripped): When failures exceed threshold (e.g., 5 failures in 30 seconds), the circuit opens. New requests fail immediately without hitting the API, returning a graceful fallback. This protects your application and prevents compounding load on a struggling API.
- Half-open state (testing): After a cooldown period (e.g., 60 seconds), one test request is allowed through. If it succeeds, the circuit closes. If it fails, the circuit remains open for another cooldown period.
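The three states above can be sketched as a small class. This is a minimal illustration with the example thresholds from the text, not a production library; the injectable `clock` parameter is an assumption added to make the breaker testable.

```python
import time
from collections import deque

class CircuitBreaker:
    """Minimal three-state circuit breaker: closed / open / half-open."""

    def __init__(self, failure_threshold=5, window=30.0, cooldown=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window = window        # rolling failure window, seconds
        self.cooldown = cooldown    # how long the circuit stays open
        self.clock = clock          # injectable for testing
        self.failures = deque()     # timestamps of recent failures
        self.state = "closed"
        self.opened_at = None

    def call(self, fn, fallback):
        now = self.clock()
        if self.state == "open":
            if now - self.opened_at < self.cooldown:
                return fallback()       # fail fast: no API call while open
            self.state = "half-open"    # cooldown elapsed: allow one test call
        try:
            result = fn()
        except Exception:
            self._record_failure(now)
            return fallback()
        if self.state == "half-open":
            self.state = "closed"       # test call succeeded: recover
            self.failures.clear()
        return result

    def _record_failure(self, now):
        if self.state == "half-open":
            self.state = "open"         # test call failed: stay open
            self.opened_at = now
            return
        self.failures.append(now)
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()     # expire failures outside the window
        if len(self.failures) >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now
```

In use, every Claude call on a protected path goes through `breaker.call(make_request, fallback)`, where `fallback` returns the degraded response described below.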
Implement fallbacks for each circuit-broken path: for summarisation, return the original document with a note; for classification, return "unclassified" and route to human review; for content generation, use a simpler template-based alternative. The goal is degraded but functional service, not total failure.
When to Use Circuit Breakers
Circuit breakers are essential when Claude is in the synchronous request path for user-facing features. They're less critical for background processing or batch jobs where failures are handled at the job level. Prioritise implementation for: customer-facing chatbots, real-time document processing, and any user-facing AI feature where a loading spinner longer than 5 seconds degrades user experience.
User-Facing Error Handling
Technical error handling is only half the challenge. How errors are communicated to users determines whether an incident becomes a support ticket or goes unnoticed:
Never Surface Technical Errors Directly
"429 Too Many Requests from Anthropic API" is not a user message. "We're experiencing high demand right now — your request is being processed and will complete in approximately 2 minutes" is. Map every error type to a clear, actionable user message that sets expectations without exposing implementation details.
Progressive Disclosure for Long Wait Times
For requests that will take a long time due to retry cycles: show a progress indicator immediately, provide status updates ("Still processing... this is taking longer than usual"), give users a way to cancel if they don't want to wait, and send a completion notification (email or in-app) if processing will take more than 2 minutes.
Graceful Degradation Messages
When Claude is unavailable and you're serving from fallback: "Our AI assistant is temporarily unavailable — we're showing a simplified version of this feature. Full capability will be restored shortly." This is far better than a broken state or a confusing error.
Observability & Alerting
You can't fix what you can't see. Production Claude integrations need comprehensive observability:
- Error rate by type: Track 429, 5xx, timeout, and content policy error rates separately. Alert when any error type exceeds 5% of requests in a 5-minute window.
- Latency percentiles: Track p50, p95, and p99 latency. Alert when p95 exceeds 2x your normal baseline. Latency spikes often precede 529 errors.
- Retry rate: Track how often you're retrying requests. A retry rate above 10% indicates you're consistently hitting limits or experiencing instability — investigate before it becomes an incident.
- Token consumption: Track input and output tokens per request by workflow type. Sudden spikes indicate prompt changes that significantly increased token usage.
- Cost per request: Track cost per API call by workflow. This is your early warning system for cost overruns and also helps identify optimisation opportunities.
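The per-type error-rate tracking described above amounts to a rolling-window counter. A sketch with illustrative names (the class and the error-type strings are assumptions, and the injectable `clock` exists only to make the window testable):

```python
import time
from collections import deque

class ErrorRateMonitor:
    """Rolling per-type error rate over a fixed window (e.g. 5 minutes)."""

    def __init__(self, window=300.0, alert_threshold=0.05,
                 clock=time.monotonic):
        self.window = window
        self.alert_threshold = alert_threshold
        self.clock = clock
        self.events = deque()  # (timestamp, error_type or None for success)

    def record(self, error_type=None):
        """Record one request outcome; error_type is None for a success,
        else a label like "429", "5xx", "timeout", "content_policy"."""
        self.events.append((self.clock(), error_type))
        self._evict()

    def rate(self, error_type):
        self._evict()
        total = len(self.events)
        if total == 0:
            return 0.0
        errors = sum(1 for _, kind in self.events if kind == error_type)
        return errors / total

    def should_alert(self, error_type):
        return self.rate(error_type) > self.alert_threshold

    def _evict(self):
        cutoff = self.clock() - self.window
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
```

In production you would emit these rates as gauges to your metrics backend rather than querying the monitor directly, but the windowing logic is the same.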
Export all metrics to your observability platform (Datadog, Grafana, CloudWatch). Create a Claude API health dashboard visible to your engineering team, and set up PagerDuty or equivalent alerting for error rates above threshold during business hours.