Error Taxonomy: What Can Go Wrong
Building production-grade Claude integrations requires understanding the full taxonomy of errors you'll encounter, because different error types demand fundamentally different handling strategies. The most dangerous mistake in Claude error handling is treating all errors the same way — either retrying everything (wasting resources and confusing users when permanent errors loop endlessly) or giving up immediately on transient errors that would succeed on the next attempt.
Here is the complete error taxonomy from our production deployments:
| HTTP Code | Error Type | Retryable? | Action |
|---|---|---|---|
| 400 | Invalid request / Content policy | Permanent | Fix request or content; do not retry |
| 401 | Authentication error | Permanent | Check API key; alert ops immediately |
| 403 | Permission denied | Permanent | Check account permissions / contact Anthropic |
| 404 | Model not found | Permanent | Check model string; update your code |
| 413 | Request too large | Permanent | Reduce context size; implement chunking |
| 429 | Rate limit exceeded | Retryable | Exponential backoff + Retry-After header |
| 500 | Server error | Retryable | Exponential backoff; alert if persistent |
| 529 | Service overloaded | Retryable | Exponential backoff with longer delays |
| N/A | Request timeout | Retryable | Retry once; investigate if recurring |
The Anthropic Python SDK exposes these as typed exceptions: anthropic.RateLimitError, anthropic.APIStatusError, anthropic.APITimeoutError, and anthropic.APIConnectionError. Catch specifically rather than with a blanket except Exception — you need to distinguish between retryable and permanent errors in your handling logic.
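One way to make the retryable/permanent distinction concrete is a small classification helper. This is a sketch, not the SDK's own API: in a real handler the status code would come from an anthropic.APIStatusError's status_code attribute, while anthropic.APITimeoutError and anthropic.APIConnectionError carry no status code and are treated as retryable here.

```python
from typing import Optional

# Status codes from the taxonomy table above.
RETRYABLE_STATUS = {429, 500, 529}
PERMANENT_STATUS = {400, 401, 403, 404, 413}

def is_retryable(status_code: Optional[int], is_timeout: bool = False) -> bool:
    """Classify an API failure as retryable or permanent.

    status_code is None for timeouts and connection errors
    (anthropic.APITimeoutError / anthropic.APIConnectionError),
    which we treat as retryable.
    """
    if is_timeout or status_code is None:
        return True
    return status_code in RETRYABLE_STATUS
```

Your exception handlers then branch on this one predicate instead of scattering status-code checks through the codebase.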
Retry Strategies by Error Type
The SDK has built-in retry logic for 429 and 5xx errors, but production systems require additional application-level retry logic and careful configuration:
Rate Limit Errors (429)
Always use the Retry-After response header when present — this tells you exactly how long to wait, which is more accurate than a calculated backoff. If the header is absent, use exponential backoff: wait = min(base_delay × 2^attempt + jitter, max_delay), where base_delay = 1.0 second, jitter = random(0, 0.5), and max_delay = 60 seconds. After 5 retries, add the request to a priority retry queue rather than failing immediately — it will likely succeed once the rate limit window resets.
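The delay calculation above fits in a few lines. A minimal sketch, assuming the caller has already parsed the Retry-After header into seconds (the function name and defaults are illustrative, not from any SDK):

```python
import random

def backoff_delay(attempt: int, retry_after: float = None,
                  base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-indexed).

    Honours a parsed Retry-After value when the server provided one;
    otherwise falls back to capped exponential backoff with jitter.
    """
    if retry_after is not None:
        return min(retry_after, max_delay)
    jitter = random.uniform(0, 0.5)
    return min(base_delay * 2 ** attempt + jitter, max_delay)
```

The jitter term prevents a fleet of clients that were rate-limited at the same moment from retrying in lockstep and re-triggering the limit.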
Server Errors (500, 529)
Server errors are typically transient — Anthropic's infrastructure is recovering from a temporary issue. Implement exponential backoff with slightly longer initial delays (2–3 seconds base) to give the server time to recover. If you're consistently seeing 529 (overloaded) errors during business hours, this is a signal to request rate limit increases or shift non-urgent work to the Batch API.
Timeouts
A timeout typically means Claude is generating a very long response, your network has an issue, or the API is under high load. Retry once with the same timeout setting — most timeout retries succeed. If timeouts are recurring for specific request types, investigate whether the expected output is too long for your timeout setting, and increase the timeout for those specific request patterns.
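The retry-once rule can be captured in a generic wrapper. A sketch under stated assumptions: `call` is any zero-argument function wrapping your API request, and TimeoutError stands in for anthropic.APITimeoutError; the `on_timeout_retry` hook is a hypothetical place to log recurring timeouts for investigation.

```python
def call_with_timeout_retry(call, on_timeout_retry=None):
    """Invoke `call`; on a timeout, retry exactly once with the same
    settings. A second timeout propagates to the caller."""
    try:
        return call()
    except TimeoutError:
        if on_timeout_retry is not None:
            on_timeout_retry()  # e.g. increment a metric to spot recurring timeouts
        return call()
```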
Content Policy (400)
Never retry content policy errors. Log the full request details (without PII if possible) for review. For applications with user-generated input, implement upstream content screening. A useful pattern: on content policy rejection, attempt a reformulated version of the request with safety instructions added — this succeeds for many borderline cases that weren't malicious, just ambiguously phrased.
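The reformulation pattern looks roughly like this. Everything here is illustrative: `send` is your own API call wrapper, ValueError stands in for a 400 content-policy rejection, and the safety preamble text is an example, not a prescribed prompt.

```python
SAFETY_PREAMBLE = (
    "Respond helpfully and safely. If any part of the request below is "
    "ambiguous, interpret it in its most benign sense.\n\n"
)

def send_with_reformulation(send, prompt: str):
    """Call `send(prompt)`; on a content-policy rejection, try once more
    with safety framing prepended. A second rejection propagates —
    never loop on policy errors."""
    try:
        return send(prompt)
    except ValueError:
        # Log the rejected prompt here (scrubbed of PII) before re-attempting.
        return send(SAFETY_PREAMBLE + prompt)
```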
Circuit Breakers & Fallbacks
For applications where Claude is in the critical path, a circuit breaker prevents a degraded API from cascading into full application failure. The pattern monitors recent API call outcomes and temporarily short-circuits calls when failure rates exceed a threshold:
- Closed state (normal operation): Calls go through to the API. Failures are counted within a rolling window (e.g., 60 seconds).
- Open state (tripped): When failures exceed threshold (e.g., 5 failures in 30 seconds), the circuit opens. New requests fail immediately without hitting the API, returning a graceful fallback. This protects your application and prevents compounding load on a struggling API.
- Half-open state (testing): After a cooldown period (e.g., 60 seconds), one test request is allowed through. If it succeeds, the circuit closes. If it fails, the circuit remains open for another cooldown period.
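The three states above can be sketched as a small class. This is a minimal illustration with the example thresholds from the text, not a production library; the injectable `clock` parameter is an assumption added to make the breaker testable.

```python
import time
from collections import deque

class CircuitBreaker:
    """Minimal three-state circuit breaker: closed / open / half-open."""

    def __init__(self, failure_threshold=5, window=30.0, cooldown=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window = window        # rolling failure window, seconds
        self.cooldown = cooldown    # how long the circuit stays open
        self.clock = clock          # injectable for testing
        self.failures = deque()     # timestamps of recent failures
        self.state = "closed"
        self.opened_at = None

    def call(self, fn, fallback):
        now = self.clock()
        if self.state == "open":
            if now - self.opened_at < self.cooldown:
                return fallback()       # fail fast: no API call while open
            self.state = "half-open"    # cooldown elapsed: allow one test call
        try:
            result = fn()
        except Exception:
            self._record_failure(now)
            return fallback()
        if self.state == "half-open":
            self.state = "closed"       # test call succeeded: recover
            self.failures.clear()
        return result

    def _record_failure(self, now):
        if self.state == "half-open":
            self.state = "open"         # test call failed: stay open
            self.opened_at = now
            return
        self.failures.append(now)
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()     # expire failures outside the window
        if len(self.failures) >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now
```

In use, every Claude call on a protected path goes through `breaker.call(make_request, fallback)`, where `fallback` returns the degraded response described below.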
Implement fallbacks for each circuit-broken path: for summarisation, return the original document with a note; for classification, return "unclassified" and route to human review; for content generation, use a simpler template-based alternative. The goal is degraded but functional service, not total failure.
When to Use Circuit Breakers
Circuit breakers are essential when Claude is in the synchronous request path for user-facing features. They're less critical for background processing or batch jobs where failures are handled at the job level. Prioritise implementation for: customer-facing chatbots, real-time document processing, and any user-facing AI feature where a loading spinner longer than 5 seconds degrades user experience.
User-Facing Error Handling
Technical error handling is only half the challenge. How errors are communicated to users determines whether an incident becomes a support ticket or goes unnoticed:
Never Surface Technical Errors Directly
"429 Too Many Requests from Anthropic API" is not a user message. "We're experiencing high demand right now — your request is being processed and will complete in approximately 2 minutes" is. Map every error type to a clear, actionable user message that sets expectations without exposing implementation details.
Progressive Disclosure for Long Wait Times
For requests that will take a long time due to retry cycles: show a progress indicator immediately, provide status updates ("Still processing... this is taking longer than usual"), give users a way to cancel if they don't want to wait, and send a completion notification (email or in-app) if processing will take more than 2 minutes.
Graceful Degradation Messages
When Claude is unavailable and you're serving from fallback: "Our AI assistant is temporarily unavailable — we're showing a simplified version of this feature. Full capability will be restored shortly." This is far better than a broken state or a confusing error.
Observability & Alerting
You can't fix what you can't see. Production Claude integrations need comprehensive observability:
- Error rate by type: Track 429, 5xx, timeout, and content policy error rates separately. Alert when any error type exceeds 5% of requests in a 5-minute window.
- Latency percentiles: Track p50, p95, and p99 latency. Alert when p95 exceeds 2x your normal baseline. Latency spikes often precede 529 errors.
- Retry rate: Track how often you're retrying requests. A retry rate above 10% indicates you're consistently hitting limits or experiencing instability — investigate before it becomes an incident.
- Token consumption: Track input and output tokens per request by workflow type. Sudden spikes indicate prompt changes that significantly increased token usage.
- Cost per request: Track cost per API call by workflow. This is your early warning system for cost overruns and also helps identify optimisation opportunities.
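The per-type error-rate tracking described above amounts to a rolling-window counter. A sketch with illustrative names (the class and the error-type strings are assumptions, and the injectable `clock` exists only to make the window testable):

```python
import time
from collections import deque

class ErrorRateMonitor:
    """Rolling per-type error rate over a fixed window (e.g. 5 minutes)."""

    def __init__(self, window=300.0, alert_threshold=0.05,
                 clock=time.monotonic):
        self.window = window
        self.alert_threshold = alert_threshold
        self.clock = clock
        self.events = deque()  # (timestamp, error_type or None for success)

    def record(self, error_type=None):
        """Record one request outcome; error_type is None for a success,
        else a label like "429", "5xx", "timeout", "content_policy"."""
        self.events.append((self.clock(), error_type))
        self._evict()

    def rate(self, error_type):
        self._evict()
        total = len(self.events)
        if total == 0:
            return 0.0
        errors = sum(1 for _, kind in self.events if kind == error_type)
        return errors / total

    def should_alert(self, error_type):
        return self.rate(error_type) > self.alert_threshold

    def _evict(self):
        cutoff = self.clock() - self.window
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
```

In production you would emit these rates as gauges to your metrics backend rather than querying the monitor directly, but the windowing logic is the same.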
Export all metrics to your observability platform (Datadog, Grafana, CloudWatch). Create a Claude API health dashboard visible to your engineering team, and set up PagerDuty or equivalent alerting for error rates above threshold during business hours.