The Bulkhead Pattern in ASP.NET Core: When to Use It and How


When a single slow downstream dependency can take your entire API offline, you have a cascade failure problem, and the Bulkhead Pattern is one of the most effective architectural tools for preventing it. In distributed ASP.NET Core applications, where a single request fan-out may touch five or six external services, uncontrolled resource sharing means that a saturated payment processor or a sluggish inventory service can exhaust the shared thread pool and drag every other endpoint down with it. The fix is isolation, and that is exactly what the bulkhead pattern delivers.


What Problem Does the Bulkhead Pattern Solve?

The name comes from ship design. A bulkhead is a partition that separates one compartment from another, so that a hull breach in one section does not flood the entire vessel. In software, the same principle applies: if one part of your system is failing or degrading, you prevent that failure from consuming the resources needed by healthy parts.

In a traditional ASP.NET Core API without isolation, all downstream HTTP calls share the same pool of threads and connections. When ExternalPaymentsService starts responding in 30 seconds instead of 300 milliseconds, incoming requests queue up waiting for responses. Within seconds, the thread pool is saturated. Requests to /products, /users, and /search, none of which touch the payments service at all, begin timing out because there are no threads left to process them.

The bulkhead pattern solves this by giving each downstream dependency its own dedicated pool of execution resources (threads, concurrent requests, or connection slots) so that degradation in one service is contained to that service's compartment.
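
To make the compartment idea concrete, here is a minimal sketch of a hand-rolled bulkhead built only on SemaphoreSlim. The Bulkhead and BulkheadRejectedException types are hypothetical names invented for this illustration; they are not part of ASP.NET Core or Polly, and production code would normally rely on Polly's concurrency limiter rather than a class like this.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical exception type for this sketch; not part of any library.
public sealed class BulkheadRejectedException : Exception
{
    public BulkheadRejectedException(string name)
        : base($"Bulkhead '{name}' is full.") { }
}

// Hypothetical hand-rolled bulkhead: create one instance per downstream dependency.
public sealed class Bulkhead
{
    private readonly string _name;
    private readonly SemaphoreSlim _slots;

    public Bulkhead(string name, int maxConcurrency)
    {
        _name = name;
        _slots = new SemaphoreSlim(maxConcurrency, maxConcurrency);
    }

    // Number of free slots right now (handy for metrics).
    public int AvailableSlots => _slots.CurrentCount;

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action)
    {
        // Wait(0) never blocks: it returns false immediately when no slot is free,
        // so callers are rejected instead of piling up behind a slow dependency.
        if (!_slots.Wait(0))
            throw new BulkheadRejectedException(_name);

        try
        {
            return await action();
        }
        finally
        {
            _slots.Release(); // free the slot even if the call failed
        }
    }
}
```

Because each dependency gets its own Bulkhead instance, a stalled payments call can only ever hold the slots of the payments compartment; calls guarded by a different instance are unaffected.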

How Is Bulkhead Isolation Different From Circuit Breaking?

This is a question architects ask frequently, and the answer matters for deciding which pattern (or combination) to apply.

A circuit breaker monitors the failure rate of calls to a dependency and trips open when failures exceed a threshold. Once open, it fails fast: calls return immediately with an error rather than waiting. This protects against latency amplification when a dependency is completely down.

A bulkhead limits concurrency: it caps how many simultaneous calls can be in-flight to a given dependency at any moment. This protects against resource exhaustion when a dependency is slow but not completely down. Slow responses still count as occupied slots, which is precisely what creates the cascade failure in the first place.

The two patterns are complementary. A circuit breaker kicks in when failure rate climbs; a bulkhead kicks in when concurrency climbs. Most production resilience stacks deploy both together: bulkhead limits parallel load, circuit breaker stops the calls entirely when the service goes dark.

ASP.NET Core's request timeout strategies form the third leg of this stool: a timeout prevents a single slow call from occupying a bulkhead slot indefinitely.

When Should You Use the Bulkhead Pattern?

The bulkhead pattern is appropriate when all of the following are true:

Your service makes calls to multiple independent downstream dependencies. If your API only talks to one external service, bulkheads add operational complexity without much gain. The pattern's value multiplies with the number of dependencies.

Dependencies have different criticality tiers. Not all dependencies are equal. A failed payment service should not prevent product catalog reads. Bulkheads let you explicitly model this by allocating more concurrency slots to critical paths and fewer to non-critical ones.

Slow responses are more likely than hard failures. In cloud-hosted microservice environments, flapping services and latency spikes are far more common than clean outages. Circuit breakers alone do not protect against slow partial degradation; bulkheads do.

You are running under sustained concurrent load. Bulkheads are a high-traffic concern. At five requests per second, cascade failure through thread pool exhaustion is unlikely. At 500 requests per second with a slow dependency, it is a near certainty without isolation.

When Should You Skip the Bulkhead Pattern?

Bulkheads are not free. Before reaching for them, consider whether the trade-offs make sense:

  • Simple CRUD APIs with a single database. Adding per-operation concurrency limits to a database-backed CRUD API adds complexity without meaningful protection. Use connection pool tuning instead.
  • Synchronous monolithic applications. Bulkhead isolation was designed for distributed systems where multiple external services compete for shared resources. In a monolith that does not fan out over the network, the pattern's purpose dissolves.
  • Low-throughput internal tools. If your API processes tens of requests per minute, cascade failures through thread exhaustion are not a realistic threat.
  • When you have not yet measured. Do not add bulkheads speculatively. Add them in response to a concrete analysis of which dependencies pose cascade-failure risk under load.

Implementing the Bulkhead Pattern in ASP.NET Core with Polly v8

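Polly v8, the library that Microsoft.Extensions.Resilience and Microsoft.Extensions.Http.Resilience build on, replaced the old BulkheadPolicy with a concurrency limiter strategy (added through AddConcurrencyLimiter from the Polly.RateLimiting package) inside the resilience pipeline builder. This integrates cleanly with IHttpClientFactory and the broader resilience pipeline concept.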

The modern approach uses AddResilienceHandler to attach a named pipeline to a typed HttpClient. The key strategy is ConcurrencyLimiter, which maps directly to the bulkhead concept: it limits how many concurrent executions are permitted and how many can queue before requests are rejected outright.

services.AddHttpClient<IPaymentsClient, PaymentsClient>()
    .AddResilienceHandler("payments-pipeline", builder =>
    {
        builder.AddConcurrencyLimiter(
            permitLimit: 10,
            queueLimit: 5);
    });

A few things worth noting here. permitLimit is the bulkhead slot count: the maximum number of simultaneous in-flight calls allowed. queueLimit controls how many additional requests may wait when all slots are occupied; once this queue fills, subsequent requests are rejected immediately with a RateLimiterRejectedException rather than allowed to pile up indefinitely.
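
The interplay between the two limits can be explored without any HTTP traffic. Polly's concurrency limiter is built on the ConcurrencyLimiter type from System.Threading.RateLimiting (part of the ASP.NET Core shared framework, or available as a NuGet package elsewhere), so the sketch below drives that type directly. The PermitAndQueueDemo class and the small limits of 2 and 1 are assumptions chosen purely for illustration.

```csharp
using System;
using System.Threading.RateLimiting;
using System.Threading.Tasks;

public static class PermitAndQueueDemo
{
    // Reports what happened to the third caller (queued?), the fourth caller
    // (rejected?), and whether the queued caller got a slot after a release.
    public static async Task<(bool Queued, bool Rejected, bool Resumed)> RunAsync()
    {
        using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
        {
            PermitLimit = 2,   // two bulkhead slots
            QueueLimit = 1,    // one caller may wait
            QueueProcessingOrder = QueueProcessingOrder.OldestFirst
        });

        // Callers one and two take both slots immediately.
        RateLimitLease first = await limiter.AcquireAsync();
        RateLimitLease second = await limiter.AcquireAsync();

        // Caller three waits in the queue: its ValueTask is still pending.
        ValueTask<RateLimitLease> third = limiter.AcquireAsync();
        bool queued = !third.IsCompleted;

        // Caller four finds slots and queue full, so it gets a failed lease at once.
        using RateLimitLease fourth = await limiter.AcquireAsync();
        bool rejected = !fourth.IsAcquired;

        // Releasing a slot hands it to the queued caller.
        first.Dispose();
        using RateLimitLease thirdLease = await third;
        bool resumed = thirdLease.IsAcquired;

        second.Dispose();
        return (queued, rejected, resumed);
    }
}
```

Inside a Polly pipeline the failed lease seen by the fourth caller is what surfaces as a RateLimiterRejectedException.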

Sizing these values correctly requires load profiling. Start with your 99th-percentile concurrent call count to that dependency under normal load, then add a safety margin of 20 to 30%. Set queueLimit conservatively: a short queue allows fast rejection under degradation, which is usually preferable to a large queue that just delays the cascade.
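
As a rough worked example of that sizing rule, the helper below takes sampled in-flight call counts, picks the 99th percentile using the nearest-rank method, and adds a margin. BulkheadSizing and SuggestPermitLimit are hypothetical names, and the nearest-rank percentile is just one reasonable choice; treat the result as a starting point to validate under load, not a definitive formula.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BulkheadSizing
{
    // concurrencySamples: observed numbers of simultaneous in-flight calls to one
    // dependency, sampled under normal load. marginPercent: safety margin (20-30).
    public static int SuggestPermitLimit(IReadOnlyList<int> concurrencySamples, int marginPercent = 25)
    {
        if (concurrencySamples.Count == 0)
            throw new ArgumentException("At least one sample is required.", nameof(concurrencySamples));

        int[] sorted = concurrencySamples.OrderBy(x => x).ToArray();

        // Nearest-rank p99: rank = ceil(0.99 * n), done in integer arithmetic.
        int rank = (99 * sorted.Length + 99) / 100;
        int p99 = sorted[rank - 1];

        // Apply the margin and round up to a whole slot.
        return (p99 * (100 + marginPercent) + 99) / 100;
    }
}
```

With a p99 of 8 concurrent calls and the default 25% margin, this suggests a permitLimit of 10, which happens to match the payments example above.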

Bulkhead per Dependency: The Right Model

The most common implementation mistake is applying a single shared bulkhead across all outgoing HTTP calls. This defeats the purpose. If your API calls three downstream services (PaymentsService, InventoryService, and NotificationsService), each should have its own independently sized bulkhead:

services.AddHttpClient<IPaymentsClient, PaymentsClient>()
    .AddResilienceHandler("payments", b => b.AddConcurrencyLimiter(permitLimit: 10, queueLimit: 5));

services.AddHttpClient<IInventoryClient, InventoryClient>()
    .AddResilienceHandler("inventory", b => b.AddConcurrencyLimiter(permitLimit: 20, queueLimit: 10));

services.AddHttpClient<INotificationsClient, NotificationsClient>()
    .AddResilienceHandler("notifications", b => b.AddConcurrencyLimiter(permitLimit: 5, queueLimit: 2));

This models the real-world criticality difference: inventory reads need more concurrency than fire-and-forget notifications. Payments get a moderate allocation with tight queueing because slow payment responses are already handled by a timeout policy further up the pipeline.

For a deeper look at how typed HttpClient registration and named clients interact with these resilience policies, the IHttpClientFactory and typed client comparison guide has the full breakdown.

Combining Bulkhead With Circuit Breaker and Retry

In practice, bulkheads work within a broader resilience pipeline. The canonical Polly v8 combination for an outgoing HTTP call is:

  1. Concurrency limiter (bulkhead): cap simultaneous in-flight calls
  2. Retry: retry transient failures with exponential backoff and jitter
  3. Circuit breaker: trip open when failure rate exceeds threshold
  4. Timeout: fail fast after a maximum wait

The order matters, and the list runs from outermost to innermost. The bulkhead sits outermost: it gates entry into the pipeline. If all slots and the queue are full, the request never reaches retry or circuit breaker logic; it is rejected immediately. Timeout sits innermost: it applies per individual attempt.
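
One way this ordering might be expressed with AddResilienceHandler is sketched below. In Polly v8 the first strategy added is the outermost, so the concurrency limiter goes first and the timeout last. The stub IPaymentsClient and PaymentsClient types, the AddPaymentsClient extension method, and the specific numbers (three retries, a two-second attempt timeout) are placeholders for illustration; the sketch assumes the Microsoft.Extensions.Http.Resilience package is referenced.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;
using Polly;

// Stub typed client so the sketch compiles on its own.
public interface IPaymentsClient { }

public sealed class PaymentsClient : IPaymentsClient
{
    public PaymentsClient(HttpClient http) { }
}

public static class PaymentsResilience
{
    public static IServiceCollection AddPaymentsClient(this IServiceCollection services)
    {
        services.AddHttpClient<IPaymentsClient, PaymentsClient>()
            .AddResilienceHandler("payments-pipeline", builder =>
            {
                // Outermost: the bulkhead gates entry to everything below.
                builder.AddConcurrencyLimiter(permitLimit: 10, queueLimit: 5);

                // Retries wrap the circuit breaker and the per-attempt timeout.
                builder.AddRetry(new HttpRetryStrategyOptions
                {
                    MaxRetryAttempts = 3,
                    BackoffType = DelayBackoffType.Exponential,
                    UseJitter = true
                });

                // Library defaults for the breaker; tune per dependency.
                builder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions());

                // Innermost: applies to each individual attempt.
                builder.AddTimeout(TimeSpan.FromSeconds(2));
            });

        return services;
    }
}
```

Note that with this layout a request holds its bulkhead slot for the whole retry sequence, so worst-case slot occupancy is roughly the attempt timeout multiplied by the number of attempts, plus backoff delays.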

AddStandardResilienceHandler() wires this combination up automatically with sensible defaults, including a concurrency-based rate limiter with a generous default permit count. Use it as the baseline and add explicit concurrency limiter configuration when you need per-dependency slot sizing.

Monitoring Bulkhead Effectiveness

Bulkheads without observability are blind spots. The metrics you want to track per dependency:

  • Concurrency limiter rejection count: the rate at which requests hit the queueLimit ceiling and are rejected. A spike here signals that either the slot count is too low or the downstream dependency is genuinely degrading.
  • Queue wait time: how long requests spend waiting in the queue before executing. High queue wait combined with low rejection count suggests the dependency is slow, not unavailable.
  • Per-client error rate: track error rates per named HttpClient to see which dependency is the source of degradation.

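Polly v8 emits its metrics through a Meter named Polly, with instruments under the resilience.polly prefix such as resilience.polly.strategy.events and resilience.polly.pipeline.duration. Pipelines registered through AddResilienceHandler have this telemetry enabled automatically; to export it, register the Polly meter in your AddOpenTelemetry() metrics configuration. From there the metrics flow to Prometheus, Azure Monitor, and Grafana through the usual exporters, without additional instrumentation code.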
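
A minimal sketch of wiring Polly's metrics into OpenTelemetry follows. It assumes the OpenTelemetry.Extensions.Hosting package is referenced; ResilienceMetrics and AddPollyMetrics are hypothetical names, and the exporter is deliberately left out because that choice depends on your monitoring stack.

```csharp
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;

public static class ResilienceMetrics
{
    public static IServiceCollection AddPollyMetrics(this IServiceCollection services)
    {
        services.AddOpenTelemetry()
            .WithMetrics(metrics =>
            {
                // "Polly" is the meter name Polly v8 publishes its instruments under.
                metrics.AddMeter("Polly");

                // Add an exporter here (OTLP, Prometheus, Azure Monitor, ...)
                // according to your monitoring stack.
            });

        return services;
    }
}
```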

Decision Matrix: Bulkhead Pattern in ASP.NET Core

| Condition | Recommendation |
|---|---|
| Multi-dependency API, high traffic | Use bulkheads: essential for cascade failure prevention |
| Single downstream dependency | Skip: connection pool tuning is sufficient |
| Mostly fast downstream calls (< 50ms p99) | Low priority: monitor, add if latency grows |
| Mixed criticality tier dependencies | Use bulkheads: differentiated slot allocation pays off |
| Already using circuit breakers only | Add bulkheads: they cover the slow-but-not-down scenario circuit breakers miss |
| Monolith without external HTTP calls | Skip: pattern does not apply |
| Low-throughput internal API (< 50 rps) | Skip: cascade risk is minimal at this load |

Anti-Patterns to Avoid

Shared bulkhead across all HttpClients. Already covered, but worth restating: one bulkhead for all downstream services means a slow dependency still exhausts the shared pool. The whole point is per-dependency isolation.

Overestimating permitLimit. A bulkhead set at 500 concurrent calls to a dependency that realistically handles 20 is a bulkhead in name only. Size based on your actual load profile, not theoretical maximums.

Ignoring queue rejections in production. When queueLimit is reached and requests are rejected, that is a signal, not just background noise. Set up alerts on rejection count so you can respond before the cascade reaches users.

Using bulkheads instead of fixing the root cause. A bulkhead contains the blast radius of a slow dependency; it does not fix the dependency. If PaymentsService is consistently slow, the bulkhead buys you time to investigate but is not a permanent substitute for fixing the upstream issue.

Configuring values once and never revisiting them. Load profiles change. A slot count appropriate for launch-day traffic may be dangerously low six months later. Review bulkhead sizing as part of quarterly capacity planning.



FAQ

What is the bulkhead pattern in ASP.NET Core? The bulkhead pattern limits the number of simultaneous calls to a specific downstream dependency by allocating a fixed pool of execution slots to it. In ASP.NET Core, this is implemented using Polly's ConcurrencyLimiter strategy within a resilience pipeline attached to a named or typed HttpClient. When the slot pool is exhausted, additional requests either queue briefly or are rejected immediately, preventing resource exhaustion from spreading across the rest of the application.

How is the bulkhead pattern different from a circuit breaker in .NET? A circuit breaker responds to failure rate: it trips open when a percentage of calls start failing, stopping further calls to the dependency until it recovers. A bulkhead responds to concurrency: it caps how many calls can be in-flight simultaneously. In practice, circuit breakers protect against complete outages; bulkheads protect against slow degradation. Both patterns are complementary and are routinely deployed together in the same Polly resilience pipeline.

What is the difference between permitLimit and queueLimit in Polly's ConcurrencyLimiter? permitLimit is the maximum number of concurrent executions allowed at any moment, that is, the bulkhead slot count. queueLimit controls how many additional requests may wait in a queue when all slots are occupied. Once the queue is also full, subsequent requests are rejected immediately. A tight queueLimit (2 to 5) is usually preferable to a large one, because it fails fast under severe degradation rather than accumulating a long queue of requests that will all eventually fail anyway.

When should I not use the bulkhead pattern in an ASP.NET Core API? Skip bulkheads when your API talks to a single backend, operates at low traffic volume (under ~50 requests per second), uses a monolithic architecture without external HTTP fan-out, or when you have not profiled which dependencies pose a real cascade-failure risk. Adding bulkheads speculatively introduces operational complexity without corresponding benefit. Introduce them in response to concrete evidence: a specific dependency that is slow under load and at risk of exhausting shared resources.

Does AddStandardResilienceHandler include bulkhead isolation in ASP.NET Core? Only in a generic form. AddStandardResilienceHandler includes rate limiter, total request timeout, retry, circuit breaker, and per-attempt timeout strategies by default. Its rate limiter is concurrency-based, but the documented default (1,000 permits and no queue) is a broad safety net rather than a bulkhead sized for a particular dependency. Bulkhead slot sizing is inherently application-specific, so there is no meaningful default. Add an explicit AddConcurrencyLimiter call within AddResilienceHandler when you need per-dependency isolation, and size the values based on your actual concurrency profile for that dependency.

How do I monitor bulkhead rejections in ASP.NET Core with OpenTelemetry? Register Polly's meter (named Polly) with OpenTelemetry, and the instruments under the resilience.polly prefix are exported. Rejections show up in the resilience.polly.strategy.events counter with an event.name of OnRateLimiterRejected; a spike indicates that requests are being turned away because all bulkhead slots and the queue are full. Pair this with the resilience.polly.pipeline.duration histogram, which includes time spent queued, to distinguish between a dependency that is slow (long durations, few rejections) and one that is catastrophically degraded (immediate rejections).

Can I use the bulkhead pattern with background services in ASP.NET Core? Yes. The concurrency limiter strategy in Polly v8 is not limited to IHttpClientFactory. You can create a ResiliencePipeline directly and wrap any async operation, including BackgroundService tasks that call external APIs or queues. The same sizing principles apply: determine the maximum acceptable concurrent operations for that dependency and set permitLimit accordingly, with a conservative queueLimit to prevent queue buildup in background processing paths.
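
A minimal sketch of that standalone usage is shown below, assuming the Polly.Core and Polly.RateLimiting packages are referenced. QueueWorker and TryProcessAsync are hypothetical names, and the limits of 4 and 2 are arbitrary; in a real BackgroundService the method would be called from ExecuteAsync for each unit of work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.RateLimiting;

public sealed class QueueWorker
{
    // One pipeline per dependency, built once and reused for every call.
    private static readonly ResiliencePipeline Pipeline = new ResiliencePipelineBuilder()
        .AddConcurrencyLimiter(permitLimit: 4, queueLimit: 2)
        .Build();

    // Returns false when the bulkhead rejected the work, so the caller can
    // skip it or reschedule it instead of letting it pile up.
    public async Task<bool> TryProcessAsync(
        Func<CancellationToken, ValueTask> callDependency,
        CancellationToken cancellationToken)
    {
        try
        {
            await Pipeline.ExecuteAsync(callDependency, cancellationToken);
            return true;
        }
        catch (RateLimiterRejectedException)
        {
            // All slots and all queue places were taken.
            return false;
        }
    }
}
```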
