
Cache Stampede in ASP.NET Core: IMemoryCache Race Conditions in Production (Root Cause and Fix)

Cache stampedes are one of those production problems that look like an infrastructure failure until you dig into the code. Under normal load, your ASP.NET Core API responds fast, the cache does its job, and everything is fine. Then traffic spikes (or a scheduled cache expiry fires at the wrong time), and your database is suddenly receiving ten times the expected queries in a fraction of a second. Response times climb, CPU usage spikes, and the on-call alert goes off. The cache was supposed to prevent this.

Understanding why IMemoryCache does not protect you from this by default, and knowing the exact fix to apply, is the difference between a cache that helps in production and one that creates a new category of failure. The full working implementation (including the SemaphoreSlim locking pattern, the HybridCache migration, and a load test harness you can run against your own API) is on Patreon, with production-ready source code that maps directly to real enterprise workloads.

If you want to see this problem and its fix in the context of a complete production API, alongside rate limiting, EF Core, and authentication all wired together, Chapter 9 of the ASP.NET Core Web API: Zero to Production course covers exactly this, including how caching decisions interact with resilience and database access patterns.

What Is a Cache Stampede?

A cache stampede (also called a thundering herd or cache miss avalanche) occurs when multiple concurrent requests all find that a cached value has expired at the same time. Because the value is gone, every one of those requests independently concludes that it needs to regenerate it. They all go to the data source simultaneously. The data source (your database, your downstream API) receives a burst of identical queries that far exceeds what the cache was designed to absorb.

The pattern repeats on every expiry cycle. Instead of one database call every five minutes, you get two hundred simultaneous calls every five minutes, each of which takes longer than usual because the database is now under abnormal load.

In ASP.NET Core, this happens specifically because IMemoryCache.GetOrCreateAsync does not serialize concurrent population. When ten requests call it for the same missing key at the same instant, all ten enter the factory delegate. The cache is not wrong; it is doing exactly what it was designed to do. The problem is the assumption that only one caller will ever enter the factory for a given key at a given time.
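
To see this in isolation, here is a minimal repro, assuming a console app (.NET 6 or later) that references the Microsoft.Extensions.Caching.Memory package. Ten callers request the same missing key at once, and a counter records how many of them ran the factory. The key name and the 200 ms delay are illustrative stand-ins for a real query.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions());
int factoryCalls = 0;

// Ten callers ask for the same key before any of them has finished populating it.
var callers = Enumerable.Range(0, 10).Select(_ =>
    cache.GetOrCreateAsync("products:featured", async entry =>
    {
        Interlocked.Increment(ref factoryCalls); // counts factory executions
        entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
        await Task.Delay(200); // stands in for a slow database query
        return "featured products";
    }));

await Task.WhenAll(callers);

// No per-key serialization: every caller that missed ran its own factory.
Console.WriteLine($"Factory executions: {factoryCalls}");
```

GetOrCreateAsync only commits the new entry after the factory completes, so every caller that arrives during those 200 ms still sees a miss and runs its own copy of the factory.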

Why It Happens in Production but Not Locally

Local testing rarely reveals this problem because developers typically test with low concurrency. One request runs, populates the cache, and the next request hits the populated entry. The race condition requires multiple concurrent callers and an expired (or absent) cache entry to trigger.

In production, three things combine to make it visible:

Traffic shape. High-traffic APIs receive tens or hundreds of requests per second. When a popular cache entry expires, many of those in-flight requests will call GetOrCreateAsync before any of them finishes populating it.

Cache TTL alignment. When all entries for a given category share the same TTL, they expire at the same wall-clock time. A cache warm-up at startup, followed by a fixed absolute expiration, produces synchronized expiry across the cluster.

Slow factory delegates. The longer the factory takes to run (a complex query, a downstream API call, an EF Core join), the wider the window during which concurrent callers can pile in. At 200 requests per second for a single key, a factory that takes 200ms can collect 40 concurrent callers (200 req/s × 0.2 s) before the first one finishes.

How to Diagnose a Cache Stampede

Before reaching for a fix, confirm you are actually dealing with a stampede rather than a different caching problem.

What to look for in your metrics and logs:

  • A periodic spike in database query count that coincides with your cache TTL interval

  • Response latency spikes that are narrow (seconds, not minutes) and repeat at a predictable interval

  • Structured log entries showing the same cache key being "populated" by multiple concurrent requests within milliseconds of each other

  • Database CPU spikes that are correlated with specific cache keys rather than random query patterns

A quick diagnostic approach: add structured logging inside your factory delegate that includes the cache key and a timestamp. If you see the same key logged multiple times within a single second, you have confirmed a stampede. With ILogger message templates and named placeholders (which Serilog and other structured logging providers capture as properties), a single log line is sufficient: _logger.LogInformation("Cache miss for key {CacheKey} at {UtcNow}", key, DateTime.UtcNow).
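
A sketch of where that log line sits, assuming Microsoft.Extensions.Logging with the console provider (the Microsoft.Extensions.Logging.Console package) rather than Serilog; the message template behaves the same with either. GetFeaturedAsync, the key name, and the simulated delay are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Logging;

using var loggerFactory = LoggerFactory.Create(logging => logging.AddConsole());
ILogger logger = loggerFactory.CreateLogger("CacheDiagnostics");
var cache = new MemoryCache(new MemoryCacheOptions());
const string key = "products:featured";

async Task<string> GetFeaturedAsync()
{
    string? value = await cache.GetOrCreateAsync(key, async entry =>
    {
        // Runs once per factory execution. Several lines for the same key within
        // the same second mean several callers populated it concurrently.
        logger.LogInformation("Cache miss for key {CacheKey} at {UtcNow}", key, DateTime.UtcNow);
        entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
        await Task.Delay(200); // simulated slow query
        return "featured products";
    });
    return value!;
}

// Five concurrent callers for the same cold key: expect one "Cache miss" line per caller.
await Task.WhenAll(Enumerable.Range(0, 5).Select(_ => GetFeaturedAsync()));
```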

If you are already using IDistributedCache, do not assume you are protected. IDistributedCache has no per-key serialization either (it exposes only get, set, refresh, and remove operations), so the same fundamental issue occurs if your factory is slow enough and traffic is high enough.

Fix 1: SemaphoreSlim Per-Key Locking

A common fix for IMemoryCache is a SemaphoreSlim that gates entry into the factory delegate. Only one caller at a time is allowed to populate a given key; all other callers wait. When a waiting caller acquires the semaphore, it checks the cache again, finds the value the first caller stored, and returns it without calling the factory itself.

The implementation pattern wraps GetOrCreateAsync inside a lightweight keyed lock. The lock is typically stored in a ConcurrentDictionary<string, SemaphoreSlim> so that different keys can proceed concurrently โ€” only callers requesting the same key are serialized.

This approach is effective and works without any dependency changes. The trade-off is added complexity: the lock dictionary must be managed carefully to avoid memory leaks (remove entries after use), and the per-key semaphore adds a small allocation cost on cache miss.
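
The pattern described above can be sketched as follows. This is a minimal version under stated assumptions, not a drop-in library: GetOrCreateLockedAsync is a hypothetical helper name, it is written as a local function in a console app that references Microsoft.Extensions.Caching.Memory, and in a real API it would live in a singleton service. It includes the double-check after acquiring the semaphore and removes the semaphore once the value has been cached.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions());
var locks = new ConcurrentDictionary<string, SemaphoreSlim>();

// Hypothetical helper: serializes population per key, not per cache.
async Task<T> GetOrCreateLockedAsync<T>(string key, Func<Task<T>> factory, TimeSpan ttl)
{
    // Fast path: no locking at all when the value is already cached.
    if (cache.TryGetValue(key, out T? cached))
        return cached!;

    var gate = locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
    await gate.WaitAsync();
    try
    {
        // Double-check: a caller that waited finds the value the first caller stored.
        if (cache.TryGetValue(key, out cached))
            return cached!;

        T value = await factory();
        cache.Set(key, value, ttl);
        return value;
    }
    finally
    {
        // Remove the semaphore only if it is still the one registered for this key,
        // so the dictionary does not keep one entry for every key ever requested.
        locks.TryRemove(new KeyValuePair<string, SemaphoreSlim>(key, gate));
        gate.Release();
    }
}

int factoryCalls = 0;

async Task<string> LoadFeaturedAsync()
{
    Interlocked.Increment(ref factoryCalls);
    await Task.Delay(200); // simulated slow query
    return "featured products";
}

// Twenty concurrent callers for the same cold key.
await Task.WhenAll(Enumerable.Range(0, 20).Select(_ =>
    GetOrCreateLockedAsync("products:featured", LoadFeaturedAsync, TimeSpan.FromMinutes(5))));

Console.WriteLine($"Factory executions: {factoryCalls}");
```

One known gap: if the factory throws, a newly arriving caller can create a fresh semaphore while an earlier waiter still holds the old one, so two factory runs can briefly overlap. The sketch accepts that on the failure path; reference-counting the semaphores closes the gap at the cost of more code.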

The pattern is most appropriate when:

  • You are on .NET 8 or earlier and cannot add the HybridCache package yet

  • Your cache usage is concentrated on a small, known set of high-contention keys

  • You want precise control over the locking behaviour

Fix 2: Staggered TTL to Break Synchronized Expiry

A simpler partial mitigation is to add a random jitter to the cache TTL. Instead of all entries for a category expiring at exactly the same time, they expire within a window: say, between 5 and 6 minutes for a 5-minute base TTL. This distributes the stampede load over time rather than eliminating it entirely.

Jitter is easy to add: TimeSpan.FromMinutes(5) + TimeSpan.FromSeconds(Random.Shared.Next(0, 60)). The Random.Shared instance in .NET 6 and later is thread-safe.
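
To make the spreading effect concrete, this sketch applies exactly that expression to 1,000 hypothetical keys cached at the same instant (for example, by a startup warm-up) and counts how many expire in each 10-second bucket. It uses only the base class library.

```csharp
using System;
using System.Linq;

// 1,000 hypothetical keys, all cached at the same instant.
var expiries = Enumerable.Range(0, 1_000)
    .Select(_ => TimeSpan.FromMinutes(5) + TimeSpan.FromSeconds(Random.Shared.Next(0, 60)))
    .ToList();

// Without jitter, all 1,000 would expire at 00:05:00. With it, they spread over a minute.
var buckets = expiries
    .GroupBy(ttl => (int)ttl.TotalSeconds / 10 * 10) // 10-second buckets
    .OrderBy(g => g.Key);

foreach (var bucket in buckets)
    Console.WriteLine($"{TimeSpan.FromSeconds(bucket.Key)}: {bucket.Count()} keys");
```

The upper bound of Random.Next is exclusive, so this expression yields offsets of 0 to 59 whole seconds.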

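Staggered TTL does not eliminate the problem; it reduces its impact. Jitter spreads expiry across many keys (and across instances that cached the same key at the same moment), but a single hot key still expires at one moment in each process, and every caller arriving then still misses. Under very high concurrency, even a 60-second jitter window will still produce small stampedes. It is best used as a complementary technique alongside one of the serialization approaches.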

Fix 3: HybridCache โ€” Stampede Protection Built In

The cleanest long-term fix for .NET 9 and .NET 10 projects is HybridCache, which was introduced with .NET 9 in the Microsoft.Extensions.Caching.Hybrid package. (The package also supports older runtimes, down to .NET Framework 4.7.2 and .NET Standard 2.0, so adopting it does not strictly require a framework upgrade.) HybridCache has stampede protection built into its design: it coalesces concurrent requests for the same key. When multiple requests arrive for a missing key simultaneously, only one factory invocation is triggered. The other callers wait for that single result and share it.

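This is a first-class guarantee in the API contract, not a workaround. Switching from IMemoryCache.GetOrCreateAsync to HybridCache.GetOrCreateAsync gives you stampede protection without any additional locking code. The factory logic carries over, although the delegate signature changes slightly: it receives a CancellationToken instead of an ICacheEntry and returns a ValueTask<T>.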
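
A sketch of the coalescing behaviour, assuming a console app that references the Microsoft.Extensions.Caching.Hybrid and Microsoft.Extensions.DependencyInjection packages. In an ASP.NET Core app you would call builder.Services.AddHybridCache() and inject HybridCache instead of building a ServiceCollection by hand. The key name and delay are illustrative; the same kind of burst that runs the factory repeatedly with IMemoryCache should run it only once here.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Hybrid;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddHybridCache(); // in ASP.NET Core: builder.Services.AddHybridCache()
await using var provider = services.BuildServiceProvider();
var cache = provider.GetRequiredService<HybridCache>();

int factoryCalls = 0;

async ValueTask<string> LoadFeaturedAsync(CancellationToken ct)
{
    Interlocked.Increment(ref factoryCalls);
    await Task.Delay(200, ct); // simulated slow query
    return "featured products";
}

// Twenty concurrent callers for the same missing key: HybridCache runs one
// factory and hands its result to the callers that arrived while it was running.
var callers = Enumerable.Range(0, 20).Select(_ =>
    cache.GetOrCreateAsync("products:featured", async ct => await LoadFeaturedAsync(ct)).AsTask());

await Task.WhenAll(callers);
Console.WriteLine($"Factory executions: {factoryCalls}");
```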

HybridCache also provides an optional two-layer architecture: a fast in-process L1 cache backed by a distributed L2 (Redis, for example). For multi-instance APIs, the L1+L2 combination means that even after a pod restart, the newly started instance can warm from the shared L2 rather than hitting the database. This directly reduces the window during which a stampede can form.
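
A registration sketch for the L1+L2 setup, assuming an ASP.NET Core project (Web SDK with implicit usings), the Microsoft.Extensions.Caching.Hybrid and Microsoft.Extensions.Caching.StackExchangeRedis packages, and a connection string named "Redis" (a hypothetical name). HybridCache uses whichever IDistributedCache is registered as its L2. The expiration values are examples, not recommendations.

```csharp
using Microsoft.Extensions.Caching.Hybrid;

var builder = WebApplication.CreateBuilder(args);

// L2: HybridCache picks up the registered IDistributedCache (Redis here).
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration.GetConnectionString("Redis");
});

// L1 (in-process memory) in front of the Redis L2.
builder.Services.AddHybridCache(options =>
{
    options.DefaultEntryOptions = new HybridCacheEntryOptions
    {
        Expiration = TimeSpan.FromMinutes(5),          // overall lifetime, including L2
        LocalCacheExpiration = TimeSpan.FromMinutes(1) // lifetime in the in-process L1
    };
});

var app = builder.Build();
app.Run();
```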

When to use HybridCache vs the SemaphoreSlim approach:

  • New projects on .NET 9 or .NET 10: default to HybridCache

  • Existing projects on .NET 8 that cannot take the HybridCache package yet: the SemaphoreSlim pattern is the right in-place fix

  • Multi-instance deployments where L2 (Redis) is already in place: HybridCache provides the most complete solution

Which Fix Should You Apply?

If the stampede is severe enough that you are investigating this as a production incident, the answer for most teams is: migrate the affected endpoints to HybridCache if you are on .NET 9+. The API surface is close enough to IMemoryCache that a targeted migration of just the high-traffic keys is low-risk and can be done without touching the rest of the codebase.

If you are on .NET 8 and cannot add the HybridCache package yet, or you need a fix today with no new dependencies, the SemaphoreSlim pattern is production-proven and straightforward. The risk is that you are now maintaining custom locking infrastructure. Keep it simple: one semaphore per key, clean up after use, and log when the lock is contended.

The one pattern to avoid entirely is using a single global lock across all keys. A wide lock serializes all cache operations and eliminates the concurrency benefits of caching. The whole point of the fix is to serialize per-key, not per-cache.

For a full production-ready implementation (including how the lock dictionary is managed, how the HybridCache migration looks in a real API with Redis as L2, and load test results showing the before and after), the complete source code is at github.com/codingdroplets/dotnet-hybridcache-aspnetcore.

Preventing It From Recurring

Once the immediate fix is in place, the following practices reduce the risk of encountering this again:

Add expiry jitter by default. Make random TTL offset a team convention rather than a case-by-case addition. A helper extension method that wraps IMemoryCache.Set with built-in jitter ensures it is never forgotten.
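
A sketch of such a helper, assuming Microsoft.Extensions.Caching.Memory. SetWithJitter is a hypothetical name, and it is written as a static local function so the example runs as a single file; in a shared codebase it would more naturally be an extension method on IMemoryCache inside a static class.

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions());

// Hypothetical team helper: absolute expiration = base TTL + random offset in [0, maxJitter).
static T SetWithJitter<T>(IMemoryCache memoryCache, object key, T value, TimeSpan baseTtl, TimeSpan maxJitter)
{
    var offset = TimeSpan.FromTicks((long)(Random.Shared.NextDouble() * maxJitter.Ticks));
    var options = new MemoryCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = baseTtl + offset
    };
    return memoryCache.Set(key, value, options);
}

// A 5-minute base TTL with up to 60 seconds of jitter, matching the earlier example.
SetWithJitter(cache, "products:featured", "featured products",
    TimeSpan.FromMinutes(5), TimeSpan.FromSeconds(60));

Console.WriteLine(cache.Get<string>("products:featured"));
```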

Monitor per-key cache miss rate. Most observability platforms (Application Insights, Grafana, Seq) can be configured to alert when a specific cache key experiences a sustained elevated miss rate. A miss rate spike is an early warning of a developing stampede.
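
If your platform collects .NET metrics (for example through an OpenTelemetry exporter), one way to produce a per-key miss signal is a System.Diagnostics.Metrics counter tagged with the cache key and incremented from inside the factory delegate. The meter, instrument, and tag names below are hypothetical, and the MeterListener stands in for a real exporter so the sketch runs on its own. Tag by key only when the set of keys is small and bounded; per-user or per-ID keys create unbounded metric cardinality.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical meter, instrument, and tag names.
using var meter = new Meter("MyApi.Caching");
Counter<long> cacheMisses = meter.CreateCounter<long>(
    "cache.misses", unit: "{miss}", description: "Cache misses, tagged by cache key");

// Call this from inside the factory delegate, next to the diagnostic log line.
void RecordMiss(string cacheKey) =>
    cacheMisses.Add(1, new KeyValuePair<string, object?>("cache.key", cacheKey));

// In-process listener standing in for a real exporter.
var missesByKey = new Dictionary<string, long>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, meterListener) =>
{
    if (instrument.Meter.Name == "MyApi.Caching")
        meterListener.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, measurement, tags, state) =>
{
    foreach (var tag in tags)
    {
        if (tag.Key == "cache.key" && tag.Value is string tagKey)
            missesByKey[tagKey] = missesByKey.GetValueOrDefault(tagKey) + measurement;
    }
});
listener.Start();

RecordMiss("products:featured");
RecordMiss("products:featured");
RecordMiss("categories:all");

foreach (var (key, count) in missesByKey)
    Console.WriteLine($"{key}: {count}");
```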

Use HybridCache for all new high-traffic cache points. Retrofitting is cheap, because the API is similar to IMemoryCache. There is no reason to write new stampede-vulnerable code on .NET 9+.

Review factory delegate latency. The faster the factory, the narrower the stampede window. Queries inside factory delegates should use AsNoTracking(), return only the columns needed, and have appropriate indexes. A 10ms factory has a much smaller blast radius than a 300ms one.

Load test expiry behaviour explicitly. Add a load test scenario that fires a burst of concurrent requests at a key that has just expired. This is the exact condition that triggers a stampede and it is trivially reproducible under controlled conditions. Catch it before production does.

FAQ

Does IMemoryCache.GetOrCreate (non-async) have the same stampede problem? Yes. The synchronous GetOrCreate method has the same race condition: multiple callers can enter the factory delegate concurrently for a missing key. The same SemaphoreSlim pattern applies, using SemaphoreSlim.Wait and Release instead of the async equivalents. For new code, prefer the async path to avoid blocking ThreadPool threads.

Will switching to Redis distributed cache fix the stampede? Not automatically. IDistributedCache has the same absence of per-key serialization as IMemoryCache, so concurrent requests (and multiple instances of your API) can still race to populate the same key in Redis. A fix such as SemaphoreSlim or HybridCache is needed regardless of which backing store you use. HybridCache with Redis as L2 does protect against this within each instance, because stampede coalescing happens at the HybridCache layer; that coalescing is per instance, not cluster-wide.

How many concurrent requests does it take to trigger a stampede? There is no fixed threshold. A stampede can form with as few as two concurrent callers if the factory delegate is slow enough. As a rough rule of thumb, stampedes tend to become operationally significant once the factory takes more than about 50ms and more than about 10 concurrent callers hit the same key at once. High-traffic endpoints with expensive factories are the highest risk.

Can output caching be used instead? Output caching caches the full HTTP response at the middleware layer rather than at the service layer. It has stampede protection built in and is a valid alternative for read-only endpoints that return the same response to all callers. It is not a substitute for IMemoryCache or HybridCache in scenarios where caching is used at the service or repository level, or where the cached value feeds into further processing before the response is formed.
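
For comparison, a minimal output caching setup in an ASP.NET Core minimal API (.NET 7 or later, Web SDK with implicit usings) might look like this. The route and payload are hypothetical. Output caching's resource locking, which is enabled by default, is what coalesces concurrent requests for a response that is not yet cached.

```csharp
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOutputCache();

var app = builder.Build();
app.UseOutputCache();

// Hypothetical read-only endpoint that returns the same response to every caller.
app.MapGet("/products/featured", () => new[] { "featured products" })
   .CacheOutput(policy => policy.Expire(TimeSpan.FromMinutes(5)));

app.Run();
```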

Is HybridCache's coalescing guarantee cluster-wide or per-instance? Per-instance. The stampede protection in HybridCache prevents multiple concurrent requests on the same instance from all executing the factory simultaneously. It does not prevent two different pods in a Kubernetes cluster from both executing the factory at the same time for the same key. For cluster-wide coalescing, a distributed lock (Redis SET NX or a similar primitive) is required. In practice, per-instance protection is sufficient for most workloads: the factory executes once per instance, not once per request.

Should I use AbsoluteExpiration or SlidingExpiration to reduce stampede risk? Absolute expiration with jitter is generally safer. Sliding expiration extends the TTL on every access, which means that for popular keys the entry may never expire cleanly; but when it eventually does expire (because traffic drops overnight, for example), all clients that start hitting it in the morning will find a cold cache simultaneously. Absolute expiration with a small random jitter gives more predictable expiry distribution and makes the stampede window easier to reason about.

Does the ConcurrentDictionary used for the semaphore lock introduce its own thread safety issues? No. ConcurrentDictionary<TKey, SemaphoreSlim> is thread-safe for concurrent reads and writes. The standard pattern โ€” GetOrAdd to retrieve or create the semaphore, WaitAsync to acquire it, Release to free it, and then removal of the entry after the factory completes โ€” is safe under concurrent access. The entry-removal step requires care to avoid a race where a newly added entry is removed before another caller has a chance to acquire it; the typical fix is to check the cache again after acquiring the semaphore before executing the factory.
