The ASP.NET Core API Performance Checklist for .NET Teams (2026)

When an API slows down under load, the investigation typically leads to the same short list of culprits: unnecessary allocations, blocking calls, missing caching, oversized payloads, and uncontrolled concurrency. The problem is rarely a single bottleneck; it is the compounding effect of a dozen small decisions that each seemed harmless at the time.
This checklist brings together the performance controls that enterprise .NET teams most often neglect or defer. Each item is something you can verify, measure, and fix without a full architectural rewrite. If you want to work through these patterns inside a real production API (with all the plumbing already wired together), the full production codebase is on Patreon, where members get annotated source code mapped to the exact patterns covered here.
Understanding individual performance controls is useful. Seeing them work together inside a complete ASP.NET Core API, alongside authentication, error handling, validation, and CI/CD, is what makes each decision click. That full picture is what Chapters 9, 10, and 14 of the Zero to Production course cover, with a running codebase you can open and explore immediately.
Why a Performance Checklist?
Performance problems in ASP.NET Core APIs are not usually caused by the framework. They are caused by how the framework is used. Most teams inherit an API that works fine at low traffic and then discover its limits the hard way when load increases. A checklist gives teams a structured way to audit their API against known patterns before problems appear, not after.
This checklist is organised by category, from the lowest-effort wins to the deeper architectural decisions. Work through it in order: fix the quick items first, then invest in the structural ones.
1. Async All the Way Down
Every I/O-bound operation in your API should use async/await from the controller action down to the data access layer. Synchronous blocking calls (.Result, .Wait(), or GetAwaiter().GetResult() on a task) tie up thread pool threads and cause ThreadPool starvation under load. This is one of the most common and damaging performance issues in production ASP.NET Core APIs.
What to check:
No .Result or .Wait() calls anywhere in the request path
All database calls, HTTP client calls, and file I/O use await
Controllers use async Task<IActionResult> (or async Task<ActionResult<T>>), not synchronous return types
All methods accept and forward a CancellationToken from the controller action to the data layer
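The checks above can be sketched as a single controller action. This is a minimal illustration, not code from the article: IOrderRepository and OrderDto are hypothetical names, and the primary-constructor syntax assumes C# 12 (.NET 8 or later).

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical types, for illustration only.
public record OrderDto(int Id, decimal Total);

public interface IOrderRepository
{
    Task<OrderDto?> GetByIdAsync(int id, CancellationToken ct);
}

[ApiController]
[Route("api/orders")]
public class OrdersController(IOrderRepository orders) : ControllerBase
{
    // The framework binds the CancellationToken parameter to HttpContext.RequestAborted.
    [HttpGet("{id:int}")]
    public async Task<ActionResult<OrderDto>> GetById(int id, CancellationToken ct)
    {
        // Awaited rather than .Result: no thread is blocked while the I/O is pending.
        var order = await orders.GetByIdAsync(id, ct);
        if (order is null)
        {
            return NotFound();
        }
        return Ok(order);
    }
}
```

Forwarding the same token into the repository lets an abandoned request cancel its database work instead of running to completion.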
If you have seen your API slow down under moderate load with no clear CPU or memory pressure, ThreadPool starvation is the likely cause. The dedicated ThreadPool starvation article has the diagnostic steps and the fix.
2. Use Output Caching for Expensive Read Endpoints
Output caching in ASP.NET Core stores the complete serialised HTTP response and replays it for matching requests, skipping everything that runs after the output caching middleware: the controller, validation, and data access. For read-heavy endpoints that return the same data to many callers, this is the highest-leverage performance control available.
What to check:
AddOutputCache() is registered and UseOutputCache() is in the middleware pipeline (before MapControllers())
Named cache policies define expiry, vary-by-query, and vary-by-header rules explicitly, not defaulting to a single global policy
Tag-based cache eviction is wired up so write operations invalidate the relevant cached responses
Output cache is not applied to endpoints that return user-specific data (without correct vary-by-user partitioning)
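A minimal sketch of a named policy with tag-based eviction, assuming a minimal API host with implicit usings. The policy name, tag, routes, and durations are illustrative choices, not values from the article.

```csharp
using Microsoft.AspNetCore.OutputCaching;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOutputCache(options =>
{
    // Named policy: explicit expiry, query-string variation, and an eviction tag.
    options.AddPolicy("ProductList", policy => policy
        .Expire(TimeSpan.FromMinutes(5))
        .SetVaryByQuery("page", "pageSize")
        .Tag("products"));
});

var app = builder.Build();
app.UseOutputCache();

// Read endpoint: served from the cache once a response has been stored.
app.MapGet("/products", () => new[] { "keyboard", "mouse" })
   .CacheOutput("ProductList");

// Write endpoint: evicts every cached response tagged "products".
app.MapPost("/products", async (IOutputCacheStore store, CancellationToken ct) =>
{
    await store.EvictByTagAsync("products", ct);
    return Results.NoContent();
});

app.Run();
```

Tagging at the policy level keeps the eviction call in the write path to a single line, whatever number of query-string variants have been cached.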
The GitHub repo at codingdroplets/dotnet-output-caching-api demonstrates named policies, tag-based eviction, and vary-by strategies in a working API if you want to see the full setup.
3. Choose the Right Caching Layer
Not every cache needs to be a distributed cache. Choosing the wrong caching abstraction adds latency and complexity without benefit.
| Cache type | When to use |
|---|---|
| IMemoryCache | Single-instance APIs, frequently read data, low cardinality |
| IDistributedCache (Redis) | Multi-instance deployments, shared cache state required |
| HybridCache (.NET 9+) | Multi-instance with local L1 fallback + stampede protection |
| Output cache | Full response caching for anonymous or shared read endpoints |
What to check:
Multi-instance deployments are not using IMemoryCache as the primary cache for shared state
HybridCache is considered for workloads that benefit from local L1 + distributed L2 combined (available in .NET 9+)
Cache expiry policies use AbsoluteExpiration for correctness and SlidingExpiration only where stale data is acceptable
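As a rough sketch of the HybridCache option, assuming the Microsoft.Extensions.Caching.Hybrid package and a minimal API host: the route, key format, expiry values, and the LoadCountryAsync helper are hypothetical stand-ins for a real data source.

```csharp
using Microsoft.Extensions.Caching.Hybrid;

var builder = WebApplication.CreateBuilder(args);

// With no IDistributedCache registered this is an in-process (L1) cache only;
// registering one (for example Redis) gives HybridCache its L2 store.
builder.Services.AddHybridCache(options =>
{
    options.DefaultEntryOptions = new HybridCacheEntryOptions
    {
        Expiration = TimeSpan.FromMinutes(5),           // overall entry lifetime
        LocalCacheExpiration = TimeSpan.FromMinutes(1)  // shorter in-process lifetime
    };
});

var app = builder.Build();

app.MapGet("/countries/{code}", async (string code, HybridCache cache, CancellationToken ct) =>
    await cache.GetOrCreateAsync(
        $"country:{code}",
        token => LoadCountryAsync(code, token), // concurrent misses for one key share this call
        cancellationToken: ct));

app.Run();

// Hypothetical slow lookup standing in for a database or HTTP call.
static async ValueTask<string> LoadCountryAsync(string code, CancellationToken ct)
{
    await Task.Delay(50, ct);
    return $"Country {code}";
}
```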
Reviewing the 7 Common ASP.NET Core Caching Mistakes article is a practical companion to this section; it covers the specific patterns that cause cache-related regressions in production.
4. Apply Rate Limiting to All External-Facing Endpoints
Rate limiting does double duty: it protects your API from abuse, and it prevents poorly behaved clients from degrading service quality for all other consumers. In ASP.NET Core 7 and later, the built-in rate limiting middleware (registered with AddRateLimiter and enabled with UseRateLimiter) gives you four algorithm options (Fixed Window, Sliding Window, Token Bucket, and Concurrency) without any third-party dependency.
What to check:
A global rate limiter is applied as a safety net for all traffic
Per-user or per-IP partitioned policies are in place for authenticated endpoints
OnRejected returns a proper 429 Too Many Requests with a Retry-After header and a Problem Details body, not a raw status code
[DisableRateLimiting] is used deliberately on internal health and diagnostic endpoints, not forgotten on production endpoints
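One way these checks could fit together is sketched below, assuming a minimal API host with implicit usings. The limits, routes, and the fallback from user name to remote IP are illustrative assumptions; in a real API, authentication middleware would run before UseRateLimiter() so the user identity is populated.

```csharp
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // Global safety net: one fixed window per user name, falling back to remote IP.
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
    {
        var key = ctx.User.Identity?.Name
                  ?? ctx.Connection.RemoteIpAddress?.ToString()
                  ?? "unknown";
        return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 100,                 // illustrative numbers
            Window = TimeSpan.FromMinutes(1)
        });
    });

    options.OnRejected = async (context, ct) =>
    {
        // Surface Retry-After when the limiter can report it.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        // Writes a 429 with an application/problem+json body.
        await Results.Problem(
                statusCode: StatusCodes.Status429TooManyRequests,
                title: "Too many requests")
            .ExecuteAsync(context.HttpContext);
    };
});

var app = builder.Build();
app.UseRateLimiter();

app.MapGet("/orders", () => "ok");
app.MapGet("/healthz", () => "healthy").DisableRateLimiting(); // deliberate opt-out

app.Run();
```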
The full working implementation of all four rate limiting algorithms with per-user partitioning is available at codingdroplets/dotnet-rate-limiting-api.
5. Reduce Response Payload Size
Smaller responses reach the client faster and consume less bandwidth. Most APIs send more data than necessary because they map domain entities directly to response objects without a dedicated DTO layer.
What to check:
Response DTOs include only the fields the client actually needs, not full entity graphs
JSON serialisation uses System.Text.Json (the default since .NET Core 3.x), not Newtonsoft.Json unless you have a specific requirement
[JsonIgnore] or custom serialisation options suppress null values, internal audit fields, and navigation properties from responses
Response compression (AddResponseCompression()) is enabled for APIs that serve large payloads over HTTP, and is not applied to APIs served over HTTPS where BREACH-type attacks are a concern
Pagination is applied to all list endpoints, with no unbounded result sets
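The DTO, null-suppression, and pagination checks might look like this in a minimal API (ConfigureHttpJsonOptions assumes .NET 7 or later). Product, ProductDto, PagedResult, the in-memory catalogue, and the page-size cap of 100 are all hypothetical choices for the sketch.

```csharp
using System.Text.Json.Serialization;

var builder = WebApplication.CreateBuilder(args);

// Omit null properties from minimal API JSON responses.
builder.Services.ConfigureHttpJsonOptions(o =>
    o.SerializerOptions.DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull);

var app = builder.Build();

// In-memory stand-in for a real data source.
var catalog = Enumerable.Range(1, 250)
    .Select(i => new Product(i, $"Product {i}", null, $"COST-{i}"))
    .ToList();

app.MapGet("/products", (int? page, int? pageSize) =>
{
    // Clamp caller input so no request can ask for an unbounded result set.
    var size = Math.Clamp(pageSize ?? 20, 1, 100);
    var number = Math.Max(page ?? 1, 1);

    var items = catalog
        .Skip((number - 1) * size)
        .Take(size)
        .Select(p => new ProductDto(p.Id, p.Name, p.Description)) // internal field dropped
        .ToList();

    return new PagedResult<ProductDto>(items, number, size, catalog.Count);
});

app.Run();

// Hypothetical entity carrying a field the client should never see.
record Product(int Id, string Name, string? Description, string InternalCostCode);
record ProductDto(int Id, string Name, string? Description);
record PagedResult<T>(IReadOnlyList<T> Items, int Page, int PageSize, int Total);
```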
6. Avoid N+1 and Over-Fetching at the Data Layer
Database access is almost always the dominant source of latency in API workloads. The two most common problems are the N+1 query (executing one query per item in a collection) and over-fetching (loading columns or rows that the response never uses).
What to check:
SQL query logging is enabled in development so N+1 patterns surface before reaching production
AsNoTracking() is applied to all read queries that do not modify the retrieved entities
Select() projections are used to load only the columns the response DTO needs
CountAsync() is executed on the query before Skip()/Take() are applied for paginated endpoints, rather than loading the full dataset to count it
FindAsync() is used for single-entity-by-primary-key lookups, not FirstOrDefaultAsync() with a LINQ predicate, which always queries the database instead of first checking entities the context already tracks
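A sketch of a paginated read that follows these checks, assuming EF Core with a relational provider and C# 12. The entity, context, and DTO names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical model, for illustration only.
public class Order
{
    public int Id { get; set; }
    public string CustomerName { get; set; } = "";
    public DateTime CreatedAt { get; set; }
    public List<OrderLine> Lines { get; set; } = new();
}

public class OrderLine
{
    public int Id { get; set; }
    public int OrderId { get; set; }
}

public class ShopDbContext(DbContextOptions<ShopDbContext> options) : DbContext(options)
{
    public DbSet<Order> Orders => Set<Order>();
}

public record OrderSummary(int Id, string CustomerName, int LineCount);

public static class OrderQueries
{
    public static async Task<(List<OrderSummary> Items, int Total)> GetPageAsync(
        ShopDbContext db, int page, int pageSize, CancellationToken ct)
    {
        var query = db.Orders.AsNoTracking();

        // Count is computed by the database, not by materialising the rows.
        var total = await query.CountAsync(ct);

        // One projected query: only the needed columns plus a line count,
        // so there is no per-order follow-up query (no N+1).
        var items = await query
            .OrderByDescending(o => o.CreatedAt)
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .Select(o => new OrderSummary(o.Id, o.CustomerName, o.Lines.Count))
            .ToListAsync(ct);

        return (items, total);
    }
}
```

Because the query projects to a DTO, no entities are tracked anyway; AsNoTracking() is kept here only to make the read-only intent explicit.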
The EF Core Performance Tuning Checklist for High-Traffic APIs goes deeper on EF Core-specific patterns if your bottleneck is at the ORM layer.
7. Minimise Middleware Pipeline Overhead
Every middleware component in the pipeline runs on every request, including requests that do not need it. Unnecessary middleware adds latency that compounds at scale.
What to check:
Middleware order follows the correct ASP.NET Core pipeline sequence: exception handling first, then HTTPS redirect, static files, routing, authentication, authorisation, rate limiting, then endpoint execution
Middleware that is only relevant for specific routes (e.g., request body logging for audit) is applied with UseWhen() rather than globally
UseDeveloperExceptionPage() and diagnostic middleware are never enabled in production (app.Environment.IsDevelopment() guard is in place)
Unused middleware registrations from earlier development iterations have been removed
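A small sketch of route-scoped middleware with UseWhen() and an environment guard, assuming a minimal API host with implicit usings. The /api/admin prefix, the "Audit" logger category, and the /error route are illustrative assumptions.

```csharp
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Diagnostic page in development only; a plain error endpoint elsewhere.
if (app.Environment.IsDevelopment())
{
    app.UseDeveloperExceptionPage();
}
else
{
    app.UseExceptionHandler("/error");
}

// The audit branch runs only for /api/admin requests; all other requests skip it.
app.UseWhen(
    ctx => ctx.Request.Path.StartsWithSegments("/api/admin"),
    branch => branch.Use(async (ctx, next) =>
    {
        var logger = ctx.RequestServices
            .GetRequiredService<ILoggerFactory>()
            .CreateLogger("Audit");
        logger.LogInformation("Admin request {Method} {Path}",
            ctx.Request.Method, ctx.Request.Path);
        await next(ctx);
    }));

app.MapGet("/api/admin/ping", () => "admin");
app.MapGet("/api/ping", () => "public");
app.Map("/error", () => Results.Problem());

app.Run();
```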
8. Configure HttpClient Correctly for Downstream Calls
HttpClient misuse is a classic source of socket exhaustion and DNS resolution failures in production APIs. The two most common mistakes are instantiating HttpClient directly inside request handlers and using static clients that do not refresh DNS.
What to check:
All HttpClient instances are registered and resolved through IHttpClientFactory, never instantiated with new HttpClient()
Named or typed clients are used when downstream services have different timeout, retry, or authentication requirements
Polly resilience pipelines (AddStandardResilienceHandler()) are applied for retry with exponential backoff, circuit breaker, and timeout, not manual retry loops
HttpClient base addresses and timeouts are configured at registration time, not set per-request
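A typed-client registration along these lines is sketched below, assuming the Microsoft.Extensions.Http.Resilience package and C# 12. InventoryClient, its base address, and the route are hypothetical.

```csharp
var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddHttpClient<InventoryClient>(client =>
    {
        // Configured once at registration, not per request.
        client.BaseAddress = new Uri("https://inventory.example.internal/");
        client.Timeout = TimeSpan.FromSeconds(30);
    })
    // From Microsoft.Extensions.Http.Resilience: adds retry, circuit breaker,
    // and timeout strategies with default settings.
    .AddStandardResilienceHandler();

var app = builder.Build();

app.MapGet("/stock/{sku}", (string sku, InventoryClient inventory, CancellationToken ct) =>
    inventory.GetStockAsync(sku, ct));

app.Run();

// Hypothetical typed client; the factory supplies and recycles the underlying handler.
public sealed class InventoryClient(HttpClient http)
{
    public Task<string> GetStockAsync(string sku, CancellationToken ct) =>
        http.GetStringAsync($"stock/{Uri.EscapeDataString(sku)}", ct);
}
```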
9. Enable HTTP/2 and Response Compression at the Transport Layer
HTTP/2 reduces the overhead of multiple concurrent requests to the same API through header compression and multiplexing. ASP.NET Core supports HTTP/2 on Kestrel out of the box; when it is not active, the usual causes are an endpoint without TLS or a Protocols setting that restricts the endpoint to HTTP/1.1.
What to check:
Kestrel is configured to allow HTTP/2 (the default in .NET 6+ when using HTTPS, but worth verifying for your hosting setup)
UseHttpsRedirection() is in the pipeline, because HTTP/2 only works over TLS in browsers and most clients
Response compression is configured for text-based content types (text/html, application/json, text/plain) with Brotli preferred and Gzip as fallback
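The transport settings above could be made explicit as follows. This is a sketch for a Kestrel-hosted minimal API; the compression level is an illustrative choice.

```csharp
using System.IO.Compression;
using Microsoft.AspNetCore.ResponseCompression;
using Microsoft.AspNetCore.Server.Kestrel.Core;

var builder = WebApplication.CreateBuilder(args);

// State the protocol set explicitly instead of relying on defaults.
builder.WebHost.ConfigureKestrel(kestrel =>
    kestrel.ConfigureEndpointDefaults(listen =>
        listen.Protocols = HttpProtocols.Http1AndHttp2));

builder.Services.AddResponseCompression(options =>
{
    // Brotli is registered first so it is preferred; Gzip is the fallback.
    options.Providers.Add<BrotliCompressionProvider>();
    options.Providers.Add<GzipCompressionProvider>();
    // EnableForHttps stays at its default (false); review BREACH exposure
    // before turning compression on for HTTPS responses.
});

builder.Services.Configure<BrotliCompressionProviderOptions>(o =>
    o.Level = CompressionLevel.Fastest);

var app = builder.Build();

app.UseHttpsRedirection();
app.UseResponseCompression();

app.MapGet("/", () => "hello");

app.Run();
```

The default MIME type list already covers text/html, application/json, and text/plain, so none are added here.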
10. Profile Allocations, Not Just Latency
High allocation rates cause frequent garbage collection pauses, which are invisible to latency-only profiling until they become severe. In high-throughput APIs, allocation pressure from string formatting, LINQ in hot paths, and large DTO graphs can cause GC pauses that spike p99 latency without raising average response time significantly.
What to check:
OpenTelemetry metrics are capturing GC pause time and heap allocation rate alongside HTTP request latency, not just request count and average duration
string.Format() in high-frequency code paths has been replaced with string.Concat(), interpolated strings (which avoid boxing and intermediate arrays since C# 10 on .NET 6+), or span-based APIs such as string.Create() where allocation matters
LINQ queries in request-path code that materialise large intermediate collections have been reviewed for allocation cost
Memory usage under sustained load has been observed, not just under burst traffic, to catch slow allocation growth patterns
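A minimal metrics registration that captures runtime signals (GC and allocation counters) next to request latency might look like this. It assumes the OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.AspNetCore, OpenTelemetry.Instrumentation.Runtime, and OpenTelemetry.Exporter.OpenTelemetryProtocol packages, and an OTLP collector reachable at the exporter's default endpoint.

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()   // HTTP request duration and counts
        .AddRuntimeInstrumentation()      // GC, heap, and thread pool metrics
        .AddOtlpExporter());              // ships metrics to an OTLP collector

var app = builder.Build();

app.MapGet("/", () => "ok");

app.Run();
```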
For a complete observability setup that captures these signals, the OpenTelemetry in ASP.NET Core guide covers traces, metrics, and logs with the full configuration wired together.
11. Set Explicit Timeouts at Every Boundary
Unbounded operations are one of the most common causes of resource exhaustion in production APIs. A slow downstream call that has no timeout does not fail cleanly: it holds an active HTTP connection, potentially a database connection, and (if anything in the call path blocks) a thread pool thread for the duration of the hang.
What to check:
Every downstream HTTP call has an explicit timeout via HttpClient configuration or a Polly timeout policy
Database command timeouts are set on DbContext, not left to provider defaults
Long-running background operations use a CancellationToken chained to a CancellationTokenSource with a timeout
The ASP.NET Core RequestTimeouts middleware (available in .NET 8+) is configured for endpoints where unbounded request processing is a risk
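A sketch of the request-timeout middleware combined with a linked token source, assuming .NET 8 or later. The policy name, route, and durations are illustrative, and Task.Delay stands in for a slow downstream call.

```csharp
using Microsoft.AspNetCore.Http.Timeouts;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRequestTimeouts(options =>
{
    // Default for every endpoint, plus a longer named policy for slow reports.
    options.DefaultPolicy = new RequestTimeoutPolicy { Timeout = TimeSpan.FromSeconds(10) };
    options.AddPolicy("Reports", TimeSpan.FromSeconds(30));
});

var app = builder.Build();
app.UseRequestTimeouts();

app.MapGet("/reports/{id:int}", async (int id, CancellationToken ct) =>
{
    // ct is the request-aborted token, which the middleware cancels on timeout.
    // The linked source adds a tighter bound for this one inner operation.
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
    cts.CancelAfter(TimeSpan.FromSeconds(5));

    await Task.Delay(TimeSpan.FromSeconds(1), cts.Token);
    return Results.Ok(new { id });
}).WithRequestTimeout("Reports");

app.Run();
```

The timeout only takes effect if the work observes the token, which is why every awaited call in the handler receives it.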
12. Measure First, Optimise Second
No performance checklist replaces measurement. The items above cover the patterns that cause problems in most ASP.NET Core APIs, but your specific bottleneck may be elsewhere. Apply the checklist, then verify with data.
What to check:
Benchmarks exist for the API's critical paths using realistic data volumes, not test fixtures
Load testing is run against a staging environment that mirrors production resource constraints, not localhost
A performance baseline exists so regressions can be detected before they reach production
The OpenTelemetry integration emits custom metrics for domain-specific operations (queue depth, cache hit rate, downstream dependency latency), not only framework-generated signals
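Custom domain metrics can be emitted through System.Diagnostics.Metrics, which OpenTelemetry can then collect. The sketch below assumes .NET 7 or later (for Stopwatch.GetElapsedTime); the meter name, instrument names, and tag keys are hypothetical.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical meter and instrument names, for illustration only.
public static class ApiMetrics
{
    public const string MeterName = "MyCompany.MyApi";

    private static readonly Meter AppMeter = new(MeterName);

    private static readonly Counter<long> CacheLookups =
        AppMeter.CreateCounter<long>("myapi.cache.lookups");

    private static readonly Histogram<double> DownstreamDuration =
        AppMeter.CreateHistogram<double>("myapi.downstream.duration", unit: "ms");

    // Hit rate is derived in the metrics backend from the "result" tag.
    public static void RecordCacheLookup(bool hit) =>
        CacheLookups.Add(1, new KeyValuePair<string, object?>("result", hit ? "hit" : "miss"));

    // startTimestamp comes from Stopwatch.GetTimestamp() taken before the call.
    public static void RecordDownstream(string dependency, long startTimestamp) =>
        DownstreamDuration.Record(
            Stopwatch.GetElapsedTime(startTimestamp).TotalMilliseconds,
            new KeyValuePair<string, object?>("dependency", dependency));
}
```

For these instruments to be exported, the meter name has to be registered with the OpenTelemetry metrics builder, for example via AddMeter(ApiMetrics.MeterName).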
FAQ
What is the most common ASP.NET Core API performance bottleneck? Synchronous blocking calls (.Result, .Wait()) in the request path are the single most common cause of performance degradation in production ASP.NET Core APIs. They exhaust thread pool threads and cause latency spikes that worsen non-linearly as traffic increases. Audit the full request path for blocking calls before investigating anything else.
Should I use IDistributedCache or HybridCache in .NET 10? For single-instance deployments, IMemoryCache remains the simplest and fastest option. For multi-instance APIs that need shared cache state, use IDistributedCache with a Redis backing store. HybridCache (available in .NET 9+) is the preferred choice when you want both a local in-process L1 cache and a distributed L2 cache with stampede protection; it replaces the manual layering of IMemoryCache and IDistributedCache that was common before .NET 9.
Does output caching work with authenticated endpoints? Yes, but it requires explicit vary-by-user partitioning. The default output caching policy does not cache responses to authenticated requests, so caching them means opting in with a custom policy. If that policy does not partition the cache by user identity, responses from one user's request will be served to another. Use VaryByValue, VaryByHeader, or a custom IOutputCachePolicy to partition by authenticated user when caching user-specific responses. For purely anonymous shared data (product catalogues, reference data, lookup tables), output caching works directly without partitioning.
How should I configure rate limiting for a multi-tenant ASP.NET Core API? Use partitioned rate limiting keyed on the authenticated user or tenant identifier, not just the client IP address. IP-based partitioning is unreliable for APIs behind proxies or shared egress IPs. The rate limiter's partitioned policies let you derive the partition key from HttpContext, for example from a user or tenant claim, which makes per-tenant limits straightforward to implement.
What is the right way to use HttpClient in ASP.NET Core? Always use IHttpClientFactory, never new HttpClient(). Register named or typed clients in Program.cs with builder.Services.AddHttpClient(). Apply Polly resilience pipelines via AddStandardResilienceHandler() for automatic retry, circuit breaker, and timeout. The factory manages handler lifetimes internally, which prevents both the socket exhaustion of short-lived clients and the DNS staleness of long-lived static clients.
How do I detect N+1 queries in an ASP.NET Core API? Enable SQL query logging in development by setting the Microsoft.EntityFrameworkCore.Database.Command log category to Information in appsettings.Development.json. Each database query will be logged with its SQL text. Load a representative list endpoint and count the queries: one query per item in the result set is the N+1 pattern. Fix it by adding .Include() for navigation properties or by rewriting the query as a single projected Select() with a join.
What metrics should I monitor for ASP.NET Core API performance? At minimum: HTTP request duration (p50, p95, p99), request rate, error rate (4xx and 5xx separately), GC pause time, heap allocation rate, active thread pool threads, and downstream dependency latency. OpenTelemetry with the OpenTelemetry.Instrumentation.AspNetCore and OpenTelemetry.Instrumentation.Runtime packages provides most of these automatically with no custom instrumentation required.
Coding Droplets publishes decision-oriented content for .NET engineers building production systems. Follow on YouTube for video walkthroughs, or visit codingdroplets.com for the full article archive.





