In the old WCF days, you had control over service concurrency via the MaxConcurrentCalls setting. It defaulted to 16 concurrent calls, but you could raise or lower that value based on your needs.
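For reference, that WCF knob lives in the serviceThrottling service behavior; an illustrative snippet (values here are just examples, not defaults):

<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior>
        <!-- maxConcurrentCalls is the setting referred to above -->
        <serviceThrottling maxConcurrentCalls="16"
                           maxConcurrentSessions="100"
                           maxConcurrentInstances="116" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>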
How do you control server side concurrency in .NET Core Web API? We probably need to limit it in our case as too many concurrent requests can impede overall server performance.
ASP.NET Core application concurrency is handled by its web server. For example:
Kestrel
var host = new WebHostBuilder()
    .UseKestrel(options => options.ThreadCount = 8)
It is not recommended to set the Kestrel thread count to a large value (like 1K) because of Kestrel's async-based implementation.
More info: Is Kestrel using a single thread for processing requests like Node.js?
A new Limits property was introduced in ASP.NET Core 2.0 Preview 2.
You can now add limits for the following:
Maximum Client Connections
Maximum Request Body Size
Maximum Request Body Data Rate
For example:
.UseKestrel(options =>
{
    options.Limits.MaxConcurrentConnections = 100;
})
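A sketch covering the other limits from the list as well (property names are taken from the released 2.x API, e.g. MaxRequestBodySize and MinRequestBodyDataRate, so they may differ slightly in the preview builds):

.UseKestrel(options =>
{
    options.Limits.MaxConcurrentConnections = 100;
    options.Limits.MaxRequestBodySize = 10 * 1024 * 1024; // bytes
    options.Limits.MinRequestBodyDataRate =
        new MinDataRate(bytesPerSecond: 100, gracePeriod: TimeSpan.FromSeconds(10));
})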
IIS
When Kestrel runs behind a reverse proxy, you could tune the proxy itself. For example, you could configure the IIS application pool in web.config or in aspnet.config:
<configuration>
  <system.web>
    <applicationPool
        maxConcurrentRequestsPerCPU="5000"
        maxConcurrentThreadsPerCPU="0"
        requestQueueLimit="5000" />
  </system.web>
</configuration>
Of course Nginx and Apache have their own concurrency settings.
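For instance, a generic Nginx sketch (not specific to proxying Kestrel): the effective cap is roughly worker_processes * worker_connections:

worker_processes auto;
events {
    # maximum simultaneous connections per worker process
    worker_connections 1024;
}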
Related
Let's say I have a web service (WCF and ASMX, .NET Framework 4.8) hosted on IIS 10. The web service has a method with this content:
public CustomerListResponse Get(CustomerListRequest request)
{
    // sleep for 1 hour
    System.Threading.Thread.Sleep(TimeSpan.FromHours(1));
    return new CustomerListResponse();
}
The line performing the sleep is just to show that there is code that can, in some cases, take a long time.
What I'm looking for is a setting or a way to limit the allowed processing time, for example to one minute, with an error returned to the client. I want the processing to be killed by IIS/WCF/ASMX if the execution time exceeds one minute.
Unfortunately, I didn't find a way to do that in IIS. I also don't have access to the client code to set this limit on the other side - a change is only possible on the server side.
What I tried:
on the WCF binding there are a couple of properties: openTimeout="00:01:00" closeTimeout="00:01:00" sendTimeout="00:01:00" receiveTimeout="00:01:00" - I set them all, but it didn't work; the code can still run for a long time
<httpRuntime targetFramework="4.8" executionTimeout="60" /> - also didn't work
I don't have any other ideas on how to achieve this, but I believe there should be some way to control how long we want to spend on processing.
You need to set the timeout on both client-side and server-side.
Client-side:
SendTimeout is used to initialize OperationTimeout, which manages the entire interaction of sending a message (including receiving a reply message in a request-reply case). This timeout also applies when a reply message is sent from the CallbackContract method.
OpenTimeout and CloseTimeout are used to open and close channels (when no explicit timeout value is passed).
ReceiveTimeout is not used.
Server-side:
Send, open, and close timeouts are the same as on the client side (for callbacks).
ReceiveTimeout is used by the ServiceFramework layer to initialize idle session timeouts.
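For illustration, a binding configuration along these lines (the binding name and values are just examples); sendTimeout is what actually bounds the request/reply exchange on the client, while receiveTimeout governs idle sessions on the server:

<bindings>
  <basicHttpBinding>
    <!-- illustrative binding; sendTimeout bounds the whole send/receive exchange
         on the client, receiveTimeout initializes the idle session timeout on the server -->
    <binding name="oneMinuteBinding"
             openTimeout="00:01:00"
             closeTimeout="00:01:00"
             sendTimeout="00:01:00"
             receiveTimeout="00:10:00" />
  </basicHttpBinding>
</bindings>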
I'm attempting to implement distributed tracing via OpenCensus across different services, where each service uses Flask for serving and makes downstream requests using... er... requests to build its reply. The tracing platform is GCP Cloud Trace.
I'm using FlaskMiddleware, which traces the initial call correctly, but there's no propagation of span info between the source and target service, even when the middleware has a propagator defined (I've tried a few):
#middleware = FlaskMiddleware(app, exporter=exporter, sampler=sampler, excludelist_paths=['healthz', 'metrics'], propagator=google_cloud_format.GoogleCloudFormatPropagator())
middleware = FlaskMiddleware(app, exporter=exporter, sampler=sampler, excludelist_paths=['healthz', 'metrics'], propagator=b3_format.B3FormatPropagator())
#middleware = FlaskMiddleware(app, exporter=exporter, sampler=sampler, excludelist_paths=['healthz', 'metrics'], propagator=trace_context_http_header_format.TraceContextPropagator())
I guess the question is: does anyone have an example of setting up tracing that traverses several downstream calls when each service serves via Flask and makes downstream requests via requests?
Currently I end up with each service having its own trace with a single span.
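For what it's worth, the usual way to get the outgoing requests calls to carry the trace context is the opencensus-ext-requests integration; a minimal sketch, assuming that package is installed (as far as I know it also injects the propagation headers into outgoing calls, but verify against your opencensus version):

# Sketch: requires the opencensus-ext-requests package to be installed.
from opencensus.trace import config_integration

# Patch the `requests` library so outgoing HTTP calls create child spans
# and (per my understanding) carry the trace context headers downstream,
# letting the next service's FlaskMiddleware continue the same trace.
config_integration.trace_integrations(['requests'])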
I have deployed an application (frontend and backend) in App Engine. First of all, I am using the free tier, and I chose the default F1 for the frontend and B2 for the backend. I don't exactly understand the difference between B and F instances, but based on their names I chose them for the backend and frontend respectively.
My backend is a Flask application that reads some data from Firestore on @app.before_first_request and "pre-caches" it for all future requests. This takes about 20-30 seconds before the first request is served, so I really don't want the backend instance to become undeployed all the time.
Right now, my backend successfully serves one request (that I am making from the browser) and then immediately gets undeployed (basically I see no active instances in the App Engine dashboard after the request is served). This means that every request once again hits the same long delay on server start, which I don't want. I am not sure why this is happening, because I've set the idle timeout to 5 minutes. I know it is not a problem with my Flask application, because it does not crash after a request on a local machine, and I've profiled its memory usage, which is within the B2 limits. This is my app.yaml for the backend:
runtime: python38
service: api
env_variables:
  PORT: 8080
instance_class: B2
basic_scaling:
  max_instances: 1
  idle_timeout: 5m
Any insight would be appreciated!
Based on the information and behavior you describe, both scaling models are behaving as they are designed to.
“Automatic Scaling: It creates instances based on request rate, response latencies, and other application metrics. You can specify thresholds for each of these metrics, and a minimum number of instances to keep running at all times.
Basic Scaling: Basic scaling creates instances only when your application receives requests. Each instance will be shut down when the application becomes idle. Basic scaling is ideal for work that is intermittent or driven by user activity.”
Use the following documentation as a reference for these models and the others: How Instances are Managed.
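For example, a hypothetical app.yaml that switches the backend to automatic scaling and keeps a minimum number of instances warm (min_instances and the F instance class are assumptions here; check the scaling reference for your runtime):

runtime: python38
service: api
instance_class: F2   # automatic scaling uses F-class instances
automatic_scaling:
  min_instances: 1   # keep one instance running to avoid the cold start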
Information added on 10/12/2021:
Hi,
I think the correct term is “shut down” rather than “undeployed” (see Disabling your application). Instance States says: “an instance of a manual or basic scaled service can be either running or stopped. All instances of the same service and version share the same state.” Scaling types says: “Basic scaling creates instances when your application receives requests. Each instance will be shut down when the application becomes idle. Basic scaling is ideal for work that is intermittent or driven by user activity.” The table's Startup and shutdown row for basic scaling says: “Instances are created on demand to handle requests and automatically shut down when idle, based on the idle_timeout configuration parameter. An instance that is manually stopped has 30 seconds to finish handling requests before it is forcibly terminated.” And Scaling down says: “You can specify a minimum number of idle instances. Setting an appropriate number of idle instances for your application based on request volume allows your application to serve every request with little latency.”
Could you please verify:
that the instance was not manually halted?
that the instance is becoming idle?
that there are no background threads?
whether the behavior is the same when setting max_instances to 2?
that there are no logs showing an instance shutdown?
that requests are reaching the version with the updated idle_timeout set?
I am trying to implement an HTTP client in my Akka app in order to consume a 3rd party API.
What I am trying to configure are the timeout and the number of retries in case of failure.
Is the below code the right approach to do it?
val timeoutSettings =
  ConnectionPoolSettings(config)
    .withIdleTimeout(10.minutes)
    .withMaxConnections(3)

val responseFuture: Future[HttpResponse] =
  Http().singleRequest(
    HttpRequest(uri = "https://api.com"),
    settings = timeoutSettings
  )
That is not the right approach. (Below I refer to the settings via the .conf file rather than the programmatic approach, but they correspond easily.)
idle-timeout corresponds, on the pool level, to the "time after which an idle connection pool (without pending requests) will automatically terminate itself", and on the akka.http.client level to "the time after which an idle connection will be automatically closed".
So you'd rather want the connection-timeout setting.
And for the retries it's the max-retries setting.
The max-connections setting only controls "the maximum number of parallel connections that a connection pool to a single host endpoint is allowed to establish".
See the official documentation.
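A minimal application.conf sketch along those lines (key names are taken from the akka-http reference configuration, so double-check them against your version):

akka.http {
  client {
    # time allowed for establishing the TCP connection
    connecting-timeout = 10s
  }
  host-connection-pool {
    # retries for idempotent requests that failed
    max-retries = 3
    # parallel connections to a single host endpoint
    max-connections = 4
  }
}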
I have a Compute Engine server in the us-east1-b zone: n1-highmem-4 (4 vCPUs, 26 GB memory) with a 50 GB SSD, and everything looks normal in the monitoring graphs.
We are using this server for a Rails-based RESTful API.
The problem is that when we send a request to the server, it takes a very long time to receive the response.
Here is our server log:
as you can see, it took 00:01 seconds to respond to the request
and here is the response received by Postman:
as you can see, X-Runtime is 0.036319 as expected, but we received the response in 50374 ms, which is almost 1 minute after the server responded!
I hope this answer can help people with the same problem.
Passenger's highly optimized load balancer assumes that Ruby apps can handle 1 (or a thread-limited number of) concurrent connection(s). This is usually the case and results in optimal load balancing. But endpoints that deal with SSE/WebSockets can handle many more concurrent connections, so the assumption leads to degraded performance.
You can use the force max concurrent requests per process configuration option to override this. The example below shows how to set the concurrency to unlimited for /special_websocket_endpoint:
server {
    listen 80;
    server_name www.example.com;
    root /webapps/my_app/public;
    passenger_enabled on;

    # Use default concurrency for the app. But for the endpoint
    # /special_websocket_endpoint, force a different concurrency.
    location /special_websocket_endpoint {
        passenger_app_group_name foo_websocket;
        passenger_force_max_concurrent_requests_per_process 0;
    }
}
In Passenger 5.0.21 and below, the option above was not yet available. In those versions there is a workaround for Ruby apps: enter the code below into config.ru to set the concurrency (for the entire app).
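If I remember the Passenger docs correctly, that config.ru workaround looks roughly like this (verify against the documentation for your Passenger version):

# config.ru
if defined?(PhusionPassenger)
  # 0 tells Passenger the app can handle an unlimited number of concurrent requests
  PhusionPassenger.advertised_concurrency = 0
end

# ... the rest of your existing config.ru stays unchanged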