Question:
How do you configure ring-jetty-adapter to limit the number of concurrent worker threads? I'm using embedded jetty here, not creating a WAR file or anything like that.
Context
I only have 20 connections in my database connection pool, and all requests need to do database queries. Currently, when the server load gets too high, say a sustained 40 concurrent requests, 20 of them are blocked waiting for the DB. The queue then keeps building up, and the wait spikes out of control (thread starvation).
The :max-threads parameter does not do what I want, as it only limits the size of Jetty's internal thread pool, which is used for accept and selector threads as well, not just worker threads.
After some research I think what I need is the Jetty QoS filter, but I can't figure out how to translate a web.xml configuration to my Clojure Ring app.
First, you cannot control or limit this behavior by adjusting the threading configuration.
The old school model of 1 thread per request is not valid on modern containers, especially so on Jetty, which is 100% async internally.
A single request can use 1...n threads through that request's lifetime. The threading behavior in Jetty is influenced by your technology choices (e.g. OS, JVM, network protocols), your API choices, and even how stressed your server is.
With that out of the way, your solution should instead focus on limiting the number of requests that can concurrently use a specific server resource endpoint.
This is accomplished by tracking the number of active requests to that endpoint, suspending requests that exceed a configured maximum, resuming suspended requests when the active count falls back below the maximum, and timing out requests that sit suspended for too long.
This feature set is provided for you in the Jetty QoSFilter.
You can use the Jetty QoSFilter for anything in Jetty that is based on the Jetty ServletContextHandler (which includes the WebAppContext).
See: https://www.eclipse.org/jetty/documentation/jetty-9/index.html#qos-filter
// Apply the QoSFilter to /api/backend/* on an existing ServletContextHandler.
FilterHolder qosConfig = servletContextHandler.addFilter(QoSFilter.class,
        "/api/backend/*",
        EnumSet.of(DispatcherType.REQUEST));
qosConfig.setInitParameter("maxRequests", "10"); // max requests serviced concurrently
qosConfig.setInitParameter("waitMs", "50");      // how long to wait for a slot before suspending
qosConfig.setInitParameter("suspendMs", "-1");   // -1 = use the container's default suspend timeout
Related
I deployed a .NET Core MVC app on AWS Windows Server 2019 (32 GB RAM and 8 cores). Because it is an online exam application, 100k concurrent requests need to be handled. Which server should I use?
The concurrent connection behavior depends on the maximum concurrent connections in the site's advanced settings, the queue length and maximum worker processes in the application pool's advanced settings, and the maximum threads in the thread pool. Besides, I notice that serverRuntime/httpRuntime has an appConcurrentRequestLimit of 5000 by default. So if you need to achieve a higher number of concurrent requests, you can go to
IIS Manager -> site -> Configuration Editor -> system.webServer/serverRuntime -> appConcurrentRequestLimit.
We have a Play Framework application and we are running into production issues. I am curious what settings people have used in their production.conf to ensure that the web server does not slow down. We have a basic web application that communicates with MySQL, sends emails and text messages, and has a login portal.
If you're using an AWS EC2 instance (say a t2.large), how many threads do you set on your thread pool, how many connections on your database connection pool, and what are your Xms and Xmx memory settings?
Thank you.
If you're using an AWS EC2 instance (say a t2.large), how many threads do you set on your thread pool?
I think this might be one of the possible root causes of the issues you are facing. Having a single thread pool, or more precisely a single ExecutionContext wrapping some ExecutorService (e.g. a fixed-size thread pool), is considered bad practice, because that same ExecutionContext instance inside the application also gets used for blocking operations, like database access (MySQL in your case), which eventually blocks those threads entirely and causes performance issues in other parts of the app, like new connection handling or response rendering.
What you will need to do is use different ExecutionContext instances for blocking and non-blocking operations. Let's call them the frontend EC (for non-blocking work) and the backend EC (for blocking work):
use the frontend EC at the view and service levels (API controllers, business logic, etc.);
use the backend EC at the DAO level or in any other potentially blocking parts.
How resources should be distributed between those two ECs really depends on the context. You can start with, for instance, 80% of the threads for the frontend EC and the other 20% for the backend EC, and then keep tuning until you reach the desired performance.
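This is not Play's actual API (there you would typically configure a custom dispatcher and wrap it in an ExecutionContext); the following is just a plain-Java sketch of the same split, with assumed pool sizes, to make the shape of the idea concrete:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorSplit {
    // "Frontend" pool: non-blocking request handling and rendering.
    static final ExecutorService frontendPool = Executors.newFixedThreadPool(16);
    // "Backend" pool: dedicated to blocking JDBC calls; sized to roughly match
    // the DB connection pool so threads never pile up inside the driver.
    static final ExecutorService backendPool = Executors.newFixedThreadPool(4);

    static CompletableFuture<String> handleRequest(int userId) {
        return CompletableFuture
                .supplyAsync(() -> loadUserFromDb(userId), backendPool)           // blocking I/O stays here
                .thenApplyAsync(row -> "<html>" + row + "</html>", frontendPool); // rendering stays responsive
    }

    static String loadUserFromDb(int userId) {
        // Stand-in for a real JDBC query.
        try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "user-" + userId;
    }

    public static void main(String[] args) {
        System.out.println(handleRequest(42).join());
        frontendPool.shutdown();
        backendPool.shutdown();
    }
}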
But in general, the question you ask depends on a lot of context; as usual in engineering, there is no silver bullet.
Hope this helps!
I'm trying to implement some throttling on our REST API. A typical approach, once a certain threshold is exceeded, is to reject the request (with a 403 or 429 response). However, I've seen one API that adds a delay to the response instead.
As you make calls to the API, we will be looking at your average calls per second (c/s) over the previous five-minute period. Here's what will happen:
over 3c/s and we add a 2 second delay
over 5c/s and we add a 4 second delay
over 7c/s and we add a 5 second delay
From the client's perspective, I see this being better than getting back an error. The worst that can happen is that you'll slow down.
I am wondering how this can be achieved without negatively impacting the app server. i.e. To add those delays, the server needs to keep the request open, causing it to keep more and more request processors busy, meaning it has less capacity for new requests coming in.
What's the best way to accomplish this? (i.e. is this something that can be done on the web server / load balancer so that the application server is not negatively affected? Is there some kind of a throttling layer that can be added for this purpose?)
We're using Django/Tastypie, but the question is more on the architecture/conceptual level.
If you are using a synchronous application server, which is the most common setup for Django applications (for example gunicorn with the default --worker-class sync), then adding such a delay in the application would indeed have a very bad impact on performance: a worker handling a delayed request is blocked for the whole delay period.
But you can use an asynchronous application server (for example gunicorn with --worker-class gevent), and then the overhead should be negligible: a worker that handles a delayed request is able to handle other requests while the delay is in progress.
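The question is about Django/Tastypie, but since the point is conceptual, here is what the non-blocking version of the delay looks like in servlet/Jetty terms (a sketch only; the rate-tracking call and URL pattern are placeholders). startAsync() releases the container thread immediately, and the response is completed later from a scheduler thread, so delayed requests do not occupy workers:

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/api/*", asyncSupported = true)
public class DelayingServlet extends HttpServlet {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        long delayMs = delayFor(currentCallsPerSecond(req)); // placeholder rate lookup
        AsyncContext async = req.startAsync();               // frees the container thread immediately
        scheduler.schedule(() -> {
            try {
                async.getResponse().getWriter().println("{\"ok\": true}");
            } catch (IOException ignored) {
                // client went away; nothing to do
            } finally {
                async.complete();                            // response is sent after the penalty delay
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    // Penalty schedule from the question: over 3/5/7 calls per second -> 2/4/5 second delays.
    private long delayFor(double callsPerSecond) {
        if (callsPerSecond > 7) return 5000;
        if (callsPerSecond > 5) return 4000;
        if (callsPerSecond > 3) return 2000;
        return 0;
    }

    private double currentCallsPerSecond(HttpServletRequest req) {
        return 0.0; // placeholder; a real implementation would track a rolling five-minute average per client
    }
}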
Doing this in the reverse proxy server may be a better option, because it makes it easy to adjust the policy flexibly. There is an external nginx module for exactly this kind of thing.
I am debugging an ASMX web service that receives "bursts" of requests, i.e. it is likely that the web service will receive 100 asynchronous requests within about 1 or 2 seconds. Each request seems to take about a second to process (this is expected and I'm OK with this performance). What is important, however, is that each request is dealt with sequentially and no parallel processing takes place. I do not want any concurrent request processing due to the external components called by the web service. Is there any way I can force the web service to handle each request sequentially?
I have seen the maxconnection attribute in the machine.config, but this seems to only apply to outbound connections, whereas I wish to throttle the incoming connections.
Please note that refactoring into WCF is not an option at this point in time.
We are using IIS 6 on Windows Server 2003.
What I've done in the past is to simply put a lock statement around any access to the external resource I was using. In my case, it was a piece of unmanaged code that claimed to be thread-safe, but which in fact would trash the C runtime library heap if accessed from more than one thread at a time.
Perhaps you should be queuing the requests up internally and processing them one by one?
It may cause the clients to poll for results (if they even need them), but you'd get the sequential pipeline you wanted...
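The question is about ASMX/C#, but the queue-and-drain pattern itself is language-agnostic. As a rough sketch (written in Java here simply to match the other examples on this page; the C# analogue would be a lock or a single-consumer queue), a single-threaded executor gives you exactly that sequential pipeline:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SequentialGateway {
    // A single-threaded executor acts as the internal queue: work is accepted
    // concurrently but executed strictly one task at a time, in submission order.
    private static final ExecutorService queue = Executors.newSingleThreadExecutor();

    public static Future<String> submit(String requestPayload) {
        return queue.submit(() -> callExternalComponent(requestPayload));
    }

    private static String callExternalComponent(String payload) {
        // Stand-in for the non-thread-safe external component; only one call runs at a time.
        return "processed:" + payload;
    }

    public static void main(String[] args) throws Exception {
        Future<String> a = submit("req-1");
        Future<String> b = submit("req-2");
        System.out.println(a.get());
        System.out.println(b.get()); // callers poll or block on the Future if they need the result
        queue.shutdown();
    }
}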
In IIS 7 you can set a limit on the number of connections allowed to a web site. Can you use IIS 7?
I want to know the technical reasons why the Lift web framework has high performance and scalability. I know it uses Scala, which has an actor library, but according to the install instructions its default configuration is with Jetty. Does it use the actor library to scale?
Is the scalability built in right out of the box? Just add additional servers and nodes and it will automatically scale; is that how it works? Can it handle 500,000+ concurrent connections with supporting servers?
I am trying to create an enterprise-level web services framework that can beat what is out there and is easy to scale, configurable, and maintainable. My definition of scaling is that just adding more servers should let you accommodate the extra load.
Thanks
Lift's approach to scalability is within a single machine. Scaling across machines is a larger, tougher topic. The short answer there is: Scala and Lift don't do anything to either help or hinder horizontal scaling.
As far as actors within a single machine, Lift achieves better scalability because a single instance can handle more concurrent requests than most other servers. To explain, I first have to point out the flaws in the classic thread-per-request handling model. Bear with me, this is going to require some explanation.
A typical framework uses a thread to service a page request. When the client connects, the framework assigns a thread out of a pool. That thread then does three things: it reads the request from a socket; it does some computation (potentially involving I/O to the database); and it sends a response out on the socket. At pretty much every step, the thread will end up blocking for some time. When reading the request, it can block while waiting for the network. When doing the computation, it can block on disk or network I/O. It can also block while waiting for the database. Finally, while sending the response, it can block if the client receives data slowly and TCP windows get filled up. Overall, the thread might spend 30 - 90% of its time blocked. It spends 100% of its time, however, on that one request.
A JVM can only support so many threads before it really slows down. Thread scheduling, contention for shared-memory entities (like connection pools and monitors), and native OS limits all impose restrictions on how many threads a JVM can create.
Well, if the JVM is limited in its maximum number of threads, and the number of threads determines how many concurrent requests a server can handle, then the number of concurrent requests will be capped by the maximum number of threads.
(There are other issues that can impose lower limits, GC thrashing for example. Threads are a fundamental limiting factor, but not the only one!)
Lift decouples threads from requests. In Lift, a request does not tie up a thread. Rather, a thread does an action (like reading the request), then sends a message to an actor. Actors are an important part of the story, because they are scheduled via "lightweight" threads. A pool of threads gets used to process messages within actors. It's important to avoid blocking operations inside of actors, so these threads get returned to the pool rapidly. (Note that this pool isn't visible to the application; it's part of Scala's support for actors.) A request that's currently blocked on database or disk I/O, for example, doesn't keep a request-handling thread occupied. The request-handling thread is available, almost immediately, to receive more connections.
This method for decoupling requests from threads allows a Lift server to have many more concurrent requests than a thread-per-request server. (I'd also like to point out that the Grizzly library supports a similar approach without actors.) More concurrent requests means that a single Lift server can support more users than a regular Java EE server.
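Lift does this with Scala actors; as a rough, framework-neutral illustration of the same decoupling (not Lift's actual mechanism), here is a Servlet 3.0 async sketch in which the request-handling thread is released immediately and the response is finished later by a worker thread:

import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/report", asyncSupported = true)
public class DecoupledServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext async = req.startAsync(); // container thread is released here
        CompletableFuture
                .supplyAsync(DecoupledServlet::slowDatabaseQuery) // runs on the common pool, not the request thread
                .thenAccept(result -> {
                    try {
                        async.getResponse().getWriter().println(result);
                    } catch (IOException ignored) {
                        // client disconnected
                    } finally {
                        async.complete(); // response is finished by the worker, not the request thread
                    }
                });
    }

    private static String slowDatabaseQuery() {
        // Stand-in for blocking DB or disk I/O.
        try { Thread.sleep(200); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "42 rows";
    }
}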
@mtnyguard
"Scala and Lift don't do anything to either help or hinder horizontal scaling"
That's not quite right. Lift is a highly stateful framework. For example, if a user requests a form, they can only post it back to the same machine the form came from, because the form-processing action is saved in the server's state.
And this actually hinders scalability in a way, because this behaviour is inconsistent with a shared-nothing architecture.
No doubt Lift is highly performant, but performance and scalability are two different things. So if you want to scale horizontally with Lift, you have to configure sticky sessions on the load balancer, which will route a user to the same machine for the duration of a session.
Jetty may be the point of entry, but the actor ends up servicing the request. I suggest having a look at the Twitter-esque example, 'skitter', to see how you would be able to create a very scalable service. IIRC, this is one of the things that made the Twitter people take notice.
I really like #dre's reply, as he correctly points out that Lift's statefulness is a potential problem for horizontal scalability.
The problem: instead of me describing the whole thing again, check out the discussion (not the content) on this post: http://javasmith.blogspot.com/2010/02/automagically-cluster-web-sessions-in.html
The solution, as #dre said, would be sticky session configuration on the load balancer in front and adding more instances. But since request handling in Lift is done with a thread + actor combination, you can expect one instance to handle more requests than in normal frameworks. This gives it an edge over plain sticky sessions in other frameworks; i.e. an individual instance's capacity to process more requests may help you scale.
You also have Akka integration with Lift, which would be another advantage here.