As far as I know, async apps use a thread pool so that they don't have to create a new thread per connection. Go blocks in core.async solve this for socket IO, which can be parked because the operating system and popular HTTP libraries support non-blocking IO, but other operations, like DB queries, still have to be handled by creating a new thread. In a typical web application 99.9% of requests touch the DB, so we end up with n connections = n threads anyway. Am I right, and is there any solution?
Yes, you should not use go blocks for blocking operations like IO. Hopefully we'll have async JDBC soon: https://www.slideshare.net/ypoirier/jdbc-next-a-new-asynchronous-api-for-connecting-to-a-database
There is also https://github.com/alaisi/postgres.async
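In the meantime, the usual workaround is to keep blocking JDBC calls off the go-block threads and run them on a bounded, dedicated pool (in core.async terms, async/thread or pipeline-blocking rather than go). Here is a minimal sketch of that pattern, written in Java rather than Clojure for concreteness; the pool size and the loadUser helper are made up for illustration:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BlockingDbOffload {
        // Bounded pool reserved for blocking JDBC work. Its size caps how many queries
        // are in flight at once, no matter how many requests are parked waiting.
        private static final ExecutorService dbPool = Executors.newFixedThreadPool(32);

        // Hypothetical query helper: the blocking call runs on dbPool, and the caller
        // gets back a future it can compose with its non-blocking request handling.
        static CompletableFuture<String> loadUser(long id) {
            return CompletableFuture.supplyAsync(() -> {
                // ... a blocking JDBC query would run here; only dbPool threads ever block on it ...
                return "user-" + id;
            }, dbPool);
        }
    }

The point is that n in-flight requests no longer means n threads: queries queue up against the fixed pool (and the connection pool behind it) while the request-handling threads stay free.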
Related
I am developing an HTTP server using Boost.Asio. So far I have been using async operations (async_read, async_write, etc.), but I want to make my server concurrent, that is, the same as a server that creates a new thread for each connected client.
I have read some forums and, apparently, a concurrent server can be built using only the async operations mentioned. I do not understand how this is possible.
I mean, given that the async operations' handlers are executed in the thread that called io_service.run(), suppose a client is being responded to at this moment. How can another client make a request and be answered while the main thread is busy with the first client?
The meaning of the word "concurrent" is ambiguous.
You are right: an asynchronous server is not concurrent in that sense; it works on only one request at any given moment. But the key insight is that what most servers actually do is take a request, do some light processing (parsing, serialization, validation, some light business logic, etc.) and then call external resources (e.g. some database). The server can process other requests while waiting for the external resource, so it only gives the illusion of concurrency: the processing happens one piece after another, but really fast. And it works as long as the processing is cheap compared to the IO.
If your server is supposed to do heavy CPU computations, then obviously there will be no concurrency at all. In that case the only way to make it concurrent is to add threads or processes (possibly on multiple machines).
Asynchronous IO does not make the server concurrent.
In fact, asynchronous IO does not mean "multi-threaded" or "multi-processed" at all. Node.js servers are single-threaded and use asynchronous IO.
Asynchronous IO just means your thread does not wait for the IO to finish, but does other stuff meanwhile (like accepting and processing new incoming requests).
So no, the premise that asynchronous IO makes the server concurrent is wrong. It does not make it concurrent; it makes it scalable, because thread-per-request does not scale well, while a proper thread pool plus an event queue (or coroutines) does. The threads deal only with CPU-bound tasks, while the event queue/coroutines manage enqueuing and dequeuing IO operations as they start and finish.
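To make that concrete, here is a toy single-threaded echo server using Java NIO, standing in for any event-loop runtime (Node.js, asio, etc.): one thread and one selector serve many connections, and the thread never sits blocked on any particular client. It is only a sketch (no partial-write handling, no error handling):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;
    import java.util.Iterator;

    public class SingleThreadEchoServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select();                       // block until some channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        // New connection: register it and go straight back to the loop.
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        // Data ready on some existing connection: read and echo it back.
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(1024);
                        int n = client.read(buf);
                        if (n == -1) { client.close(); continue; }
                        buf.flip();
                        client.write(buf);
                    }
                }
            }
        }
    }

The single thread is only ever busy with work that is already ready to do; waiting on slow clients costs it nothing.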
Not sure if you're only looking for a theoretical answer or a design example, but have you seen the HTTP Server 3 example for boost.asio?
Concurrency is achieved by having a small thread pool execute the work. When callbacks need to be handled, any of the threads calling io_service.run() can be chosen to execute the task.
I have a boost asio application with many threads, similar to a web server, handling hundreds of concurrent requests. Every request needs to call both memcached and redis (via libmemcached and redispp, respectively). Is the best practice in this situation to open separate memcached and redis connections from each thread (effectively tripling the open sockets on the server, three per request)? Or is there a way to build a static object with a single memcached/redis connection and let all threads share it? I'm a bit confused about the thread safety of something like this. Everything needs to be asynchronous between the threads, but blocking within each thread's individual request (so each thread progresses linearly, but many threads can be at different points in their own progression at any given time). Does that make sense?
Thanks so much!
Since memcached has a synchronous protocol, you should not write the next request before you have received the answer to the previous one, so no other thread can talk on the same memcached connection in the meantime. I'd prefer a thread-local connection if you work with it in "blocking" mode.
Or you can make it work in an "async" manner: create a pool of connections, pick a connection from it (and lock it), and return it to the pool after the request is done.
You can also build a request queue and process it in one or more dedicated threads (using multigets and callbacks).
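For illustration, here is a minimal sketch of the thread-local option in Java; CacheClient is a hypothetical stand-in for whatever memcached/redis client class you actually use (with libmemcached in C++ the same idea applies via thread-local storage):

    // Hypothetical stand-in for your real blocking client class (libmemcached/redis wrapper, etc.).
    final class CacheClient {
        CacheClient(String host, int port) {
            // a real client would open its socket here
        }
        String get(String key) {
            // a real client would do a blocking request/response on its own socket here
            return null;
        }
    }

    public class PerThreadConnections {
        // One connection per worker thread, created lazily the first time that thread needs it.
        private static final ThreadLocal<CacheClient> CONNECTION =
                ThreadLocal.withInitial(() -> new CacheClient("127.0.0.1", 11211));

        public static String lookup(String key) {
            // Each thread only ever talks on its own connection, so requests and responses
            // stay strictly paired and no locking is required.
            return CONNECTION.get().get(key);
        }
    }

The trade-off is more open sockets (one per thread per backend) in exchange for zero cross-thread synchronization.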
I'd like to know what server architecture (TCP-based) can support a large number of clients (at least 10K), in order to implement a FIX server. My questions are:
How do we design it?
How do we listen on the open port? With select, poll, or some other function?
How do we process each client's request? At this scale we cannot create one thread per client.
Should the request processing live in a separate executable that exchanges requests and responses with the server executable through IPC?
There is much more to it. I would appreciate it if anyone could explain this or provide any links.
Thanks
An excellent resource for information on this topic is The C10K problem. Although the numbers there seem a little dated, the techniques are still applicable today.
The architecture depends on what you want to do with the clients' incoming data. My guess is that for every incoming message you would perform some computation and probably also return a response.
In that case I would create one main listener thread that receives all the incoming messages. (Actually, if your hardware has more than one physical network device, I would use a listener thread per device and make sure each one listens to a specific device.)
Get the number of CPUs on your machine, create a worker thread per CPU, and bind each thread to one CPU (the number of worker threads should probably be num_of_cpu - 1, to leave a CPU available for the listener and dispatcher).
Each worker thread has a queue and a semaphore; the main listener thread just pushes the incoming data into those queues. There are many ways to perform load balancing (more on that below).
Each worker thread works on the requests given to it and puts the response on another queue that is read by the dispatcher.
The dispatcher: there are two options here. Either use a dedicated dispatcher thread (or a thread per network device, as for the listeners), or have the dispatcher be the same thread as the listener.
There is some advantage to putting both on the same thread, since it makes it easier to detect lost socket connections and to use the same fds for both reading and writing without thread synchronization. However, using two different threads might give better performance; that needs to be tested.
Note about load balancing:
This is a topic of its own.
The simplest thing is to use one queue for all worker threads, but the problem is that they have to take a lock to pop items, and the locking can hurt performance (though you get the most balanced load).
Another quite simple approach is to give every worker a private queue and insert in round-robin order. After every X cycles, check the size of all the queues; if some queues are much larger than the others, leave them out for the next X cycles and then recheck them. This is not the best approach, but it is simple to implement and gives some load balancing with no locking needed (a sketch of this layout follows below).
By the way, there is a way to implement a queue between two threads without blocking, but that is also another topic.
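For what it's worth, here is a minimal Java sketch of the private-queue layout described above; readIncomingMessages is just a placeholder for the real socket-reading loop, and CPU pinning is left out because it needs platform-specific calls:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ListenerWorkerSketch {
        public static void main(String[] args) {
            int workers = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);

            // One private queue per worker, as described above.
            List<BlockingQueue<byte[]>> queues = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                BlockingQueue<byte[]> q = new ArrayBlockingQueue<>(1024);
                queues.add(q);
                Thread worker = new Thread(() -> {
                    try {
                        while (true) {
                            byte[] request = q.take();   // wait for work from the listener
                            // ... parse, compute, then hand the response to the dispatcher ...
                        }
                    } catch (InterruptedException e) {
                        // shut down
                    }
                });
                worker.setDaemon(true);
                worker.start();
            }

            // Listener side: round-robin incoming messages across the private queues,
            // so workers never contend on a shared lock just to pick up work.
            int next = 0;
            for (byte[] incoming : readIncomingMessages()) {
                queues.get(next).offer(incoming);        // offer() drops when a queue is full; pick your own policy
                next = (next + 1) % workers;
            }
        }

        // Stand-in for the real socket-reading loop (select/poll/epoll, or a Selector in Java).
        private static Iterable<byte[]> readIncomingMessages() {
            return List.of();
        }
    }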
I hope it helps,
Guy
If the client and server are on a secure network, then the security requirements are minimal: at most, encrypt the transfers. If the clients and the server are not on a secure network, you first want the server and client to authenticate each other and then initiate encrypted data transfer; for the data transfer itself, server-side authentication should suffice. At the end of this authentication, use the session key to generate an encrypted (symmetric) data stream. Consider using TFTP; it is simple to implement and scales reasonably well.
In a multithreaded application, is it suitable to have one connection per connected client? To me it seems inefficient, but if there is no connection pooling, how else could it be done when you want each connection to be able to communicate with the DB?
Thanks
If you decide to share a connection amongst threads, you need to be sure that one thread completely finishes with the connection before another uses it (use a mutex, semaphore, or critical section to protect the connections). Alternatively, you could write your own connection pool. This is not as hard as it sounds: make 10 connections (or however many your pool needs) at startup and allocate/deallocate them on demand, again protected with a mutex/critical section/semaphore.
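Here is a minimal sketch of such a home-grown pool, using JDBC in Java for concreteness; a BlockingQueue plays the role of the mutex/semaphore (in production you would usually reach for an existing pool instead):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class SimpleConnectionPool {
        private final BlockingQueue<Connection> idle;

        // Open a fixed number of connections up front, as suggested above.
        public SimpleConnectionPool(String url, String user, String pass, int size) throws SQLException {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(DriverManager.getConnection(url, user, pass));
            }
        }

        // Blocks until some connection is free; the queue itself provides the locking.
        public Connection acquire() throws InterruptedException {
            return idle.take();
        }

        // Hand the connection back so another thread can use it.
        public void release(Connection c) {
            idle.offer(c);
        }
    }

Each thread then calls acquire(), does its work, and calls release() in a finally block, so no connection is ever used by two threads at once.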
That depends on your architecture.
It sounds like you're using a server-to-distributed-clients model? In that case I would implement some sort of layer for DB access, and hide connection pooling, etc. behind a data access facade.
I want to know the technical reasons why the Lift web framework has high performance and scalability. I know it uses Scala, which has an actor library, but according to the install instructions its default configuration is with Jetty. So does it use the actor library to scale?
Also, is the scalability built in right out of the box? Do you just add additional servers and nodes and it automatically scales? Is that how it works? Can it handle 500,000+ concurrent connections with supporting servers?
I am trying to create an enterprise-level web services framework that can beat what is out there and is easy to scale, configure, and maintain. My definition of scaling is that you just add more servers and are able to accommodate the extra load.
Thanks
Lift's approach to scalability is within a single machine. Scaling across machines is a larger, tougher topic. The short answer there is: Scala and Lift don't do anything to either help or hinder horizontal scaling.
As far as actors within a single machine go, Lift achieves better scalability because a single instance can handle more concurrent requests than most other servers. To explain, I first have to point out the flaws in the classic thread-per-request handling model. Bear with me, this is going to require some explanation.
A typical framework uses a thread to service a page request. When the client connects, the framework assigns a thread out of a pool. That thread then does three things: it reads the request from a socket; it does some computation (potentially involving I/O to the database); and it sends a response out on the socket. At pretty much every step, the thread will end up blocking for some time. When reading the request, it can block while waiting for the network. When doing the computation, it can block on disk or network I/O. It can also block while waiting for the database. Finally, while sending the response, it can block if the client receives data slowly and TCP windows get filled up. Overall, the thread might spend 30-90% of its time blocked. It spends 100% of its time, however, on that one request.
A JVM can only support so many threads before it really slows down. Thread scheduling, contention for shared-memory entities (like connection pools and monitors), and native OS limits all impose restrictions on how many threads a JVM can create.
Well, if the JVM is limited in its maximum number of threads, and the number of threads determines how many concurrent requests a server can handle, then the number of concurrent requests will be limited by the JVM's thread ceiling.
(There are other issues that can impose lower limits---GC thrashing, for example. Threads are a fundamental limiting factor, but not the only one!)
Lift decouples threads from requests. In Lift, a request does not tie up a thread. Rather, a thread does an action (like reading the request), then sends a message to an actor. Actors are an important part of the story, because they are scheduled via "lightweight" threads. A pool of threads gets used to process messages within actors. It's important to avoid blocking operations inside of actors, so these threads get returned to the pool rapidly. (Note that this pool isn't visible to the application; it's part of Scala's support for actors.) A request that's currently blocked on database or disk I/O, for example, doesn't keep a request-handling thread occupied. The request-handling thread is available, almost immediately, to receive more connections.
This method for decoupling requests from threads allows a Lift server to have many more concurrent requests than a thread-per-request server. (I'd also like to point out that the Grizzly library supports a similar approach without actors.) More concurrent requests means that a single Lift server can support more users than a regular Java EE server.
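To illustrate the decoupling idea (this is not Lift's actual actor machinery, just the same principle sketched with Java's CompletableFuture; queryDatabaseAsync is an imaginary non-blocking database client):

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.function.Consumer;

    public class DecoupledRequestSketch {
        // Small pool that only runs the quick stages: parsing the request, rendering the response.
        private static final ExecutorService cpuPool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        // Imaginary asynchronous database client: returns immediately with a future
        // instead of holding the calling thread until the query completes.
        static CompletableFuture<String> queryDatabaseAsync(String query) {
            return CompletableFuture.supplyAsync(() -> "rows for " + query);
        }

        // The handling thread parses, kicks off the query, and is free again right away;
        // the response is written later, from a callback, when the result arrives.
        static void handleRequest(String rawRequest, Consumer<String> writeResponse) {
            cpuPool.execute(() -> {
                String query = "SELECT ..." /* derived from rawRequest */;
                queryDatabaseAsync(query)
                        .thenAcceptAsync(writeResponse, cpuPool);
            });
        }
    }

No thread is parked for the 30-90% of the request's lifetime spent waiting on IO; the pool threads are busy only while there is real work to do, which is the property the answer above attributes to Lift's actor scheduling.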
@mtnyguard
"Scala and Lift don't do anything to either help or hinder horizontal scaling"
That's not quite right. Lift is a highly stateful framework. For example, if a user requests a form, then he can only post that request back to the same machine the form came from, because the form-processing action is saved in the server's state.
And this is actually something that hinders scalability in a way, because this behaviour is inconsistent with a shared-nothing architecture.
No doubt Lift is highly performant, but performance and scalability are two different things. So if you want to scale horizontally with Lift, you have to configure sticky sessions on the load balancer, which will redirect a user to the same machine for the duration of a session.
Jetty may be the point of entry, but the actor ends up servicing the request. I suggest having a look at the Twitter-esque example, 'skitter', to see how you would be able to create a very scalable service. IIRC, this is one of the things that made the Twitter people take notice.
I really like #dre's reply, as he correctly points out that Lift's statefulness is a potential problem for horizontal scalability.
The problem:
Instead of describing the whole thing again, check out the discussion (not the content) on this post: http://javasmith.blogspot.com/2010/02/automagically-cluster-web-sessions-in.html
The solution would be, as #dre said, sticky-session configuration on the load balancer in front and adding more instances. But since request handling in Lift is done with a thread + actor combination, you can expect one instance to handle more requests than normal frameworks. This gives it an edge over having sticky sessions in other frameworks, i.e. each instance's capacity to process more requests may help you scale.
You also have Akka integration with Lift, which would be another advantage here.