Can Netty process a single pipeline with multiple threads?

I have a Netty 4 application that is receiving messages at high throughput using a single network connection (or to be precise, a single IP multicast group), so processing of all channel handlers in the pipeline is basically single-threaded.
Is there a way to configure Netty to parallelize processing so that more than 1 core gets used? I'm thinking of the "pipelining pattern" of multi-threading aka synchronous concurrency. Is something like that built-in - or would I have to implement this myself, off the Netty pipeline?

You can add different handlers to the pipeline with different EventLoopGroups (more generally, EventExecutorGroups). This way you can offload work to different threads. That said, you may need to be careful about ordering (depending on the protocol).
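In Netty 4 this is done by passing an `EventExecutorGroup` (for example `DefaultEventExecutorGroup`) to `ChannelPipeline.addLast(group, name, handler)`. That handler's methods then run on a thread from that group instead of the channel's I/O event loop. The Python sketch below is not Netty. It only models the pipelining pattern itself: one thread per stage, with bounded queues between stages. Order is preserved because each stage is single-threaded. The names `run_pipeline` and `_stage` are hypothetical. In CPython the GIL means pure-Python stages will not actually use several cores; on the JVM each stage thread can.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker passed from stage to stage


def _stage(fn, inbox, outbox):
    # Each stage owns exactly one thread, so it sees items in arrival order.
    while (item := inbox.get()) is not _DONE:
        outbox.put(fn(item))
    outbox.put(_DONE)  # let the next stage shut down too


def run_pipeline(items, *stages, capacity=1024):
    """Run items through `stages`, one thread per stage, preserving order."""
    # Bounded queues between stages: a fast stage blocks when the next one lags.
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]

    def feed():  # plays the role of the I/O thread reading from the socket
        for item in items:
            queues[0].put(item)
        queues[0].put(_DONE)

    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    results = []
    while (out := queues[-1].get()) is not _DONE:
        results.append(out)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline(packets, decode, validate, handle)` would run three hypothetical stages concurrently on three threads. If any stage were replaced by a pool of threads, items could overtake each other inside that stage. That is exactly the ordering caveat mentioned in the answer.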

Related

I don't understand how async operations can make an HTTP server concurrent

I am developing an HTTP server using Boost.Asio. So far, I have been using async operations (async_read, async_write, etc.), but I want to make my server concurrent, that is, behave like a server that creates a new thread for each client that connects.
I have read some forums, and apparently a concurrent server can be built using only the mentioned async operations. I do not understand how this is possible.
Given that the async operations' handlers are executed in the thread that called io_service.run(), suppose a client is being responded to at this moment. How can another client make a request and be answered while the main thread is busy with the first client?
The meaning of the word "concurrent" is ambiguous.
You are right: an asynchronous server running on a single thread is not concurrent at all. It can process only one request at a time. But the key insight is that most servers take a request, do some light processing (parsing, serialization, validation, some light business logic, etc.) and then call external resources (e.g. a database). The server can process other requests while it waits for the external resource. So the concurrency is only an illusion (processing happens one request after another, but very fast), and it works as long as the processing is fast relative to the I/O.
If your server is supposed to do heavy CPU computations, then obviously there will be no concurrency at all. In that case the only way to make it concurrent is to add threads or processes (possibly on multiple machines).
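Here is a minimal Python `asyncio` sketch of that interleaving, standing in for Boost.Asio. The `asyncio.sleep` call is a placeholder for a database round trip. Both requests are handled on one thread, and the second one starts while the first is waiting.

```python
import asyncio
import threading


async def handle_request(name, log):
    log.append(f"{name}: parse")      # light CPU work runs immediately
    await asyncio.sleep(0.05)         # placeholder for a database call; the loop is free meanwhile
    log.append(f"{name}: respond")
    return threading.get_ident()      # which thread handled this request


async def serve_two(log):
    # Two "requests" on one event loop, and therefore on one thread.
    return await asyncio.gather(handle_request("A", log), handle_request("B", log))
```

After `asyncio.run(serve_two(log))`, both "parse" entries appear in `log` before either "respond" entry, and both requests report the same thread id.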
Asynchronous I/O does not make the server concurrent.
In fact, asynchronous I/O does not mean "multi-threaded" or "multi-process" at all. Node.js servers run their JavaScript on a single thread and use asynchronous I/O.
Asynchronous I/O just means your thread does not wait for the I/O to finish, but does other work in the meantime (like accepting and processing new incoming requests).
So no, the premise that asynchronous I/O makes the server concurrent is wrong. It does not make it concurrent; it makes it scalable. Thread-per-request does not scale well, but a proper thread pool combined with an event queue or coroutines does. The threads only deal with CPU-bound tasks, while the event queue/coroutines manage enqueuing started I/O operations and dequeuing finished ones.
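Here is a sketch of the "thread pool + event loop" split using Python `asyncio`. The function `checksum` is a placeholder for real CPU-bound work. The event loop keeps handling I/O while `run_in_executor` hands the computation to a pool. In CPython, pure-Python CPU work in threads does not run in parallel because of the GIL, so a real deployment would pass a `concurrent.futures.ProcessPoolExecutor` instead; the call shape stays the same.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def checksum(data: bytes) -> int:
    # Placeholder CPU-bound work that should not run on the event-loop thread.
    total = 0
    for b in data:
        total = (total * 31 + b) % 1_000_003
    return total


async def handle(data: bytes, pool) -> int:
    loop = asyncio.get_running_loop()
    # The loop hands the computation to the pool and keeps servicing other I/O.
    return await loop.run_in_executor(pool, checksum, data)


async def main(payloads):
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(*(handle(p, pool) for p in payloads))
```

For example, `asyncio.run(main([b"a", b"ab"]))` computes both checksums on pool threads, and the loop thread only awaits the results.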
Not sure if you're only looking for a theoretical answer or a design example, but have you seen the HTTP Server 3 example for Boost.Asio?
Concurrency is achieved by having a small thread pool execute the work. When handlers need to be run, any of the threads calling io_service.run() may be chosen to execute them.
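To illustrate the idea (this is not Asio's actual implementation), here is a hypothetical `TinyIoService` in Python. It holds a queue of ready handlers that several threads drain by calling `run()`. One difference from Asio: `io_service::run()` returns once there is no more work, whereas this sketch blocks until `stop()` is called.

```python
import queue
import threading


class TinyIoService:
    """Hypothetical stand-in for boost::asio::io_service: a queue of ready
    completion handlers that every thread calling run() helps drain."""

    _STOP = object()

    def __init__(self):
        self._ready = queue.Queue()

    def post(self, handler):
        """Queue a handler to be run by whichever run() thread is free."""
        self._ready.put(handler)

    def run(self):
        """Execute handlers until stop() is called."""
        while (handler := self._ready.get()) is not self._STOP:
            handler()

    def stop(self, num_threads):
        # One stop marker per run() thread, queued after any pending handlers.
        for _ in range(num_threads):
            self._ready.put(self._STOP)


def start_pool(service, num_threads):
    """Start num_threads threads that all call service.run()."""
    threads = [threading.Thread(target=service.run) for _ in range(num_threads)]
    for t in threads:
        t.start()
    return threads
```

Because every `run()` thread pulls from the same queue, a slow handler on one thread does not stop the others from serving other clients.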

Akka concurrent message processing while preserving message order

In our project, we publish and consume messages to/from a JMS broker. We have a PL/SQL producer and a Java consumer. The problem, however, is that the producer is 10 times faster than the consumer. Therefore we want to change the consumer to use multiple threads while reading and processing the messages.
But we also need to preserve the order of the messages. That is, the messages must be sent to the target system in the order they were published to the JMS broker. I'm new to Akka and I'm trying to understand its features. Can we achieve that using Akka dispatchers?
Assuming you want to parallelize the consumption inside a single JVM instance, what you describe is a good case for Akka Streams. This can be solved with actors, but you risk running out of memory if the producer is too fast, because you'll need to queue the results for re-ordering.
Akka Streams handles this problem by introducing backpressure. If the consumer can't keep up with the producer, it signals this, and the producer reduces its rate. Akka Streams can also maintain the order of the messages.
Akka Streams is at version 1.0 (at the time of writing), so it's not yet as battle-hardened as plain Akka. But it's built on Akka and comes from the Akka team, so it should be good and get even better in the future. Also, the documentation is not yet organized in the best possible way.
It's also important to mention that Akka Streams, while implemented using Akka, is quite a different paradigm from the Actor Model or Future combinations. It's based on the stream-processing paradigm, so you'll have to adjust the way you think about your programs. That might be an issue for some teams.
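The ordering trick behind Akka Streams' `mapAsync(parallelism)` (as opposed to `mapAsyncUnordered`) can be sketched without Akka. Process up to N messages concurrently, but emit results strictly in arrival order, and submit no new work while the window is full. This Python version is an illustration only. `send_to_target` is a hypothetical stand-in for the call to the target system, and JMS acknowledgement is ignored.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def ordered_parallel_map(fn, items, parallelism):
    """Yield fn(item) for every item in input order, keeping at most
    `parallelism` items in flight (similar in spirit to mapAsync)."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        in_flight = deque()  # futures in the order their items arrived
        for item in items:
            if len(in_flight) >= parallelism:
                # Window full: wait for the OLDEST item even if newer ones
                # finished first. This preserves order, and no new work is
                # submitted until a slot frees up (backpressure).
                yield in_flight.popleft().result()
            in_flight.append(pool.submit(fn, item))
        while in_flight:
            yield in_flight.popleft().result()


def send_to_target(msg):
    """Hypothetical slow consumer step; shorter messages finish faster."""
    time.sleep(0.001 * len(msg))
    return f"sent {msg}"
```

Even though short messages finish first, `ordered_parallel_map(send_to_target, msgs, 3)` yields results in the original publication order. The memory cost is bounded by the window size rather than by how far the producer runs ahead.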

TCP server with state information using network library

I'm writing a TCP server for an online turn-based game. I've already written a prototype using PHP sockets, but would like to move to C++. I've been looking at the popular network libraries (ASIO, ACE, POCO, LibEvent), but I'm currently unclear which one would best suit my needs:
1) Connections are persistent (on the order of minutes), and the server must be able to handle 100+ simultaneous connections.
2) Connections must be able to maintain state information (user login info). [My PHP prototype currently requires each client request to contain the login info.]
3) Optionally and preferably multi-threaded, but a single process. Prefer not to have 1 thread per connection, but a fixed number of threads working on all open connections.
I'm leaning towards POCO's TCPServer or Reactor frameworks, but not exactly sure if they meet my requirements. I think the Reactor is single threaded, and the TCPServer enforces 1:1 threading/connection. Am I correct?
In either case, I'm not exactly sure how to do the most important task: associating login info with a specific connection, with connections coming and going at random.
Boost.Asio should meet your requirements. The reactor queue can be serviced by multiple threads. Using asynchronous methods will enable your design of a fixed number of threads servicing all connections.
The tutorials and examples are probably the best place to start if you are unfamiliar with the library.
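A common way to associate login info with a connection is used in the Boost.Asio examples, where each connection is a `session`/`connection` object. Create a per-connection object on accept and let it die with the connection. Here is a Python `asyncio` sketch of that pattern. The line protocol (`LOGIN <name>`, `WHOAMI`) and port 9000 are made up for illustration, and there is no real authentication.

```python
import asyncio


class Session:
    """State tied to one TCP connection; created on accept, dropped on close."""

    def __init__(self, peer):
        self.peer = peer   # (host, port) of the client
        self.user = None   # filled in by a LOGIN command


async def handle_client(reader, writer):
    session = Session(writer.get_extra_info("peername"))
    try:
        while line := await reader.readline():
            cmd, _, arg = line.decode().strip().partition(" ")
            if cmd == "LOGIN":
                session.user = arg            # no real authentication here
                reply = "OK"
            elif cmd == "WHOAMI":
                reply = session.user or "ANONYMOUS"
            else:
                reply = "ERR unknown command"
            writer.write((reply + "\n").encode())
            await writer.drain()
    finally:
        writer.close()  # the Session object goes away with this coroutine


async def main(host="127.0.0.1", port=9000):
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

Run it with `asyncio.run(main())`. Because the state hangs off each connection's own handler, there is no global table to clean up when clients come and go at random.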
You might also take a look at MUSCLE, a multi-user networking library and server I wrote with this sort of application in mind. It's BSD-licensed, handles hundreds of users, and includes a server-side database mechanism for storing and sharing any information you want the clients to know about each other. The server is single-threaded by default, but I haven't found that to be a problem in practice (and it's possible to extend the server to be multithreaded if that turns out to be necessary).

How to design a client-server architecture

I'd like to know a server (TCP-based) architecture that supports a large number of clients (at least 10K), to implement a FIX server. My questions are:
How do we design it?
How do we listen on the open port? Using select, poll, or some other function?
How do we process client requests? At this scale we cannot create one thread per client.
Should the processing happen in a separate executable that exchanges requests and responses with the server executable through IPC?
There is much more to it. I would appreciate it if anyone could explain it or provide a link.
Thanks
An excellent resource for information on this topic is The C10K problem. Although the dimensions there seem a little old, the techniques are still applicable today.
The architecture depends on what you want to do with the clients incoming data. My guess is that for every incoming message you would perform some computations and probably also return a response.
In that case I would create 1 main listener thread that receives all the incoming messages (Actually, if your hardware has more than 1 physical network device, I would use a listener thread per device and make sure each one is listening to a specific device).
Get the number of CPUs on your machine, create a worker thread for each CPU, and bind each thread to one CPU. (Maybe the number of worker threads should be num_of_cpu - 1, to leave a CPU available for the listener and dispatcher.)
Each thread has a queue and a semaphore; the main listener thread just pushes the incoming data into those queues. There are many ways to perform load balancing (more on that later).
Each worker thread just works on the requests given to it, and puts the response on another queue that is read by the dispatcher.
The dispatcher - there are 2 options here, use a thread for dispatcher (or thread per network device as for listeners), or have the dispatcher actually be the same thread as the listener.
There is some advantage to putting them both on the same thread, since it makes it easier to detect a lost socket connection and to use the same fds for both reading and writing without thread synchronization. However, using two different threads could give better performance; it needs to be tested.
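Here is a minimal Python sketch of the listener, per-worker queues, and dispatcher layout described above. It makes three simplifying assumptions. The "listener" just iterates over already-received `(client_id, payload)` pairs instead of reading sockets. Uppercasing stands in for request processing. CPU pinning is left out (on Linux, `os.sched_setaffinity` could be used for it). Responses carry the client id so the dispatcher knows where to write them.

```python
import os
import queue
import threading

STOP = object()  # sentinel telling a thread to exit


def worker(inbox, outbox):
    """Process requests from a private queue; push responses to the shared outbox."""
    while (req := inbox.get()) is not STOP:
        client_id, payload = req
        outbox.put((client_id, payload.upper()))  # stand-in for real request handling


def dispatcher(outbox, sent):
    """Single thread that owns the write side (here it just records responses)."""
    while (resp := outbox.get()) is not STOP:
        sent.append(resp)


def run_server(messages, num_workers=None):
    # One worker per CPU, minus one left for the listener/dispatcher.
    num_workers = num_workers or max(1, (os.cpu_count() or 2) - 1)
    inboxes = [queue.Queue() for _ in range(num_workers)]
    outbox = queue.Queue()
    sent = []
    workers = [threading.Thread(target=worker, args=(q, outbox)) for q in inboxes]
    disp = threading.Thread(target=dispatcher, args=(outbox, sent))
    for t in workers + [disp]:
        t.start()
    # The "listener": pushes each incoming message onto a worker's queue.
    for i, msg in enumerate(messages):
        inboxes[i % num_workers].put(msg)  # simple round-robin
    for q in inboxes:
        q.put(STOP)
    for t in workers:
        t.join()
    outbox.put(STOP)  # all workers are done, so no more responses will arrive
    disp.join()
    return sent
```

With round-robin, two messages from the same client can land on different workers, and their responses can come back out of order. If per-client order matters, pick the worker by `client_id % num_workers` instead.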
Note about load balancing:
This is a topic of its own.
The simplest thing is to use 1 queue for all working threads, but the problem is that they have to lock in order to pop items and the locking can damage performance. (But you get the most balanced load).
Another quite simple approach would be to have a private queue for every worker and perform round-robin when inserting. After every X cycles check the size of all the queues. If some queues are much larger than others then leave them out for the next X cycles and then recheck them again. This is not the best approach, but a simple one to implement and gives some load balancing while no locking is needed.
By the way, there is a way to implement a queue between two threads without blocking, but that is also another topic.
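The second load-balancing approach (private queues, round-robin, periodic size check) might look like this in Python. The threshold for "much larger" is a hypothetical `factor` parameter relative to the smallest queue. Note that `queue.Queue` still locks internally. What the scheme avoids is every worker contending on one shared queue; a lock-free single-producer/single-consumer queue would remove the remaining locking too.

```python
import queue


class RoundRobinBalancer:
    """Listener-side dispatch to private per-worker queues.

    Round-robin on insert; every `check_every` inserts, queue sizes are
    re-checked and queues far larger than the smallest one are skipped
    until the next check. Call put() from the single listener thread only.
    """

    def __init__(self, num_workers, check_every=100, factor=2.0):
        self.queues = [queue.Queue() for _ in range(num_workers)]
        self.check_every = check_every
        self.factor = factor            # hypothetical "much larger" threshold; must be >= 1
        self._active = list(range(num_workers))
        self._rr = 0
        self._inserted = 0

    def _recheck(self):
        sizes = [q.qsize() for q in self.queues]   # approximate, which is fine here
        limit = max(1, min(sizes)) * self.factor
        # The smallest queue always passes (factor >= 1), so this is never empty.
        self._active = [i for i, size in enumerate(sizes) if size <= limit]
        self._rr = 0

    def put(self, item):
        """Enqueue item for some worker and return that worker's index."""
        if self._inserted % self.check_every == 0:
            self._recheck()
        idx = self._active[self._rr % len(self._active)]
        self._rr += 1
        self._inserted += 1
        self.queues[idx].put(item)
        return idx
```

A worker whose queue has grown far past the others stops receiving new items for the next `check_every` inserts, giving it time to catch up.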
I hope it helps,
Guy
If the client and server are on a secure network, then security can be kept minimal, limited to encrypting the transfers. If the clients and the server are not on a secure network, you first want the server and client to authenticate each other and then initiate encrypted data transfer. For data transfer, server-side authentication should suffice. At the end of this authentication, use the session key to generate an encrypted (symmetric) data stream. Consider using TFTP; it is simple to implement and scales reasonably well.

Why is the Lift web framework scalable?

I want to know the technical reasons why the Lift web framework has high performance and scalability. I know it uses Scala, which has an actor library, but according to the install instructions its default configuration uses Jetty. So does it use the actor library to scale?
Is the scalability built in right out of the box? Just add additional servers and nodes and it will automatically scale; is that how it works? Can it handle 500000+ concurrent connections with supporting servers?
I am trying to create a web services framework for the enterprise level, that can beat what is out there and is easy to scale, configurable, and maintainable. My definition of scaling is just adding more servers and you should be able to accommodate the extra load.
Thanks
Lift's approach to scalability is within a single machine. Scaling across machines is a larger, tougher topic. The short answer there is: Scala and Lift don't do anything to either help or hinder horizontal scaling.
As far as actors within a single machine, Lift achieves better scalability because a single instance can handle more concurrent requests than most other servers. To explain, I first have to point out the flaws in the classic thread-per-request handling model. Bear with me, this is going to require some explanation.
A typical framework uses a thread to service a page request. When the client connects, the framework assigns a thread out of a pool. That thread then does three things: it reads the request from a socket; it does some computation (potentially involving I/O to the database); and it sends a response out on the socket. At pretty much every step, the thread will end up blocking for some time. When reading the request, it can block while waiting for the network. When doing the computation, it can block on disk or network I/O. It can also block while waiting for the database. Finally, while sending the response, it can block if the client receives data slowly and TCP windows fill up. Overall, the thread might spend 30-90% of its time blocked. It spends 100% of its time, however, on that one request.
A JVM can only support so many threads before it really slows down. Thread scheduling, contention for shared-memory entities (like connection pools and monitors), and native OS limits all impose restrictions on how many threads a JVM can create.
Well, if the JVM is limited in its maximum number of threads, and the number of threads determines how many concurrent requests a server can handle, then the maximum number of concurrent requests is capped by the JVM's thread limit.
(There are other issues that can impose lower limits, GC thrashing for example. Threads are a fundamental limiting factor, but not the only one!)
Lift decouples threads from requests. In Lift, a request does not tie up a thread. Rather, a thread does an action (like reading the request), then sends a message to an actor. Actors are an important part of the story, because they are scheduled via "lightweight" threads. A pool of threads gets used to process messages within actors. It's important to avoid blocking operations inside of actors, so these threads get returned to the pool rapidly. (Note that this pool isn't visible to the application; it's part of Scala's support for actors.) A request that's currently blocked on database or disk I/O, for example, doesn't keep a request-handling thread occupied. The request-handling thread is available, almost immediately, to receive more connections.
This method for decoupling requests from threads allows a Lift server to have many more concurrent requests than a thread-per-request server. (I'd also like to point out that the Grizzly library supports a similar approach without actors.) More concurrent requests means that a single Lift server can support more users than a regular Java EE server.
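To make "a pool of threads processes messages within actors" concrete, here is a toy actor in Python. It is not Scala's or Lift's implementation. Each actor has a private mailbox that is drained one message at a time on a shared thread pool, and it gives the thread back after a small batch. The class name and batch size are illustrative assumptions. Error handling is omitted: an exception in `behavior` would leave the actor stuck.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class Actor:
    """A minimal actor: a private mailbox drained one message at a time,
    scheduled onto a shared thread pool instead of owning a thread."""

    def __init__(self, pool, behavior):
        self._pool = pool
        self._behavior = behavior        # called as behavior(message)
        self._mailbox = deque()
        self._lock = threading.Lock()
        self._scheduled = False          # True while a drain task is queued or running

    def send(self, message):
        with self._lock:
            self._mailbox.append(message)
            if self._scheduled:
                return                   # the pending drain task will see it
            self._scheduled = True
        self._pool.submit(self._drain)

    def _drain(self, batch=10):
        # Handle at most `batch` messages, then give the thread back to the pool.
        for _ in range(batch):
            with self._lock:
                if not self._mailbox:
                    self._scheduled = False
                    return
                message = self._mailbox.popleft()
            self._behavior(message)
        with self._lock:
            if not self._mailbox:
                self._scheduled = False
                return
        # More messages are waiting: reschedule instead of hogging this thread.
        self._pool.submit(self._drain)
```

A hundred such actors can share a handful of threads, because an actor with an empty mailbox holds no thread at all.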
@mtnyguard
"Scala and Lift don't do anything to either help or hinder horizontal scaling"
That isn't quite right. Lift is a highly stateful framework. For example, if a user requests a form, he can only post it back to the same machine the form came from, because the form-processing action is saved in the server state.
And this is actually something that hinders scalability in a way, because this behaviour is inconsistent with a shared-nothing architecture.
No doubt Lift is highly performant, but performance and scalability are two different things. So if you want to scale horizontally with Lift, you have to configure sticky sessions on the load balancer, which will route a user to the same machine for the duration of a session.
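Sticky sessions simply mean the load balancer remembers which backend owns a session. Real load balancers usually do this with a cookie or a hash of the client address. The Python sketch below keeps an in-memory table instead, and the backend names are hypothetical. The downside mentioned above shows up here too: a session is only usable on the backend that owns it.

```python
import itertools


class StickyBalancer:
    """Route every request of a session to the backend that served its first request."""

    def __init__(self, backends):
        self._next = itertools.cycle(backends)   # new sessions are spread round-robin
        self._owner = {}                          # session id -> backend

    def route(self, session_id):
        if session_id not in self._owner:
            self._owner[session_id] = next(self._next)
        return self._owner[session_id]
```

For example, with backends `["app1", "app2"]`, sessions `s1` and `s2` land on different machines, and every later request from `s1` goes back to `app1`, where its form-processing state lives.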
Jetty may be the point of entry, but the actor ends up servicing the request. I suggest having a look at the Twitter-esque example, 'skitter', to see how you could create a very scalable service. IIRC, this is one of the things that made the Twitter people take notice.
I really like dre's reply, as he correctly points out that the statefulness of Lift is a potential problem for horizontal scalability.
The problem -
Instead of me describing the whole thing again, check out the discussion (not the content) on this post: http://javasmith.blogspot.com/2010/02/automagically-cluster-web-sessions-in.html
The solution, as dre said, is a sticky-session configuration on the load balancer in front, plus adding more instances. But since request handling in Lift is done by a thread + actor combination, you can expect one instance to handle more requests than with typical frameworks. This gives Lift an edge over other frameworks that also rely on sticky sessions: each instance's greater capacity may help you scale.
There is also Akka-Lift integration, which would be another advantage here.