How to make load balancing broker more fully asynchronous? - c++

After reading through the ZMQ manual about the load balancing broker, I thought that it would be great to implement in my own code. So I did, adding some additional touches to make it more responsive. One performance enhancement I was looking to add was the ability to dispatch to multiple long-running work jobs concurrently. I think I'm right about this, I could be wrong though, so consider the following with respect to just the lbbroker code that's in the manual:
Two workers (clients) simultaneously request work, each with long running jobs given to them (by a manager, or managers). The current code is good in that it does not round-robin the work to different recipients; it selects FCFS. But there is also a problem: a reply is needed from the first worker who gets through before work can be dispensed to the second worker.
Basically, I want to dole work out as fast as there are workers ready to receive it, FCFS style and concurrently as well. At the same time, I don't want to lose the model that I have where manager A gets through to worker B, and worker B's reply gets back to manager A. Keeping this, which is facilitated by the request-reply pattern, while at the same time allowing worker B to receive the only manager's second work job while A may still be processing its job, is highly desirable.
How can I most easily go about achieving this? Preferably by modifying my current lbbroker implementation, which isn't too different from lbbroker in the manual.
Thanks in advance.

As it turns out, my difficulties stemmed from an insufficiently specific understanding of the load balancing broker example: the broker does not use REP sockets, so it is not forced to receive a reply between each work request and worker request. So the asynchronous issue does not exist at all.
Basically, a ROUTER socket prefixes each incoming message with an identity frame. By forwarding that frame along in a consistent manner you can avoid the issue entirely, and the router is free to connect other manager/worker pairs while N concurrent workers work.
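To make the identity-forwarding point concrete, here is a minimal pure-Python simulation. It does not use ZeroMQ at all; `Broker` and its method names are hypothetical and model only the bookkeeping. The broker keeps a FCFS queue of idle workers, hands each request to the first idle worker, and remembers which client the job came from so the reply can be routed back. The real lbbroker example carries the client identity inside the message envelope rather than in a broker-side table; the table here is a simplification.

```python
from collections import deque


class Broker:
    """Toy model of a load-balancing broker that routes by identity."""

    def __init__(self):
        self.idle_workers = deque()   # workers that announced READY, FCFS order
        self.pending = deque()        # (client_id, job) waiting for a worker
        self.in_flight = {}           # worker_id -> client_id currently served
        self.delivered = []           # (client_id, reply) handed back to clients

    def worker_ready(self, worker_id):
        self.idle_workers.append(worker_id)
        return self._dispatch()

    def client_request(self, client_id, job):
        self.pending.append((client_id, job))
        return self._dispatch()

    def _dispatch(self):
        """Pair pending jobs with idle workers; never waits for a reply."""
        sent = []
        while self.pending and self.idle_workers:
            client_id, job = self.pending.popleft()
            worker_id = self.idle_workers.popleft()
            self.in_flight[worker_id] = client_id
            sent.append((worker_id, client_id, job))
        return sent

    def worker_reply(self, worker_id, reply):
        """Route the reply to the originating client and re-queue the worker."""
        client_id = self.in_flight.pop(worker_id)
        self.delivered.append((client_id, reply))
        self.idle_workers.append(worker_id)
        return self._dispatch()


broker = Broker()
broker.worker_ready("worker-1")
broker.worker_ready("worker-2")
first = broker.client_request("manager-A", "job-1")
second = broker.client_request("manager-A", "job-2")  # sent before job-1 finishes
broker.worker_reply("worker-2", "done-2")             # replies may arrive in any order
broker.worker_reply("worker-1", "done-1")
```

Both jobs are dispatched before either reply arrives, and each reply still reaches the manager that sent the job.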

Related

Deliberate Blocking in Akka Actors

I understand that Akka actors should not block in order to stay reactive to messages, but how do I structure my service where I want to monitor a process running for an indefinite period of time?
For example, we are using the Amazon Kinesis Connector library. You create a connector with a given configuration, which inherits from Runnable, and then call the Run() method. The connector simply runs indefinitely, pulling data from Kinesis, and writing it to Amazon S3. In fact, if the runnable returns, then that is an error, and it needs to be restarted.
Approach (1) would be to simply create a child actor for each Kinesis Connector running, and if the Run() method returns, you throw an exception, the Supervising Actor notices the exception and restarts the child actor. One connector per child actor per thread.
Approach (2) would be for the child actor to wrap the Kinesis Connector in a Future, and if the future returns, the actor would restart the Connector in another Future. Conceivably a single actor could manage multiple Connectors, but does this mean each Future is executing in a separate thread?
Which approach would be most in line with the philosophy of Akka, or is there some other approach people recommend? In general, I want to catch any problems with any Connector, and restart it. In total there would not be more than a half dozen Connectors running in parallel.
I would take approach 1. It should be noted though that actors do not have a dedicated thread by default; they share a thread pool (the so-called dispatcher, see: http://doc.akka.io/docs/akka/2.3.6/scala/dispatchers.html). This means that blocking is inherently dangerous because it exhausts the threads of the pool, preventing other non-blocked actors from running (since the blocked actors do not put the thread back into the pool). Therefore you should separate blocking calls into a fixed-size pool of dedicated actors, and you should assign these actors a PinnedDispatcher. This latter step ensures that these actors do not interfere with each other (they each have a dedicated thread) and that they do not interfere with the rest of the system (all of the other actors will run on other dispatchers, usually on default-dispatcher). Be sure though to limit the number of actors running on the PinnedDispatcher, since the number of used threads will grow with the number of actors on that dispatcher.
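The idea of "one dedicated thread per blocking connector, plus a supervisor that restarts it" can be sketched outside Akka. The following is a plain-Python analogy, not Akka code: `supervise` and `make_flaky_connector` are hypothetical names, and `max_restarts` exists only so that the demo terminates.

```python
import threading


def supervise(make_runnable, max_restarts, log):
    """Run a blocking callable on the current (dedicated) thread, restarting it
    whenever it returns or raises. max_restarts bounds the demo; a real
    supervisor would usually loop until told to stop."""
    for attempt in range(1 + max_restarts):
        run = make_runnable()
        try:
            run()  # blocks; returning at all is treated as a failure
            log.append((attempt, "returned"))
        except Exception as exc:
            log.append((attempt, f"raised {exc!r}"))


def make_flaky_connector():
    """Hypothetical stand-in for a connector whose run() should never return."""
    def run():
        raise RuntimeError("connection lost")
    return run


events = []
# One dedicated thread per connector, so blocking here cannot starve a shared pool.
t = threading.Thread(target=supervise, args=(make_flaky_connector, 2, events))
t.start()
t.join()
```

With half a dozen connectors this means half a dozen dedicated threads, which is the same trade-off the PinnedDispatcher makes.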
Of your two options, I'd say 1 is the more appropriate. Option 2 suffers from the fact that, in order to exit from the future monad's world, you need to call an Await somewhere, and there you need to specify a max duration which, in your case, does not make sense.
Maybe you could look into other options before going for it, though. A few keywords that may inspire you are streams and distributed channels.

Why use a messaging queue in web applications?

When developing my web application using Django, I faced a problem: when I call some functions locally they work correctly, but once I call them over an HTTP request they are not executed.
I asked around and I was told to execute them asynchronously, outside the request-response cycle, using Celery and a messaging queue server. It worked well, but I still don't understand why I have to execute some tasks asynchronously even when I don't have a race condition and there is only one client calling the web service.
This is a big black spot for me because I make it work without really knowing how.
Can anyone explain it to me?
Thanks.
The two main benefits I know of for queue-based systems are:
First, a response can be given to the client without having to wait for work to be done. This lets pages load faster and clients spend less time waiting.
Second, a queue gives you a central location for scheduled jobs that multiple workers can draw from. If a certain component of your application can't keep up with the amount of work there is to do (or if it fails for some reason), you can have other instances of that component doing the work, and there is a single place where all of the work that needs to be done can be found.
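Both benefits can be seen in a minimal sketch using only the Python standard library. This is not Celery; `handle_request` and `worker` are hypothetical names, and the in-process `queue.Queue` stands in for a real broker. The handler enqueues the job and responds at once, while a separate worker drains the queue.

```python
import queue
import threading
import time

jobs = queue.Queue()
results = []


def worker():
    """Pull jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        time.sleep(0.05)  # simulate slow work (e.g. sending e-mail)
        results.append(f"processed {job}")


def handle_request(payload):
    """Hypothetical request handler: enqueue the work and respond at once."""
    jobs.put(payload)
    return "202 Accepted"


thread = threading.Thread(target=worker)
thread.start()
response = handle_request("welcome-email")  # returns without waiting for the work
jobs.put(None)                              # stop the worker once the queue drains
thread.join()
```

Starting more worker threads (or processes, with a real broker) against the same queue is what gives the second benefit.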

How to design a client-server architecture

I would like to know what (TCP-based) server architecture can support a large number of clients (at least 10K) in order to implement a FIX server. My questions are:
How do we design it?
How do we listen on the open port? Use select, poll, or some other function?
How do we process the responses of the clients? At large scale we cannot create one thread per client.
Should the processing of responses be done in a different executable that shares requests and responses with the server executable through IPC?
There is much more to it. I would appreciate it if anyone could explain it or provide a link.
Thanks
An excellent resource for information on this topic is The C10K problem. Although the dimensions there seem a little old, the techniques are still applicable today.
The architecture depends on what you want to do with the clients' incoming data. My guess is that for every incoming message you would perform some computations and probably also return a response.
In that case I would create 1 main listener thread that receives all the incoming messages (Actually, if your hardware has more than 1 physical network device, I would use a listener thread per device and make sure each one is listening to a specific device).
Get the number of CPUs on your machine, create one worker thread per CPU, and bind each thread to one CPU (maybe the number of worker threads should be num_of_cpu-1, to leave an available CPU for the listener and dispatcher).
Each thread has a queue and a semaphore; the main listener thread just pushes the incoming data into those queues. There are many ways to perform load balancing (I will talk about that later).
Each worker thread just works on the requests given to it, and puts the response on another queue that is read by the dispatcher.
The dispatcher - there are 2 options here, use a thread for dispatcher (or thread per network device as for listeners), or have the dispatcher actually be the same thread as the listener.
There is some advantage to putting them both on the same thread, since it makes it easier to detect lost socket connections and to use the same fds for both reading and writing without thread synchronization. However, it could be that using 2 different threads would give better performance; it needs to be tested.
Note about load balancing:
This is a topic of its own.
The simplest thing is to use 1 queue for all working threads, but the problem is that they have to lock in order to pop items and the locking can damage performance. (But you get the most balanced load).
Another quite simple approach would be to have a private queue for every worker and perform round-robin when inserting. After every X cycles check the size of all the queues. If some queues are much larger than others then leave them out for the next X cycles and then recheck them again. This is not the best approach, but a simple one to implement and gives some load balancing while no locking is needed.
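The private-queue approach can be sketched as follows. This is a single-threaded illustration of the insertion policy only; `RoundRobinBalancer`, `check_every` (the "X cycles") and `max_ratio` (an assumed threshold for "much larger") are hypothetical names. In a real server each queue would be shared with a worker thread, and the size check would only be an approximation.

```python
from collections import deque


class RoundRobinBalancer:
    """Round-robin over per-worker queues, benching overloaded ones."""

    def __init__(self, n_workers, check_every, max_ratio=2.0):
        self.queues = [deque() for _ in range(n_workers)]
        self.check_every = check_every   # re-evaluate queue sizes every X inserts
        self.max_ratio = max_ratio       # assumed cutoff relative to the mean size
        self.active = list(range(n_workers))
        self.cursor = 0
        self.inserted = 0

    def _recheck(self):
        sizes = [len(q) for q in self.queues]
        mean = sum(sizes) / len(sizes)
        # Keep queues that are not much larger than average; never bench all.
        keep = [i for i, s in enumerate(sizes) if s <= self.max_ratio * max(mean, 1)]
        self.active = keep or list(range(len(self.queues)))
        self.cursor = 0

    def push(self, item):
        if self.inserted and self.inserted % self.check_every == 0:
            self._recheck()
        target = self.active[self.cursor % len(self.active)]
        self.queues[target].append(item)
        self.cursor += 1
        self.inserted += 1
        return target


balancer = RoundRobinBalancer(n_workers=3, check_every=6)
first_round = [balancer.push(i) for i in range(6)]    # plain round-robin
balancer.queues[1].clear()                            # workers 1 and 2 keep up...
balancer.queues[2].clear()
balancer.queues[0].extend(range(10))                  # ...worker 0 falls behind
second_round = [balancer.push(i) for i in range(6)]   # worker 0 is left out
```

The benched queue is considered again at the next check, so a worker that catches up rejoins the rotation.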
By the way - There is a way to implement queue between 2 threads without blocking - but this is also another topic.
I hope it helps,
Guy
If the client and server are on a secure network, then the security measures can be minimal, limited to encrypting the transfers. If the clients and the server are not on a secure network, you first want the server and client to authenticate each other and then initiate encrypted data transfer. For data transfer, server-side authentication should suffice. At the end of this authentication, use the session key to generate an encrypted (symmetric) data stream. Consider using TFTP; it is simple to implement and scales reasonably well.

System architecture: simple approach for setting up background tasks behind a web application -- will it work?

I have a Django web application and I have some tasks that should operate (or actually: be initiated) in the background.
The application is deployed as follows:
apache2-mpm-worker;
mod_wsgi in daemon mode (1 process, 15 threads).
The background tasks have the following characteristics:
they need to operate at a regular interval (every 5 minutes or so);
they require the application context (i.e. the application packages need to be available in memory);
they do not need any input other than database access, in order to perform some not-so-heavy tasks such as sending out e-mail and updating the state of the database.
Now I was thinking that the simplest approach to this problem would be to piggyback on the existing application process (as spawned by mod_wsgi). By implementing the task as part of the application and providing an HTTP interface for it, I would avoid the overhead of another process that is holding all of the application in memory. A simple cronjob can be set up that sends a request to this HTTP interface every 5 minutes and that would be it. Since the application process provides 15 threads and the tasks are quite lightweight and only run every 5 minutes, I figure they would not hinder the performance of the web application's user-facing operations.
Yet... I have done some online research and I have seen nobody advocating this approach. Many articles suggest a significantly more complex approach based on a full-blown messaging component (such as Celery, which uses RabbitMQ). Although that's sexy, it sounds like overkill to me. Some articles suggest setting up a cronjob that executes a script which performs the tasks. But that doesn't feel very attractive either, as it results in creating a new process that loads the entire application into memory, performs some tiny task, and destroys the process again. And this is repeated every 5 minutes. Does not sound like an elegant solution.
So, I'm looking for some feedback on my suggested approach as described in the paragraph before the preceding paragraph. Is my reasoning correct? Am I overlooking (potential) problems? What about my assumption that the application's performance will not be impeded?
All are reasonable approaches depending on your specific requirements.
Another is to fire up a background thread within the process when the WSGI script is loaded. This background thread could simply sleep and wake up occasionally to perform required work and then go back to sleep.
This method requires, though, that you have at most one Django process in which the background thread runs, to avoid different processes doing the same work on any database etc.
Using daemon mode with a single process, as you are, would satisfy that criterion. There are potentially other ways you could achieve that, though, even in a multiprocess configuration.
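A background thread of that kind might look like the sketch below. `PeriodicTask` and `send_pending_email` are hypothetical names, and nothing here is specific to Django or mod_wsgi. The demo uses a very short interval so that it finishes quickly; for the case above the interval would be around 300 seconds.

```python
import threading


class PeriodicTask:
    """Run `task` every `interval` seconds on a daemon thread until stopped."""

    def __init__(self, interval, task):
        self.interval = interval
        self.task = task
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        # Event.wait returns False on timeout (time to work)
        # and True once stop() has been called.
        while not self._stop.wait(self.interval):
            self.task()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


first_run = threading.Event()
runs = []


def send_pending_email():
    """Hypothetical lightweight task."""
    runs.append("tick")
    first_run.set()


ticker = PeriodicTask(interval=0.01, task=send_pending_email)
ticker.start()               # would be called once, when the WSGI script is loaded
first_run.wait(timeout=5)    # demo only: wait until the task has run at least once
ticker.stop()
```

Sleeping on an `Event` rather than `time.sleep` lets the thread be stopped promptly, and the daemon flag keeps it from blocking process shutdown.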
Note that celery works without RabbitMQ as well. It can use a ghetto queue (SQLite, MySQL, Postgres, etc, and Redis, MongoDB), which is useful in testing or for simple setups where RabbitMQ seems overkill.
See http://ask.github.com/celery/tutorials/otherqueues.html
(Using Celery with Redis/Database as the messaging queue.)

Why is the Lift web framework scalable?

I want to know the technical reasons why the Lift web framework has high performance and scalability. I know it uses Scala, which has an actor library, but according to the install instructions its default configuration is with Jetty. So does it use the actor library to scale?
Now, is the scalability built in right out of the box? Just add additional servers and nodes and it will automatically scale; is that how it works? Can it handle 500000+ concurrent connections with supporting servers?
I am trying to create a web services framework for the enterprise level, that can beat what is out there and is easy to scale, configurable, and maintainable. My definition of scaling is just adding more servers and you should be able to accommodate the extra load.
Thanks
Lift's approach to scalability is within a single machine. Scaling across machines is a larger, tougher topic. The short answer there is: Scala and Lift don't do anything to either help or hinder horizontal scaling.
As far as actors within a single machine, Lift achieves better scalability because a single instance can handle more concurrent requests than most other servers. To explain, I first have to point out the flaws in the classic thread-per-request handling model. Bear with me, this is going to require some explanation.
A typical framework uses a thread to service a page request. When the client connects, the framework assigns a thread out of a pool. That thread then does three things: it reads the request from a socket; it does some computation (potentially involving I/O to the database); and it sends a response out on the socket. At pretty much every step, the thread will end up blocking for some time. When reading the request, it can block while waiting for the network. When doing the computation, it can block on disk or network I/O. It can also block while waiting for the database. Finally, while sending the response, it can block if the client receives data slowly and TCP windows get filled up. Overall, the thread might spend 30 - 90% of its time blocked. It spends 100% of its time, however, on that one request.
A JVM can only support so many threads before it really slows down. Thread scheduling, contention for shared-memory entities (like connection pools and monitors), and native OS limits all impose restrictions on how many threads a JVM can create.
Well, if the JVM is limited in its maximum number of threads, and the number of threads determines how many concurrent requests a server can handle, then the number of concurrent requests will be determined by the number of threads.
(There are other issues that can impose lower limits---GC thrashing, for example. Threads are a fundamental limiting factor, but not the only one!)
Lift decouples threads from requests. In Lift, a request does not tie up a thread. Rather, a thread does an action (like reading the request), then sends a message to an actor. Actors are an important part of the story, because they are scheduled via "lightweight" threads. A pool of threads gets used to process messages within actors. It's important to avoid blocking operations inside of actors, so these threads get returned to the pool rapidly. (Note that this pool isn't visible to the application; it's part of Scala's support for actors.) A request that's currently blocked on database or disk I/O, for example, doesn't keep a request-handling thread occupied. The request-handling thread is available, almost immediately, to receive more connections.
This method for decoupling requests from threads allows a Lift server to have many more concurrent requests than a thread-per-request server. (I'd also like to point out that the Grizzly library supports a similar approach without actors.) More concurrent requests means that a single Lift server can support more users than a regular Java EE server.
@mtnyguard
"Scala and Lift don't do anything to either help or hinder horizontal scaling"
That isn't quite right. Lift is a highly stateful framework. For example, if a user requests a form, he can only post the request to the same machine the form came from, because the form processing action is saved in the server state.
And this is actually something that hinders scalability in a way, because this behaviour is inconsistent with the shared-nothing architecture.
No doubt Lift is highly performant, but performance and scalability are two different things. So if you want to scale horizontally with Lift, you have to define sticky sessions on the load balancer, which will direct a user to the same machine for the duration of a session.
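The core of session affinity is that the same session always maps to the same machine. Below is a minimal sketch of one way to get that property, by hashing the session ID; `pick_backend` and the backend names are hypothetical, and real load balancers typically offer their own mechanisms (for example, cookie-based affinity) rather than this exact scheme.

```python
import hashlib


def pick_backend(session_id, backends):
    """Map a session ID to one backend, deterministically."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["lift-1", "lift-2", "lift-3"]        # hypothetical server names
form_request = pick_backend("session-42", backends)
form_post = pick_backend("session-42", backends)  # lands on the same machine
```

Note that with plain modulo hashing, adding or removing a backend remaps most sessions, so server-side state for those sessions would be lost.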
Jetty may be the point of entry, but the actor ends up servicing the request. I suggest having a look at the Twitter-esque example, 'skitter', to see how you would be able to create a very scalable service. IIRC, this is one of the things that made the Twitter people take notice.
I really like @dre's reply, as he correctly identifies the statefulness of Lift as a potential problem for horizontal scalability.
The problem -
Instead of me describing the whole thing again, check out the discussion (Not the content) on this post. http://javasmith.blogspot.com/2010/02/automagically-cluster-web-sessions-in.html
The solution would be, as @dre said, sticky session configuration on the load balancer at the front, plus adding more instances. But since request handling in Lift is done by a thread + actor combination, you can expect one instance to handle more requests than in typical frameworks. This gives Lift an edge over sticky sessions in other frameworks, i.e. each instance's capacity to process more requests may help you to scale.
There is also Akka-Lift integration, which would be another advantage here.