Akka and Session Beans

The Typesafe whitepaper (v5) states:
"In different scenarios, actors may be an alternative to: a thread; a Java EE session bean; ..."
I don't understand how an actor is an alternative to a session bean, because they work completely differently: an actor is invoked serially by passing messages to it, and it processes those messages one at a time in the order in which they are sent. That means any business logic inside the actor runs in a synchronized fashion. Session beans, on the other hand, are pooled: there are a number of them, and multiple threads can run the same business logic at any time, meaning the logic runs concurrently.
Can anyone clear up my misunderstanding of this statement?

You can pool actors (as children) or put them behind Akka routers (the routees are also technically children), and that way you can tune the "concurrency".
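As a minimal sketch of that idea in classic Akka (the worker actor, system name and pool size are invented for illustration), a round-robin pool of five routees behind one router:

    import akka.actor.{Actor, ActorSystem, Props}
    import akka.routing.RoundRobinPool

    // Hypothetical worker: each routee still processes its own mailbox strictly
    // one message at a time, but the pool as a whole works on 5 jobs concurrently.
    class Worker extends Actor {
      def receive = {
        case job: String => println(s"${self.path.name} handling $job")
      }
    }

    object RouterExample extends App {
      val system = ActorSystem("example")
      // A router with 5 child routees behind a single ActorRef.
      val router = system.actorOf(RoundRobinPool(5).props(Props[Worker]()), "workers")
      (1 to 20).foreach(i => router ! s"job-$i")
    }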

Too much EJB concurrency is often a cause of lock contention and performance degradation.
Akka, by contrast, is aimed at asynchronous processing and non-blocking I/O (NIO). That approach works best when the number of threads is close to the number of CPU cores.
Note that Akka doesn't enforce exactly one processing thread; see e.g. "Akka control threadpool threads".
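For illustration, the thread pool backing Akka's default dispatcher can be tuned through standard configuration settings; here is a minimal sketch (the numbers are arbitrary examples, not recommendations):

    import akka.actor.ActorSystem
    import com.typesafe.config.ConfigFactory

    object TunedSystem extends App {
      // Standard Akka settings for the default dispatcher; the values below
      // are made-up examples.
      val customConf = ConfigFactory.parseString("""
        akka.actor.default-dispatcher {
          executor = "fork-join-executor"
          fork-join-executor {
            parallelism-min    = 2
            parallelism-factor = 1.0   # roughly one thread per available core
            parallelism-max    = 8
          }
        }
      """)

      val system = ActorSystem("tuned", ConfigFactory.load(customConf))
    }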

Related

Is it possible to prioritize (give priority to) a specific Akka actor?

I've done my research on the Akka framework,
and I would like to know:
Is it possible to give priority to a specific actor?
I mean, actors work when a message from the queue lets them run;
is there an option to let an actor work even when it is not yet its turn?
Effectively, yes.
One of the parts of your Actor configuration is which Dispatcher those actors will use. A dispatcher is what connects the actor to the actual threads that will execute the work. (Dispatchers default to ForkJoinPools, but can also be dedicated thread pools or even threads dedicated to a specific actor.)
So the typical way you give an Actor "priority" is to give it a dedicated dispatcher, and thereby dedicated threads. For example, Akka itself does this for its internal messages: they run on a dedicated dispatcher so that even if you deploy a bunch of poorly written actors that block their threads, Akka itself can still function.
I put "priority" in quotes, because you aren't guaranteeing a specific order of processing. (There are other ways to do that, but not across Actors.) But you are solving the case where you want specific actors to always have a greater access to resources and/or specific actors to get executed promptly.
(In theory, you could take this even further and create a ThreadPoolExecutor with higher-priority threads, and then create a dispatcher based on that ThreadPoolExecutor. That would truly give OS-level priority to an actor, but that would likely only be relevant in very unusual circumstances.)
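As a hedged sketch of the dedicated-dispatcher idea (the dispatcher name, pool size and actor are all made up for illustration):

    import akka.actor.{Actor, ActorSystem, Props}
    import com.typesafe.config.ConfigFactory

    // Stand-in for "something time-sensitive".
    class ImportantActor extends Actor {
      def receive = { case msg => println(s"handling $msg promptly") }
    }

    object DedicatedDispatcherExample extends App {
      // Define a dispatcher with its own small, fixed thread pool.
      val customConf = ConfigFactory.parseString("""
        important-dispatcher {
          type = Dispatcher
          executor = "thread-pool-executor"
          thread-pool-executor { fixed-pool-size = 2 }
        }
      """)
      val system = ActorSystem("example", ConfigFactory.load(customConf))

      // Even if other actors saturate the default dispatcher, this actor
      // still has two threads reserved for its work.
      val important = system.actorOf(
        Props[ImportantActor]().withDispatcher("important-dispatcher"),
        "important")
    }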
EDIT TO RESPOND TO "do mailboxes and dispatchers are the same" [sic]?
No. Each actor has a mailbox. So sometimes we talk about the behavior of mailboxes when discussing the behavior of actors, as the behavior of the mailbox governs the ordering of the actor's message processing.
But dispatchers are a distinct concept. Actors have a dispatcher, but it is many to one. (i.e. each Actor has one mailbox, but there may be many actors associated with a single dispatcher.)
For example, a real world situation might be:
System actors are processed by the internal dispatcher. To quote the docs "To protect the internal Actors that are spawned by the various Akka modules, a separate internal dispatcher is used by default." i.e. no matter how badly screwed up your own code might be, you can't screw up the heartbeat processing and other system messages because they are running on their own dispatcher, and thus their own threads.
Most actors (millions of them perhaps) are processed by the default dispatcher. Huge numbers of actors, as long as they are well behaved, can be handled with a tiny number of threads. So they might all be configured to use the default dispatcher.
Badly behaved actors (such as those that block) might be configured to be processed by a dedicated "blocking" dispatcher. By isolating blocking actors on a separate dispatcher, they don't impact the response time of the default dispatcher.
Although I don't see this often, you might also have a dispatcher for extremely response time sensitive actors that gives them a dedicated thread pool. Or even a "pinned" dispatcher that gives an actor a dedicated thread.
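A sketch of what the configuration for those last two cases might look like (the dispatcher names, sizes and the actor classes mentioned in the comments are illustrative):

    import com.typesafe.config.ConfigFactory

    object DispatcherConfigSketch {
      // Illustrative dispatcher definitions, normally kept in application.conf.
      val dispatcherConf = ConfigFactory.parseString("""
        # for actors that block (JDBC, file I/O, ...)
        blocking-dispatcher {
          type = Dispatcher
          executor = "thread-pool-executor"
          thread-pool-executor { fixed-pool-size = 16 }
          throughput = 1
        }
        # one dedicated thread per actor that uses it
        pinned-dispatcher {
          type = PinnedDispatcher
          executor = "thread-pool-executor"
        }
      """)

      // Assignment happens at actor creation, e.g. (actor classes are hypothetical):
      //   system.actorOf(Props[BlockingActor]().withDispatcher("blocking-dispatcher"))
      //   system.actorOf(Props[LatencySensitiveActor]().withDispatcher("pinned-dispatcher"))
    }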
As I mentioned, this isn't really "priority", it is "dedicated resources". Because one of the critical aspects of actors is that they are location independent. So if Actor A is on Node A and Actor B is on Node B, I can't guarantee that Actor A will ALWAYS act first, because doing so would involve an astronomical amount of overhead between nodes. All I can reasonably do is give Actor A dedicated resources so that I know Actor A should always be able to act quickly.
Note that this is what the internal dispatcher does as well. We don't guarantee that heartbeat messages are always processed first, but we do make sure that there are always threads available to process system messages, even if some bad user code has blocked the default dispatcher.

Akka light-weight thread

In akka documentation:
The good news is that Akka actors conceptually each have their own light-weight thread, which is completely shielded from the rest of the system.
What is a light-weight thread? Aren't threads considered to be expensive resources?
The key word here is 'conceptually': indeed, JVM/OS threads are (relatively) expensive resources, and for this reason Akka is not implemented with a thread per actor - that would be too heavy.
Akka does make sure each actor only processes one message at a time. This means that 'inside the actor' you don't have to worry about concurrency, and that is what is meant by the statement that 'conceptually' you can think of an actor as running on a 'lightweight thread' (even though internally it's not implemented using a thread per actor).
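A tiny illustration of what that buys you: plain mutable state inside an actor needs no synchronization, because the actor only ever handles one message at a time (the Counter actor is invented for this example):

    import akka.actor.Actor

    // No locks, no volatile, no AtomicInteger: Akka guarantees this actor
    // processes only one message at a time, so `count` is never touched
    // by two threads concurrently.
    class Counter extends Actor {
      private var count = 0

      def receive = {
        case "increment" => count += 1
        case "get"       => sender() ! count
      }
    }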

How does Akka provide concurrency for a single actor?

I have read that:
Akka ensures that each instance of an actor runs in its own lightweight thread and its messages are processed one at a time.
If it is the case that Akka actors process their messages sequentially, then how does Akka provide concurrency for a single actor?
Actors are independent agents of computation, each one is executed strictly sequentially but many actors can be executed concurrently. You can view an Actor as a Thread that costs only about 0.1% of what a normal thread costs and that also has an address to which you can send messages—you can of course manage a queue in your own Thread and use that for message passing but you’d have to implement all that yourself.
If Akka—or the Actor Model—stopped here, then it would indeed not be very useful. The trick is that giving stable addresses (ActorRef) to the Actors enables them to communicate even across machine boundaries, over a network, in a cluster. It also allows them to be supervised for principled failure handling—when a normal Thread throws an exception it simply terminates and nothing is done to fix it.
It is this whole package of encapsulation (provided by hiding everything behind ActorRef), message-based communication that is location transparent, and support for failure handling that makes the Actor Model a perfect fit for expressing distributed systems. And today there is a distributed system of many CPU cores within even the smallest devices.
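A brief sketch of the failure-handling point, assuming classic Akka (the actor names and the simulated failure are invented): a child that throws is restarted by its parent's supervision strategy instead of silently dying the way a bare Thread would.

    import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props, SupervisorStrategy}
    import scala.concurrent.duration._

    class Flaky extends Actor {
      def receive = {
        case "boom" => throw new RuntimeException("simulated failure")
        case msg    => println(s"handled $msg")
      }
    }

    class Parent extends Actor {
      // Restart the child on failure instead of letting the error vanish,
      // which is what would happen with a plain Thread.
      override val supervisorStrategy: SupervisorStrategy =
        OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
          case _: RuntimeException => SupervisorStrategy.Restart
        }

      private val child = context.actorOf(Props[Flaky](), "flaky")
      def receive = { case msg => child forward msg }
    }

    object SupervisionExample extends App {
      val system = ActorSystem("supervision")
      val parent = system.actorOf(Props[Parent](), "parent")
      parent ! "boom"   // child fails and is restarted by the parent
      parent ! "hello"  // child keeps working after the restart
    }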

Converting Java threads to Akka actors

I have a Java application with a lot of threads and thread pools. Can we use Akka to replace the threads and thread pools?
It depends on what your threads are doing. Are they doing blocking I/O, or taking locks and sharing mutable data between themselves? If so, Akka might not be a great fit, as actors should generally avoid blocking for I/O or locks. On the other hand, if the threads do isolated, non-blocking work and can communicate via message passing, Akka is probably a good fit.
Yes, you absolutely can. Be careful with the above answer, though - it isn't entirely accurate. Actors can do blocking I/O: you just use child actors to represent each blocking connection. The newbie mistake would be to treat actors the same way you would treat threads, in which case the above answer would be right. But if you pass the blocking work off to a lower-level actor, using an ad-hoc child actor each time, you never have to block the main flow of processing (see the sketch below).
But forgive me, I have gone off track. In short: yes, you can. Keep in mind there will be a learning curve, though. Actor programming is a different paradigm and needs to be handled a bit differently.
However, programming concurrency with actors is leagues easier than with threads and locking (literally). Just make your app reactive instead of time-based and many concurrency concerns just stop existing.
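A rough sketch of the child-actor pattern described above, assuming a separately configured "blocking-dispatcher" (the actor names are hypothetical):

    import akka.actor.{Actor, Props}

    // Child that does the blocking call; Thread.sleep stands in for real I/O.
    class BlockingWorker extends Actor {
      def receive = {
        case request: String =>
          Thread.sleep(100)                 // e.g. a JDBC call or file read
          sender() ! s"result for $request"
      }
    }

    // Front actor that never blocks: it hands each request to a throwaway
    // child running on the blocking dispatcher.
    class FrontActor extends Actor {
      def receive = {
        case request: String =>
          val worker = context.actorOf(
            Props[BlockingWorker]().withDispatcher("blocking-dispatcher"))
          worker forward request            // child replies to the original sender
      }
    }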
Check out the Akka docs on their site; they are very thorough. Also the books Akka Concurrency and Effective Akka - just keep them on the desk as a reference.

Concurrency within Java EE environment

Goal
My goal is to better understand how concurrency works within a Java EE environment and how I can make better use of it.
General questions
Let's take a typical servlet container (Tomcat) as an example. It uses one thread to process each request. The thread pool is configured so that it can have at most 80 threads. Let's also take a simple webapp that does some processing and some DB communication during each request.
At peak time I can see 80 threads running in parallel (plus several other infrastructure threads). Let's also assume I am running it on an 'm1.large' EC2 instance.
I don't think all these threads can really run in parallel on this hardware, so the scheduler has to decide how best to split CPU time between them all. So the questions are: how big is the scheduler overhead in this case, and how can I find the right balance between the number of threads and processing speed?
Actors comparison
Having 80+ threads on a 4-core CPU doesn't sound healthy to me, especially if most of them are blocked on some kind of I/O (DB, filesystem, socket) - they just consume precious resources. What if we detached the request from the thread and had only a reasonable number of threads (8, for instance), and just sent processing tasks to them? Of course, in that case I/O should also be non-blocking, so that I receive an event when data I need becomes available and send an event when I have results.
As far as I understand, the actor model is all about this. Actors are not bound to threads (at least in Akka and Scala), so I would have a reasonably sized thread pool and a bunch of actors with mailboxes containing processing tasks.
Now the question is: how does the actor model compare to the traditional thread-per-request model in terms of performance, scheduler overhead and resource (RAM, CPU) consumption?
Custom threads
I have some requests (only several) that take too much time to process. I have optimized the code and all the algorithms and added caches, but it still takes too long. However, I can see that the algorithm can be parallelized. That fits naturally in the actor model: I just split my big task into several tasks and then aggregate the results somehow (if needed). But in the thread-per-request model I need to spawn my own threads (or create my own small thread pool), and as far as I know that's not recommended practice within a Java EE environment. From my point of view, it doesn't fit naturally in the thread-per-request model either. A question arises: how big should my thread pool be? Even if I make it reasonable in terms of hardware, I still have this bunch of threads alongside those managed by the servlet container - thread management becomes decentralized and goes wild.
So my question is: what is the best way to deal with these situations in the thread-per-request model?
Having 80+ threads on a 4-core CPU doesn't sound healthy to me, especially if most of them are blocked on some kind of I/O (DB, filesystem, socket) - they just consume precious resources.
Wrong. In exactly this scenario the processor can handle many more threads than it has cores, since most of the threads at any point in time are blocked waiting for I/O. Fair enough, context switching takes time, but that overhead is usually irrelevant compared to file/network/DB latency.
The rule of thumb that the number of threads should be equal to, or a little more than, the number of processor cores applies only to computation-intensive tasks, where the cores are kept busy most of the time.
I have some requests (only several) that take too much time to process. I have optimized the code and all the algorithms and added caches, but it still takes too long. However, I can see that the algorithm can be parallelized. That fits naturally in the actor model: I just split my big task into several tasks and then aggregate the results somehow (if needed). But in the thread-per-request model I need to spawn my own threads (or create my own small thread pool), and as far as I know that's not recommended practice within a Java EE environment.
Never heard of that (but I don't claim to be the ultimate Java EE expert). IMHO there is nothing wrong with executing the tasks associated with a single request in parallel using e.g. a ThreadPoolExecutor. Note that these threads are not request-handling threads, so they don't directly interfere with the thread pool used by the EJB container. They do compete for the same resources, of course, so in a careless setup they may slow down or even stall other request-processing threads.
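A minimal sketch of that idea (using Executors.newFixedThreadPool, which is a ThreadPoolExecutor under the hood; the pool size and the work are arbitrary placeholders):

    import java.util.concurrent.{Callable, Executors}
    import scala.jdk.CollectionConverters._

    object RequestParallelism {
      // One shared pool for request-internal parallel work, deliberately separate
      // from the container's request-handling threads. 4 is an arbitrary size.
      private val pool = Executors.newFixedThreadPool(4)

      // Split the request's big task into chunks, run them in parallel, aggregate.
      def handleRequest(chunks: Seq[String]): Seq[String] = {
        val tasks = chunks.map { c =>
          new Callable[String] { def call(): String = process(c) }
        }
        pool.invokeAll(tasks.asJava).asScala.map(_.get()).toSeq
      }

      private def process(chunk: String): String = chunk.toUpperCase // placeholder work

      def shutdown(): Unit = pool.shutdown()
    }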
what is the best way to deal with these situations in the thread-per-request model?
In the end, you can't escape measuring concurrent performance and fine-tuning the size of your thread pool and other parameters for your own specific environment.
The whole point of Java EE is to put common architectural concerns like security, state, and concurrency into the framework and let you provide the bits of business logic or data mappings along with the wiring to connect them. As such, Java EE intentionally hides the nasty bits of concurrency (locking to read/write mutable state) in the framework.
This approach lets a much broader range of developers successfully write correct applications. A necessary side effect though is that these abstractions create overhead and remove control. That's both good (in making it simple and encoding policies as policies not code) and bad (if you know what you're doing and can make choices impossible in the framework).
It is not inherently bad to have 80 threads on a production box. Most will be blocked or waiting on I/O which is fine. There is a (tunable) pool of threads doing the actual computation and Java EE will give you external hooks to tune those knobs.
Actors are a different model. They also let you write islands of code (the actor body) that (can) avoid locking to modify state. You can write your actors to be stateless (capturing the state in the recursive function call parameters) or hide your state completely in an actor instance so the state is all confined (for react style actors you probably still need to explicitly lock around data access to ensure visibility on the next thread that runs your actor).
I can't say that one or the other is better. I think there is adequate proof that both models can be used to write safe, high-throughput systems. To make either perform well, you need to think hard about your problem and build apps that isolate parts of state and the computations on each kind of state. For code where you understand your data well and have a high potential for parallelism I think models outside Java EE make a lot of sense.
Generally, the rule of thumb for sizing compute-bound thread pools is that they should have approximately as many threads as cores + 2. Many frameworks size themselves to that automatically. You can use Runtime.getRuntime().availableProcessors() to get the number of cores. If your problem decomposes in a divide-and-conquer style and the number of data items is large, I would strongly suggest checking out fork/join, which can be used now as a separate library and will be part of Java 7.
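As a hedged illustration of the divide-and-conquer style that fork/join suits (the summing task and the threshold are arbitrary examples):

    import java.util.concurrent.{ForkJoinPool, RecursiveTask}

    // Divide-and-conquer sum over an array using fork/join.
    // The threshold of 1000 is an arbitrary illustrative choice.
    class SumTask(data: Array[Long], from: Int, until: Int) extends RecursiveTask[Long] {
      override def compute(): Long =
        if (until - from <= 1000) {
          var sum = 0L
          var i = from
          while (i < until) { sum += data(i); i += 1 }
          sum
        } else {
          val mid   = (from + until) / 2
          val left  = new SumTask(data, from, mid)
          val right = new SumTask(data, mid, until)
          left.fork()                    // run the left half asynchronously
          right.compute() + left.join()  // compute the right half, then join the left
        }
    }

    object ForkJoinExample extends App {
      val cores = Runtime.getRuntime.availableProcessors()
      val pool  = new ForkJoinPool(cores)   // sized roughly to the number of cores
      val data  = Array.tabulate(1000000)(_.toLong)
      println(pool.invoke(new SumTask(data, 0, data.length)))
    }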
As far as how to manage this, you're not supposed to spawn threads as such inside Java EE (they want to control that) but you might investigate sending a request to your data-crunching thread pool via a message queue and handling that request via a return message. That can fit in the Java EE model (a bit clumsily of course).
I have a writeup of actors, fork/join, and some other concurrency models here that you might find interesting: http://tech.puredanger.com/2011/01/14/comparing-concurrent-frameworks/