In Lagom: on increasing concurrent HTTP calls, the thread count (akka.actor.default-dispatcher) keeps increasing. How do we control this behaviour?

We observe that as we increase the number of concurrent HTTP calls to our service, the thread count (akka.actor.default-dispatcher) keeps increasing (see the VisualVM screenshot). Even after the requests stop, the thread count doesn't go down, and most of these threads remain in the park state.
Is this proportional increase in threads expected behaviour? How do we control it and reuse the same actors, or kill the actors after a request has been served?
I’m running the shopping-cart example from lagom-samples.
akka.actor.default-dispatcher {
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2
    parallelism-factor = 1.0
    parallelism-max = 6
  }
  throughput = 1
}
[VisualVM screenshot of the thread analysis for the Lagom application]
Edit: With a thread-pool-executor as akka.actor.default-dispatcher, the service stops serving any requests after 20-30 concurrent requests. Even the console becomes unresponsive.
default-dispatcher {
  type = Dispatcher
  executor = default-executor
  throughput = 1
  default-executor = { fallback = thread-pool-executor }
  thread-pool-executor = {
    keep-alive-time = 60s
    core-pool-size-min = 8
    core-pool-size-factor = 3.0
    core-pool-size-max = 64
    max-pool-size-min = 8
    max-pool-size-factor = 3.0
    max-pool-size-max = 64
    task-queue-size = -1
    task-queue-type = linked
    allow-core-timeout = on
  }
}
The Akka docs introduction highlights that "millions of actors can be efficiently scheduled on a dozen of threads". Why, then, would we need to create threads in proportion to the number of concurrent requests?

Are you blocking in your calls? For example, are you calling Thread.sleep, or using some synchronous IO? If so, then what you're seeing is entirely expected.
Lagom is an asynchronous framework. All the IO and inter-service communication mechanisms it provides are non-blocking. Its thread pools are tuned for non-blocking workloads. If you only use non-blocking calls, you will see the thread pools stay at very low thread counts, and you won't find things going unresponsive.
But the moment you start blocking, all bets are off. Blocking requires one thread per request.
The default dispatcher that Akka uses is a fork-join pool. It is designed for asynchronous use. If you block a thread in its pool, it will start another thread so that other tasks can continue. That's why you see the thread pool grow. Don't block, and this won't happen.
The thread-pool executor, on the other hand, uses a fixed number of threads. If you block on it, you risk deadlocking the entire application. Don't block, and this won't happen.
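If a blocking call truly cannot be avoided, the usual pattern is to push it onto a separate, explicitly sized dispatcher so that akka.actor.default-dispatcher stays free for asynchronous work. Below is a minimal Scala sketch of that pattern; the dispatcher name blocking-io-dispatcher and the blockingJdbcCall function are assumptions made for the example, not something from the answer above.

import scala.concurrent.Future
import akka.actor.ActorSystem

object BlockingIsolationSketch {
  // Stand-in for a synchronous JDBC call, Thread.sleep, or other blocking work.
  def blockingJdbcCall(): Int = { Thread.sleep(1000); 42 }

  def fetch(system: ActorSystem): Future[Int] = {
    // Look up a dedicated dispatcher defined in application.conf, e.g.
    //   blocking-io-dispatcher {
    //     type = Dispatcher
    //     executor = "thread-pool-executor"
    //     thread-pool-executor { fixed-pool-size = 16 }
    //   }
    // so the blocking work never occupies akka.actor.default-dispatcher.
    val blockingEc = system.dispatchers.lookup("blocking-io-dispatcher")
    Future(blockingJdbcCall())(blockingEc)
  }
}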

Related

Actor not processing messages for quite some time

In my Akka application I am using a main actor as a controller, which receives commands and delegates them to a processor actor. The processor actor, upon finishing (each task takes around 2 minutes), passes a message back to the controller, and the controller actor then sends a message to a database actor for persistence. Both the processor actor and the database actor are managed by routers, each with (say) 5 routees. I am using the default dispatcher, and all other Akka configuration is at its defaults. Below is the scenario.
The controller actor receives around 100 messages, which are passed to the processor actor. I can see from the log that the processor has finished processing some messages (around 5) and passed completion messages on to the controller actor, but the database actor only starts executing some 5 minutes later. During these 5 minutes the processor actor is processing its pending messages, so it's not as if the application is idle.
When the volume of messages is low, the flow from controller -> processor -> controller -> DB actor is almost instantaneous and there is hardly any lag.
I don't want this lag after processing; DB execution should happen as soon as processing is finished. But it seems the threads are busy executing processor tasks. How can I overcome this situation? Ideally I want the turnaround time of my task execution to be low, but due to the above behaviour I am not able to achieve that.
By default, all Akka actors use the same executor, which is limited to a maximum of 64 threads. From https://doc.akka.io/docs/akka/current/general/configuration-reference.html :
# This will be used if you have set "executor = "default-executor"".
# If an ActorSystem is created with a given ExecutionContext, this
# ExecutionContext will be used as the default executor for all
# dispatchers in the ActorSystem configured with
# executor = "default-executor". Note that "default-executor"
# is the default value for executor, and therefore used if not
# specified otherwise. If no ExecutionContext is given,
# the executor configured in "fallback" will be used.
default-executor {
  fallback = "fork-join-executor"
}
and fork-join-executor config:
# This will be used if you have set "executor = "fork-join-executor""
# Underlying thread pool implementation is java.util.concurrent.ForkJoinPool
fork-join-executor {
  # Min number of threads to cap factor-based parallelism number to
  parallelism-min = 8
  # The parallelism factor is used to determine thread pool size using the
  # following formula: ceil(available processors * factor). Resulting size
  # is then bounded by the parallelism-min and parallelism-max values.
  parallelism-factor = 1.0
  # Max number of threads to cap factor-based parallelism number to
  parallelism-max = 64
  # Setting to "FIFO" to use queue like peeking mode which "poll" or "LIFO" to use stack
  # like peeking mode which "pop".
  task-peeking-mode = "FIFO"
}
The problem is likely related to blocking calls in the processor actors. Akka assigns threads from that shared pool of at most 64 to handle these blocking calls, and it has to wait for one of them to finish processing a message before it can handle messages for other actors. This could cause the time lag that you observe between actors.
A key aspect on which Akka is based is that systems should remain responsive at all times. If you used the same dispatcher/thread pool for the blocking DB operations or processing messages as your main Akka routing infrastructure, it is possible that all the Akka threads could be occupied by the processing actors or DB operations, and your system would effectively be deadlocked until one of the blocking operations is completed. This might not be a problem for a simple system on a single JVM that only performs this task, but when it is scaled, it might cause a lot of problems.
In cases like yours where you cannot avoid blocking, a dedicated dispatcher for blocking operations should be used. The Akka documentation on handling blocking operations covers this (it is framed for Akka HTTP, but it generalizes). You can create two dispatchers to handle the two different kinds of blocking operations. I also think you should throttle your blocking requests so as not to overwhelm your system (the dedicated dispatchers effectively provide that throttling). You could also implement buffers within your actors to deal with backpressure.
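As a rough illustration of the dedicated-dispatcher idea (not taken from the question or this answer), here is a minimal Scala sketch: a fixed-size thread-pool dispatcher defined in configuration and assigned to the blocking DB routees via Props.withDispatcher. The dispatcher name blocking-db-dispatcher, the DbActor class and the pool sizes are all invented for the example.

import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool
import com.typesafe.config.ConfigFactory

// Hypothetical actor that performs a blocking database write.
class DbActor extends Actor {
  def receive = {
    case record => // the blocking call runs here, on the dedicated dispatcher
  }
}

object DedicatedDispatcherExample extends App {
  // A dispatcher reserved for blocking DB work (names and sizes are illustrative).
  val config = ConfigFactory.parseString(
    """
    blocking-db-dispatcher {
      type = Dispatcher
      executor = "thread-pool-executor"
      thread-pool-executor {
        fixed-pool-size = 5
      }
      throughput = 1
    }
    """).withFallback(ConfigFactory.load())

  val system = ActorSystem("app", config)

  // Five DB routees all run on the dedicated dispatcher, so a slow or blocked
  // DB call never starves the default dispatcher used by the other actors.
  val dbRouter = system.actorOf(
    RoundRobinPool(5).props(Props[DbActor]().withDispatcher("blocking-db-dispatcher")),
    "dbRouter")
}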
EDIT
The controller's mailbox has 100 messages, and 5 of them are taken and delegated to processor actors. Each processor actor takes 2 minutes and sends its response back to the controller, where the response gets queued in the controller's mailbox. But before processing these responses, the controller has to process the messages that were enqueued ahead of them, which effectively increases the controller's service time per message. The lag is the culmination of this whole process: as soon as the controller actually received the response message, it delegated it to the DB actor. So the processing time increases as the number of messages increases.
Let me know if it helps!!

Akka Dispatcher Thread creation

I have been working with the Akka actor model. I have a use case where more than 1000 actors will be active and I have to process those actors. I thought of controlling the thread count through configuration defined in application.conf.
But the number of dispatcher threads created in my application leaves me unable to tune the dispatcher configuration. Each time I restart my application, I see a different number of dispatcher threads created (I have checked this via a thread dump after each start).
The thread count is not even equal to the one I defined in parallelism-min. Due to this low thread count, my application is processing very slowly.
Checking the number of cores on my machine with the code below:
Runtime.getRuntime().availableProcessors();
It displays 40. But the number of dispatcher threads created is less than 300, even though I configured the parallelism as 500.
Following is my application.conf file:
consumer-dispatcher {
  type = "Dispatcher"
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 500
    parallelism-factor = 20.0
    parallelism-max = 1000
  }
  shutdown-timeout = 1s
  throughput = 1
}
On what basis does Akka create dispatcher threads internally, and how can I increase the dispatcher thread count to increase parallel processing of actors?
X-Post from discuss.lightbend.com
First let me answer the question directly.
A fork-join-executor is backed by a java.util.concurrent.ForkJoinPool with its parallelism set to the implied parallelism from the dispatcher config: ceil(parallelism-factor * processors), bounded above by parallelism-max and below by parallelism-min. So, in your case, 800.
And while I'm no expert on the implementation of the ForkJoinPool, the source for the Java implementation of ForkJoinPool says "All worker thread creation is on-demand, triggered by task submissions, replacement of terminated workers, and/or compensation for blocked workers.", and it has methods like getActiveThreadCount(), so it's clear that ForkJoinPool doesn't just naively create a giant pool of workers.
In other words, what you are seeing is expected: it’s only going to create threads as they are needed. If you really must have a gigantic pool of worker threads you could create a thread-pool-executor with a fixed-pool-size of 800. This would give you the implementation you are looking for.
But, before you do so, I think you are entirely missing the point of actors and Akka. One of the reasons people like actors is that they are much more lightweight than threads and can give you a lot more concurrency than a thread. (Also note that concurrency != parallelism, as noted in the documentation on concepts.) So trying to create a pool of 800 threads to back 1000 actors is very wasteful. The Akka docs introduction highlights that "millions of actors can be efficiently scheduled on a dozen of threads".
I can't tell you exactly how many threads you need without knowing your application (for example, whether you have blocking behaviour), but the defaults (which would give you a parallelism factor of 20) are probably just fine. Benchmark to be certain, but I really don't think you have a problem with too few threads. (The ForkJoinPool behaviour you are observing seems to confirm this.)
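For reference, here is a tiny Scala sketch (not part of the original answer) of how the implied parallelism falls out of the three fork-join-executor settings, following the formula quoted from the configuration reference earlier; the object and function names are made up for the illustration.

object ImpliedParallelism {
  // ceil(available processors * parallelism-factor), clamped to [parallelism-min, parallelism-max]
  def impliedParallelism(min: Int, factor: Double, max: Int): Int = {
    val processors = Runtime.getRuntime.availableProcessors()
    math.min(max, math.max(min, math.ceil(processors * factor).toInt))
  }

  def main(args: Array[String]): Unit = {
    // With the asker's settings (min = 500, factor = 20.0, max = 1000) on a 40-core
    // machine: ceil(40 * 20.0) = 800, which lies inside [500, 1000], so the pool's
    // parallelism is 800. That is a capacity; worker threads are created lazily on demand.
    println(impliedParallelism(min = 500, factor = 20.0, max = 1000))
  }
}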

Why is ZeroMQ not using its internal threads for waiting (on receive/send)?

From the source code it is apparent that each ZeroMQ socket creates its own internal thread (whether your main app is multi-threaded or not) when you make the call zsocket_new.
Those internal threads run a loop doing select(0, &readfds, &writefds, &exceptfds, timeout ? &tv : NULL) (lines 167-173 of select.cpp).
Now, what I do not understand is: despite the availability of those internal threads, why is it that methods such as zmq_msg_recv behave as blocking calls, making your main thread block?
For example, consider the code below:
zctx_t *ctx = zctx_new();
void *router = zsocket_new(ctx, ZMQ_ROUTER);
int rc = zsocket_bind(router, "tcp://*:8080");
zframe_t *handle = zframe_recv(router);
It blocks on zframe_recv in your calling thread, waiting at lines 207-212 of signaler.cpp, doing int rc = select(0, &fds, NULL, NULL, timeout_ >= 0 ? &timeout : NULL).
In other words, for a single ZMQ socket you have two threads doing two select() operations!
If you want to make zframe_recv asynchronous, you have to start your own thread and do polling or blocking inside that thread.
The question is: why is ZMQ not using these internal threads for both selects, instead of forcing you to create another thread just to read/write the data?
If only it could use the internal threads appropriately, it could become a very good actor model. Any ideas why it is not doing this?
Now, what I do not understand is: despite the availability of those internal threads, why is it that methods such as zmq_msg_recv behave as blocking calls, making your main thread block?
ZMQ encourages a network with a collection of single-threaded nodes. Each node has a collection of sockets that it uses to communicate with other nodes. The ZMQ contexts on nodes manage the actual transport of messages across wires and have their own collection of threads for doing that, but your node logic is encouraged to be single threaded.
This could be described as "reactive programming". Inbound messages are events, and you respond to them by sending outbound messages. When you call zmq_msg_recv or zmq_poll, the meaning is "wait for the next event", so blocking is appropriate.
If you find you cannot implement your desired node logic with a single-threaded solution, consider reviewing your design to make sure that your 3-threaded node should not instead be 3 single-threaded nodes.
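To make the "wait for the next event" style concrete, here is a hedged sketch of a single-threaded reactive loop. It uses the JeroMQ Java bindings from Scala rather than the CZMQ C API shown in the question, and the port, socket type and frame handling are assumptions made for the illustration, not details from the question.

import org.zeromq.{SocketType, ZContext, ZMQ}

object ReactiveNodeSketch {
  def main(args: Array[String]): Unit = {
    val context = new ZContext()
    try {
      val router = context.createSocket(SocketType.ROUTER)
      router.bind("tcp://*:8080")

      val poller = context.createPoller(1)
      poller.register(router, ZMQ.Poller.POLLIN)

      // Single-threaded reactive loop: block until the next event, then react.
      while (!Thread.currentThread().isInterrupted) {
        poller.poll(-1)                  // "wait for the next event"
        if (poller.pollin(0)) {
          var more = true
          while (more) {                 // drain every frame of the inbound message
            val frame = router.recv(0)
            more = router.hasReceiveMore
            // react to `frame` here, e.g. by routing it or sending a reply
          }
        }
      }
    } finally context.close()
  }
}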

Multithreaded Web Server Flow diagram

I am writing a multi-threaded web server in C++ with 1 scheduling thread, 1 queuing thread, and n execution threads; it is a homework problem. I am not asking for code. I have created a flow for the server. Can you tell me whether the flow is correct?
main() // queuing thread
{
    define sockets
    create scheduling thread
    create a queue of n execution threads // n execution threads
    accept connections infinitely
    {
        insert the request into a queue
    }
}

scheduling thread // scheduling thread
{
    job = take each request from the queue (FCFS or SJF)
    take 1 thread from the queue of execution threads and assign it the job request
}
Is this flow correct for the problem? I just need the direction. Thanks in advance.
Your pseudo-code looks reasonable to me.

Dedicated thread (one thread per connection) with buffering capability (c/c++)

My process reads from a single queue tasks that need to be sent to several destinations.
We need to maintain order between the tasks (i.e. a task that arrived in the queue at 00:00 needs to be sent before a task that arrived at 00:01), therefore we cannot use a thread pool. Order needs to be maintained per destination.
One solution is to create a dedicated thread per destination. The main thread reads a task from the queue and, depending on the destination, finds the correct thread.
This solution has a problem: if a worker thread is busy, the master thread would remain blocked, making the system slow. What I need is a queue per thread: the master thread distributes the work to these queues and each worker thread reads its own queue for incoming messages...
I would like to share my thoughts with the SO community, and I am searching for a C/C++ solution close to my description. Is there a library that implements such a model?
The design you want is fairly straightforward; I think you can probably write the code you need and get it working in an hour or two. Looking for a 3rd party library to implement this is probably overkill (unless I am misunderstanding the problem).
In particular, for each 'worker' thread, you need a FIFO data structure (e.g. std::queue), a Mutex, and a mechanism that the 'master' thread can use to signal the thread to wake up and check the data structure for new messages (e.g. a condition variable, or a semaphore, or even a socketpair that the worker blocks on reading, and the master can send a byte on to wake the worker up).
Then to send a task to a particular worker thread, the master would do something like this (pseudocode):
struct WorkerThreadData & workerThread = _workerThreads[threadIndexIWantToSendTo];
workerThread.m_mutex.Lock();
workerThread.m_incomingTasks.push_back(theNewTaskObject);
workerThread.m_mutex.Unlock();
workerThread.m_signalMechanism.SignalThreadToWakeUp(); // make sure the worker looks at the task list!
... and each worker thread would have an event loop like this:
struct WorkerThreadData & myData = _workerThreads[myWorkerIndex];
TaskObject * taskObject;
while (1)
{
    myData.m_signalMechanism.WaitForSignal(); // block until the main thread wakes me up
    myData.m_mutex.Lock();
    taskObject = (myData.m_incomingTasks.length() > 0) ? myData.m_incomingTasks.pop_front() : NULL;
    myData.m_mutex.Unlock();
    if (taskObject)
    {
        taskObject->DoTheWork();
        delete taskObject;
    }
}
This will never block the master thread (for any significant amount of time), since the Mutex is only held very briefly by anyone. In particular, the worker threads are not holding the mutex while they are working on a task object.
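For comparison only (not part of this answer), the same per-worker-queue idea can be expressed very compactly on the JVM, where java.util.concurrent.LinkedBlockingQueue plays the role of the FIFO, the mutex and the wake-up signal all at once. The Task type, the two-destination setup and all names below are invented for the illustration; this is a Scala sketch, not the C/C++ solution the question asks for.

import java.util.concurrent.LinkedBlockingQueue

object PerDestinationWorkersSketch {
  final case class Task(destination: Int, payload: String) {
    def doTheWork(): Unit = println(s"[destination $destination] $payload")
  }

  def main(args: Array[String]): Unit = {
    val numDestinations = 2
    // One FIFO queue per destination; put/take handle the locking and the wake-up.
    val queues = Vector.fill(numDestinations)(new LinkedBlockingQueue[Task]())

    // One worker thread per destination, each draining only its own queue,
    // so per-destination ordering is preserved automatically.
    queues.foreach { queue =>
      val worker = new Thread(() => {
        while (true) queue.take().doTheWork() // blocks until the master enqueues a task
      })
      worker.setDaemon(true)
      worker.start()
    }

    // The master never blocks for long: it just drops each task into the right queue.
    def dispatch(task: Task): Unit = queues(task.destination).put(task)

    dispatch(Task(0, "first for destination 0"))
    dispatch(Task(1, "first for destination 1"))
    dispatch(Task(0, "second for destination 0"))
    Thread.sleep(500) // give the daemon workers a moment to drain before the JVM exits
  }
}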
The "need to maintain order" all-but-directly states that you're going to be executing the tasks serially no matter how many threads you have. That being the case, you're probably best off with just one thread servicing the requests.
You could gain something if the requirement is a bit looser than that -- for example, if all the tasks for one destination need to remain in order, but there's no ordering requirement for tasks with different destinations. If this is the case, then your solution of a master queue sending tasks to an input queue for each individual thread sounds like quite a good one.
Edit:
Specifying the number of threads/mutexes dynamically is pretty easy. For example, to take the number from the command line, you could do something on this order (leaving out error and sanity checking for the moment):
#include <cstdlib>   // atoi
#include <vector>
#include <pthread.h>

// thread_routine is the worker's entry point (void *thread_routine(void *), defined elsewhere)
std::vector<pthread_t> threads;
int num_threads = atoi(argv[1]); // number of worker threads taken from the command line
threads.resize(num_threads);
for (int i = 0; i < num_threads; i++)
    pthread_create(&threads[i], NULL, thread_routine, NULL);