Akka Dispatcher Thread creation

I have been working with the Akka actor model. I have a use case where more than 1000 actors will be active and I have to process those actors. I thought of controlling the thread count through configuration defined in application.conf.
But the number of dispatcher threads created in my application leaves me unable to tune the dispatcher configuration. Each time I restart my application, I see a different number of dispatcher threads created (I have checked this via a thread dump after each start).
The thread count does not even match the value I defined in parallelism-min. Due to this low thread count, my application is processing very slowly.
Checking the number of cores on my machine with the code below:
Runtime.getRuntime().availableProcessors();
It displays 40. But fewer than 300 dispatcher threads are created, even though I configured parallelism-min as 500.
Following is my application.conf file:
consumer-dispatcher {
  type = "Dispatcher"
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 500
    parallelism-factor = 20.0
    parallelism-max = 1000
  }
  shutdown-timeout = 1s
  throughput = 1
}
May I know on what basis Akka creates dispatcher threads internally, and how I can increase the dispatcher thread count to increase parallel processing of actors?

X-Post from discuss.lightbend.com
First let me answer the question directly.
A fork-join-executor will be backed by a java.util.concurrent.ForkJoinPool with its parallelism set to the parallelism implied by the dispatcher config (parallelism-factor * number of processors, clamped to be no larger than parallelism-max and no smaller than parallelism-min). So, in your case, 20.0 * 40 = 800, which falls within [500, 1000].
And while I’m no expert on the implementation of ForkJoinPool, the source for the Java implementation of ForkJoinPool says “All worker thread creation is on-demand, triggered by task submissions, replacement of terminated workers, and/or compensation for blocked workers.” and it has methods like getActiveThreadCount(), so it’s clear that ForkJoinPool doesn’t just naively create a giant pool of workers.
In other words, what you are seeing is expected: it’s only going to create threads as they are needed. If you really must have a gigantic pool of worker threads you could create a thread-pool-executor with a fixed-pool-size of 800. This would give you the implementation you are looking for.
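For reference, a thread-pool-executor configuration along those lines might look like this in application.conf (a sketch; fixed-pool-size is the standard setting for Akka's thread-pool-executor, but tune the size to your workload):

consumer-dispatcher {
  type = "Dispatcher"
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 800
  }
  throughput = 1
}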
But, before you do so, I think you are entirely missing the point of actors and Akka. One of the reasons people like actors is that they are much more lightweight than threads and can give you far more concurrency than threads can. (Also note that concurrency != parallelism, as noted in the documentation on concepts.) So trying to create a pool of 800 threads to back 1000 actors is very wasteful. The Akka docs introduction highlights that "Millions of actors can be efficiently scheduled on a dozen of threads".
I can’t tell you exactly how many threads you need without knowing your application (for example, whether you have blocking behavior), but the defaults (which would give you a parallelism factor of 20) are probably just fine. Benchmark to be certain, but I really don’t think you have a problem with too few threads. (The ForkJoinPool behavior you are observing seems to confirm this.)

Related

How to assign task to specific thread in C++

Problem
I am currently writing a stream parser that parses multiple feeds that come in very fast. Let's assume it is a Twitter stream of accounts' tweets, with X accounts. I am trying to make the processing as concurrent as possible, while also making sure each account's tweets are processed sequentially.
Each tweet requires some parsing that takes some time. So if I were to use a naive thread pool, I would run into a problem where some tweets get assigned to threads that finish sooner, and a single account's tweets may be logged out of order.
This task can be approached using a producer-consumer model, where in this case there is only one producer: the Twitter feed. The consumers are where I am uncertain.
The Approach
My idea to tackle this is fairly simple: map each account to a bucket numbered between 1 and T, where T is the number of threads available on my computer, then process each bucket sequentially. This way all buckets can run concurrently, and no single account's tweets will be logged out of order.
Here is a crude description of what that looks like with two threads and three accounts: since we have two threads, accounts 1 and 3 map to the same thread but maintain internal consistency, while threads 1 and 2 can run concurrently with no conflicts ever arising.
This structure is also very extendable. If I have more producers with accounts 4 and 5, for example, I can still add them to threads 2 and 1, respectively, without losing internal account consistency.
What I've Done So Far/The Code
I'm not sure how to structure this programmatically. I'm fairly new to multi-threading in C++, so I'm using modified code from this blog post as a way to structure my file.
I'd take a read-through if you have the time, but basically my code is a minimal example replicating this process. There are 5 buckets. The tweet parsing is simulated by a 1-second sleep. I assign each task a mutex to lock based on its bucket. This is done using a simple mutex = mutex_map[task_id % NUM_BUCKETS].
The code is available here, although the VM is limited to 2 threads. If we scale up to 11 threads (on my machine), we run into race conditions where some threads beat the others. Essentially what happens is this:
The machine has 11 threads initially available.
It assigns Task 0, 5, and 10 to some threads in the thread pool.
Task 0 goes first, gets the mutex, and locks it. Tasks 5 and 10 are waiting because the mutex for bucket 0 is locked.
Once Task 0 is finished, sometimes Task 10 goes first, and sometimes Task 5 goes first.
Now, the quick fix is to just limit the thread pool size to NUM_BUCKETS, but that sidesteps the core problem I'm trying to solve: the behavior I actually want is not being enforced.
Solution?
Does anyone have suggestions on how to approach this? How do I enforce consistency within a bucket? I basically want to assign each task to a specific thread based on the hash, but I'm not sure how to do that, since the thread pool is what manages this for me...
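One way to get that behavior (my sketch, not from the post; all names are invented) is to skip the shared pool entirely and give each bucket its own worker thread and task queue, so ordering within a bucket comes for free:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class BucketedExecutor {
public:
    explicit BucketedExecutor(std::size_t num_buckets) : buckets_(num_buckets) {
        for (auto& b : buckets_)
            b.worker = std::thread([&b] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(b.m);
                        b.cv.wait(lock, [&b] { return b.stop || !b.q.empty(); });
                        if (b.stop && b.q.empty()) return;
                        task = std::move(b.q.front());
                        b.q.pop();
                    }
                    task();  // tasks within one bucket run strictly in submission order
                }
            });
    }

    void submit(std::size_t account_id, std::function<void()> task) {
        Bucket& b = buckets_[account_id % buckets_.size()];
        {
            std::lock_guard<std::mutex> lock(b.m);
            b.q.push(std::move(task));
        }
        b.cv.notify_one();
    }

    ~BucketedExecutor() {
        for (auto& b : buckets_) {
            { std::lock_guard<std::mutex> lock(b.m); b.stop = true; }
            b.cv.notify_one();
        }
        for (auto& b : buckets_) b.worker.join();
    }

private:
    struct Bucket {
        std::mutex m;
        std::condition_variable cv;
        std::queue<std::function<void()>> q;
        bool stop = false;
        std::thread worker;
    };
    std::vector<Bucket> buckets_;  // fixed size; never resized, so references stay valid
};

With this layout no mutex juggling is needed for ordering: submit(account_id, ...) always routes a given account to the same single-threaded queue, and different buckets never contend with each other.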

simultaneous tasks with 8051

Is there any way to run two tasks with the 8051 μC simultaneously? For example,
Task one:
  Delay 1 sec
  P2.B2 = 1
  Delay 1 sec
  P2.B2 = 0

Task 2:
  If P1.B0 = 1
    P2.B3 = 1
So at any time, pressing the switch connected to P1.0 (so that P1.0 = 1) turns the LED at P2.3 on, while the LED at P2.2 keeps blinking.
A task is something that is typically provided by the underlying OS. If you are running on a bare-metal system without any OS, you have no tasks in the first place.
But your application can build its own tasks. The job is more or less easy: you have to build a scheduler, typically triggered by a hardware clock, for task switching, and create stacks for each of the tasks along with some control structures for their maintenance. As you have no MMU and no memory protection on bare-metal systems like the 8051, you can simply modify stack pointers to do the task switching.
That is exactly what a library like FreeRTOS can do for you. As far as I know, there is a port for the 8051 available; searching the web returns a lot of links for "8051 FreeRTOS". Maybe there are some more libraries which offer tasks for you.
But mostly the overhead of scheduling and all the administrative effort is much too high. Running an endless loop which does some jobs by reading some kind of queues or flags is much easier and often the more efficient solution. Also, running some jobs in interrupt service routines fits bare-metal requirements well.
I assume you are running on bare metal with no battery-saving requirements. I assume you can already write a program, load it to your device, and run it. What I suggest you do is roughly this.
This program should have a main loop, which at its simplest would be like this:
MAX_TIME is the largest possible value of the system clock, should never be reached
task_table is a table with:
  next execution time as system clock time (MAX_TIME means disabled)
  function pointer

initialize task_table with the three tasks below
forever:
  for each task with time 0:
    set task time to MAX_TIME (disable)
    call task function (task probably enables itself or another task)
  find the task with the lowest non-zero time in task_table
  if that task's time is in the past or is now:
    set task time to MAX_TIME (disable)
    call task function (task probably enables itself or another task)
Time-0 tasks are checked separately, and then the timed tasks, so that the time-0 tasks don't block each other or ever prevent the timed tasks from being called. The same could be achieved in different ways; this is just an example.
Then your requirements really call for 3 "tasks":
task_p2_b2_0:
  P2.B2 = 0
  enable task task_p2_b2_1 at current_time + 1 second

task_p2_b2_1:
  P2.B2 = 1
  enable task task_p2_b2_0 at current_time + 1 second

task_p1_b0_poll:
  if P1.B0 = 1
    P2.B3 = 1
  enable task task_p1_b0_poll at time 0 (or current_time + 10 ms or whatever)
Future development: the above is for a small number of static tasks. Iterating over a 5-10 item table is so fast that there is no point trying to optimize it. Once you have more tasks than that, you should consider using a priority heap to store them. Then you could also consider making the main loop sleep when it has nothing to do, and using an interrupt to wake it up (timer interrupt, serial port interrupt, pin activation interrupt, etc.). Also, you could have different task types, such as tasks which are activated when there is some IO (button press, byte from serial port, whatever). At the upper end, adding features like this really amounts to a complete operating system, but for simple things, what I wrote above is enough.
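To make the loop concrete, here is a minimal C-style sketch of the scheme above. The helpers current_time(), read_p1_b0(), write_p2_b2(), and write_p2_b3() are hypothetical stand-ins for a timer-ISR tick counter and the compiler's SFR bit access (e.g. sbit in Keil C51); a 1 ms tick is assumed.

#include <stdint.h>

#define MAX_TIME  0xFFFFFFFFUL  /* "disabled" sentinel; assumed never reached */
#define NUM_TASKS 3

typedef void (*task_fn)(void);

static uint32_t task_time[NUM_TASKS];  /* next run time; MAX_TIME = disabled */
static task_fn  task_func[NUM_TASKS];

/* Hypothetical hardware hooks; a real build would supply these. */
extern uint32_t current_time(void);        /* ticks from a timer ISR, 1 ms each */
extern uint8_t  read_p1_b0(void);
extern void     write_p2_b2(uint8_t v);
extern void     write_p2_b3(uint8_t v);

static void enable_task(uint8_t id, uint32_t t) { task_time[id] = t; }

static void task_p2_b2_0(void)  { write_p2_b2(0); enable_task(1, current_time() + 1000); }
static void task_p2_b2_1(void)  { write_p2_b2(1); enable_task(0, current_time() + 1000); }
static void task_p1_b0_poll(void) {
    if (read_p1_b0()) write_p2_b3(1);
    enable_task(2, 0);                     /* time 0: run again on the next pass */
}

void main_loop(void) {
    uint8_t i;
    task_func[0] = task_p2_b2_0;    task_time[0] = 0;         /* start blinking now */
    task_func[1] = task_p2_b2_1;    task_time[1] = MAX_TIME;  /* enabled by task 0 */
    task_func[2] = task_p1_b0_poll; task_time[2] = 0;         /* poll continuously */
    for (;;) {
        /* run all time-0 tasks first so they never starve the timed tasks */
        for (i = 0; i < NUM_TASKS; i++) {
            if (task_time[i] == 0) {
                task_time[i] = MAX_TIME;   /* disable before running */
                task_func[i]();            /* task may re-enable itself */
            }
        }
        /* then run the timed task with the lowest non-zero time, if due */
        {
            uint8_t best = NUM_TASKS;
            uint32_t best_t = MAX_TIME;
            for (i = 0; i < NUM_TASKS; i++)
                if (task_time[i] != 0 && task_time[i] < best_t) {
                    best = i;
                    best_t = task_time[i];
                }
            if (best < NUM_TASKS && best_t <= current_time()) {
                task_time[best] = MAX_TIME;
                task_func[best]();
            }
        }
    }
}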

Is there a limit to the number of created events?

I'm developing a C++14 Windows DLL in VS2015 that runs on all Windows versions >= XP.
TL;DR
Is there a limit to the number of events created with CreateEvent (each with a different name, of course)?
Background
I'm writing a thread pool class.
The class interface is simple:
void AddTask(std::function<void()> task);
A task is added to a queue of tasks, and waiting workers (a vector<thread>) pick tasks up when available.
Requirement
Wait (block) for a task for a little while before continuing with the flow. Meaning: some users of ThreadPool, after calling AddTask, may want to wait a while (say, 1 second) for the task to end before continuing with the flow. If the task is not done by then, they will continue with the flow anyway.
Problem
The ThreadPool class cannot provide a Wait interface; that is not its responsibility.
Solution
ThreadPool will SetEvent when a task is done.
Users of ThreadPool will wait (or not, depending on their needs) for the event to be signaled.
So I've changed the return value of ThreadPool::AddTask from void to int, where the int is a unique task ID which is essentially the name of the event to be signaled when the task is done.
Question
I don't expect more than ~500 tasks, but I'm afraid that creating hundreds of events is not possible, or is bad practice.
So is there a limit? Or a better approach?
Of course there is a limit (if nothing else, at some point the system runs out of memory).
In reality, the limit is around 16 million per process.
You can read more details here: https://blogs.technet.microsoft.com/markrussinovich/2009/09/29/pushing-the-limits-of-windows-handles/
You're asking the wrong question. Fortunately you gave enough background to answer your real question. But before we get to that:
First, if you're asking what's the maximum number of events a process can open or a system can hold, you're probably doing something very very wrong. Same goes for asking what's the maximum number of files a process can open or what's the maximum number of threads a process can create.
You can create 50, 100, 200, 500, 1000... but where does it stop? If you're even considering creating that many of them that you have to ask about a limit, you're on the wrong track.
Second, the answer depends on too many implementation details: OS version, amount of RAM installed, registry settings, and maybe more. Other programs running also affect that "limit".
Third, even if you knew the limit - even if you could somehow calculate it at runtime based on all the relevant factors - it wouldn't allow you to do anything that you can't already do now.
Let's say you find out the limit is L and you have created exactly L events by now. Another task comes in. What do you do? Throw away the task? Execute the task without signaling an event? Wait until there are fewer than L events and only then create an event and start executing the task? Crash the process?
Whatever you decide you can do it just the same when CreateEvent fails. All of this is completely pointless. And this is yet another indication that you're asking the wrong question.
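To illustrate (my sketch, not from the answer): whatever policy you pick can hang off the failure path of CreateEvent itself. QueueTask below is a hypothetical pool-submission helper:

#include <windows.h>

HANDLE StartTask(/* task args */)
{
    // manual-reset, initially non-signaled, unnamed event
    HANDLE done = CreateEventW(nullptr, TRUE, FALSE, nullptr);
    if (done == nullptr) {
        // CreateEvent failed (out of handles, memory, ...):
        // fall back to running the task without a waitable event.
        // QueueTask(task);  // hypothetical
        return nullptr;
    }
    // QueueTask([=] { task(); SetEvent(done); });  // hypothetical
    return done;  // caller may WaitForSingleObject(done, 1000), then CloseHandle it
}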
But maybe the most wrong thing you're doing is saying "the thread pool class can't provide wait because it's not its responsibility, so let's have the thread pool class provide an event for each task that the thread pool will signal when the task ends" (in paraphrase).
It looks like by the end of the sentence you forgot the premise from the beginning: it's not the thread pool's responsibility!
If you want to wait for the task to finish, have the task itself signal when it's done. There's no reason to complicate the thread pool because someone, sometimes, wants to wait on tasks. Signaling that the task is done is the task's job:
event evt;                    ///// this
thread_pool.queue([evt] {
    // whatever
    evt.signal();             ///// and this
});

auto reason = wait(evt, 1s);
if (reason == timeout) {
    log("bummer");
}
The event class could be anything you want: a Windows event, an std::promise/std::future pair, or anything else.
This is so simple and obvious.
Complicating the thread pool infrastructure, taking up valuable system resources for nothing, and signaling synchronization primitives even when no one is listening, just to save the two marked code lines above in the few cases where you actually want to wait for a task, is unjustifiable.
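For a concrete version of that pattern in standard C++ (a sketch; std::async stands in for the pool's queue function, since the post's thread_pool and event types are pseudocode):

#include <chrono>
#include <future>
#include <iostream>

int main()
{
    std::promise<void> done;                      // the "event"
    std::future<void> signaled = done.get_future();

    // std::async stands in for thread_pool.queue(...)
    auto task = std::async(std::launch::async, [&done] {
        // ... whatever the task does ...
        done.set_value();                         // the task itself signals completion
    });

    using namespace std::chrono_literals;
    if (signaled.wait_for(1s) == std::future_status::timeout) {
        std::cout << "bummer\n";                  // not done yet; continue anyway
    }
}   // the future returned by std::async joins the task on destruction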

2 different task_group instances not running tasks in parallel

I wanted to replace the use of normal threads with the task_group class from PPL, but I ran into the following problem:
I have a class A with a task_group member,
I create 2 different instances of class A,
I start a task in the task_group of the first A instance (using run),
after a few seconds, I start a task in the task_group of the second A instance.
I'm expecting the two tasks to run in parallel, but the second task waits for the first task to finish and only then starts.
This happens only in my application, where the tasks are started from a static function. I tried the same scenario in a test application and the tasks run correctly in parallel.
After spending several hours trying to figure this out I switched back to normal threads.
Does anyone know why the Concurrency Runtime behaves this way, or how I can avoid it?
EDIT
The problem was that it was running on a single-core CPU, and the Concurrency Runtime optimizes for throughput. I wonder if the Microsoft Parallel Patterns Library has the concept of an active object, or something along those lines, so that you can specify that the task you are about to launch is to be executed in parallel with the thread you start it from...
The response can be found here: http://social.msdn.microsoft.com/Forums/en/parallelcppnative/thread/85a84373-4c3d-4862-bff3-9a21ffe82493
For one-core machines, this is the expected default behavior. It can be changed.
By default, the number of tasks that can run in parallel = the number of hardware threads (number of cores). This improves the raw throughput and efficiency of completing tasks.
However, there are a number of situations where a developer would want many tasks running in parallel, regardless of the number of cores. In this case you have two options:
Oversubscribe locally.
In your example above, you would use
#include <concrt.h>
using namespace concurrency;

void lengthyTask()
{
    Context::Oversubscribe(true);   // ask the scheduler for an extra thread
    // ... do a lengthy task (or a blocking task) ...
    Context::Oversubscribe(false);
}
Oversubscribe the scheduler when you start the application:
SchedulerPolicy policy(1, MaxConcurrency, GetProcessorCount() * 2);
Scheduler::SetDefaultSchedulerPolicy(policy);

Cannot create a new thread because the task queue has reached it maximum limit 5000

I'm using quite a few cfthreads in a scheduled task (because CF runs out of memory otherwise), and now I'm getting the following error:
Cannot create a new thread because the task queue has reached it maximum
limit 5000.
So here are my questions:
what is the "task queue" exactly, and where are the docs?
how do I increase this limit?
how can I determine what the limit is dynamically? and how many threads are already in the queue?
Why not use the run-join idiom I provided as an answer to another question of yours, many queries in a task to generate json? You could alter that code example to create several threads and then join them if you're looking to make things work asynchronously. In addition, having as many threads as your question describes actually slows things down, because the server spends too much time context-switching between threads.
It looks like the limit is built in and cannot be changed.
The message above is an error message though, so you could wrap the cfthread call in a cftry to detect when the limit is reached.