How can I create so many threads in C++ on a BeagleBone Black?

I want to create over 500 threads in C++ on a BeagleBone Black,
but the program fails with an error.
Could you explain why the error occurs and how I can fix it?
Thread function: call_from_thread(int tid)
void call_from_thread(int tid)
{
    cout << "thread running : " << tid << std::endl;
}
Main function:
int main() {
    thread t[500];
    for(int i=0; i<500; i++) {
        t[i] = thread(call_from_thread, i);
        usleep(100000);
    }
    std::cout << "main fun start" << endl;
    return 0;
}
I expect:
...
...
thread running : 495
thread running : 496
thread running : 497
thread running : 498
thread running : 499
main fun start
but I get:
...
...
thread running : 374
thread running : 375
thread running : 376
thread running : 377
thread running : 378
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
Aborted
Could you help me?

The BeagleBone Black has at most 512 MB of DRAM.
The default stack size reserved for a thread created with pthread_create() is about 2 MB,
i.e. 2^29 / 2^21 = 2^8 = 256 threads before thread stacks alone exhaust physical memory. So what you're probably seeing around thread 374 is that the system can no longer provide the memory and other per-thread resources for a new thread; pthread_create reports this as "Resource temporarily unavailable" and std::thread turns it into a std::system_error.
If you really want to see this explode, try moving that sleep call inside your thread function. :)
You could try preallocating the stack at 1 MB or less (pthreads), but that has its own set of problems.
The questions to really ask yourself are:
Is my application io bound or compute bound?
What's my memory budget to run this application? If you spend your entire physical memory
on thread stacks, you'll have nothing left for the shared program heap.
Do I really need this much parallelism to do the job? The A8 is a single-core machine, by the way.
Could I solve the problem using a thread pool? Or not use threads at all?
Finally, you can't set the stack size through the std::thread API, but you can with boost::thread, or by writing a thin wrapper around pthreads (assuming Linux), as sketched below.
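For illustration, here is a minimal sketch of that wrapper idea (Linux/pthreads assumed; the 256 KiB figure and the worker function are placeholders, not values from the question):

#include <pthread.h>
#include <cstdio>

void* worker(void* arg) {
    int id = *static_cast<int*>(arg);
    std::printf("thread running : %d\n", id);
    return nullptr;
}

int main() {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    // Ask for a 256 KiB stack instead of the platform default.
    // Going below PTHREAD_STACK_MIN makes pthread_create fail.
    pthread_attr_setstacksize(&attr, 256 * 1024);

    pthread_t tid;
    int id = 7;
    if (pthread_create(&tid, &attr, worker, &id) == 0)
        pthread_join(tid, nullptr);   // id stays valid because we join before returning

    pthread_attr_destroy(&attr);
    return 0;
}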

Whenever you use threads, there are three parts.
Start the threads
Do the work
Release the thread
You're starting the threads and doing the work, but you're not releasing them. There are two options for releasing a thread:
You can join the thread (which basically waits for it to finish)
You can detach the thread, and let it execute independently.
In this particular case, you don't want the program to finish until all threads are done executing, so you should join them.
#include <iostream>
#include <thread>
#include <vector>
#include <string>

auto call_from_thread = [](int i) {
    // I create the entire message before printing it, so that there's no interleaving of messages between threads
    std::string message = "Calling from thread " + std::to_string(i) + '\n';
    // Because I only call print once, everything gets printed together
    std::cout << message;
};

using std::thread;

int main() {
    thread t[500];
    for(int i=0; i<500; i++) {
        // Here, I don't have to start the thread with any delay
        t[i] = thread(call_from_thread, i);
    }
    std::cout << "main fun start\n";
    // I join each thread (which waits for them to finish before closing the program)
    for(auto& item : t) {
        item.join();
    }
    return 0;
}

Related

OpenMP integer copied after tasks finish

I do not know if this behaviour is documented anywhere (if so, I would love a reference to it), but I have found something unexpected when using OpenMP. The simple program below illustrates the issue. In point form, this is what I expect the program to do:
I want to have 2 threads
They both share an integer
The first thread increments the integer
The second thread reads the integer
After incrementing once, an external process must tell the first thread to continue incrementing (via a mutex lock)
The second thread is in charge of unlocking this mutex
As you will see, the counter shared between the threads does not appear updated in the second thread. However, if I turn the counter into an integer reference instead, I get the expected result. Here is a simple code example:
#include <mutex>
#include <thread>
#include <chrono>
#include <iostream>
#include <omp.h>

using namespace std;
using std::this_thread::sleep_for;
using std::chrono::milliseconds;

const int sleep_amount = 2000;

int main() {
    int counter = 0; // if I comment this and uncomment the 2 lines below, I get the expected results
    /* int c = 0; */
    /* int &counter = c; */

    omp_lock_t mut;
    omp_init_lock(&mut);

    int counter_1, counter_2;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task default(shared)
        // The first task just increments the counter 3 times
        {
            while (counter < 3) {
                omp_set_lock(&mut);
                counter += 1;
                cout << "increasing: " << counter << endl;
            }
        }

        #pragma omp task default(shared)
        {
            sleep_for(milliseconds(sleep_amount));
            // While sleeping, counter is increased to 1 in the first task
            counter_1 = counter;
            cout << "counter_1: " << counter << endl;
            omp_unset_lock(&mut);

            sleep_for(milliseconds(sleep_amount));
            // While sleeping, counter is increased to 2 in the first task
            counter_2 = counter;
            cout << "counter_2: " << counter << endl;
            omp_unset_lock(&mut);
            // Release one last time to increment the counter to 3
        }
    }
    omp_destroy_lock(&mut);

    cout << "expected: 1, actual: " << counter_1 << endl;
    cout << "expected: 2, actual: " << counter_2 << endl;
    cout << "expected: 3, actual: " << counter << endl;
}
Here is my output:
increasing: 1
counter_1: 0
increasing: 2
counter_2: 0
increasing: 3
expected: 1, actual: 0
expected: 2, actual: 0
expected: 3, actual: 3
gcc version: 9.4.0
Additional discoveries:
If I use OpenMP 'sections' instead of 'tasks', I get the expected result as well. The problem seems to be with 'tasks' specifically
If I use posix semaphores, this problem also persists.
It is not permitted to unlock a mutex from another thread; doing so causes undefined behaviour. The general solution in this case is to use semaphores. Condition variables can also help (for real-world use cases). To quote the OpenMP documentation (note that this constraint is shared by nearly all mutex implementations, including pthreads):
A program that accesses a lock that is not in the locked state or that is not owned by the task that contains the call through either routine is non-conforming.
A program that accesses a lock that is not in the uninitialized state through either routine is non-conforming.
Moreover, the two tasks can be executed on the same thread or on different threads. You should not assume anything about their scheduling unless you tell OpenMP to do so with dependencies. Here, it is completely compliant for a runtime to execute the tasks serially. If you need different pieces of work to run on different threads, use OpenMP sections. Besides, it is generally considered bad practice to use locks in tasks, as the runtime scheduler is not aware of them.
Finally, you do not need a lock in this case: an atomic operation is sufficient. Fortunately, OpenMP supports atomic operations (as well as C++).
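For illustration, a minimal sketch of what an atomic update and read look like in OpenMP (this only shows the directives; it is not a drop-in rewrite of the code above):

#include <cstdio>
#include <omp.h>

int main() {
    int counter = 0;

    #pragma omp parallel num_threads(2)
    {
        // Each thread increments atomically; no lock is needed.
        #pragma omp atomic update
        counter += 1;

        int snapshot;
        // An atomic read gives a consistent value without a critical section.
        #pragma omp atomic read
        snapshot = counter;

        std::printf("thread %d saw %d\n", omp_get_thread_num(), snapshot);
    }

    std::printf("final: %d\n", counter);
    return 0;
}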
Additional notes
Note that locks guarantee the consistency of memory accesses across threads thanks to memory barriers. An unlock operation on a mutex causes a release memory barrier that makes writes visible to other threads, and a lock operation performs an acquire memory barrier that forces subsequent reads to happen after the lock. When lock/unlock are not used correctly, memory accesses are no longer safe, and some variables may not appear updated in other threads; more generally, this tends to create race conditions. Put shortly: don't do that.

Deadlock using std::mutex to protect cout in multiple threads

Using cout in multiple threads might result in interleaved output.
So I tried to protect cout with a mutex.
The following code starts 10 background threads with std::async. When a thread starts, it prints "Started thread ...".
The main thread iterates over the futures of the background threads in the order in which they were created and prints out "Done thread ..." when the corresponding thread finished.
The output is synchronized correctly, but after some threads have started and some have finished (see output below), a deadlock occurs. All remaining background threads and the main thread wait for the mutex.
What is the reason for the deadlock?
When the print function returns or one iteration of the for loop ends, the lock_guard should unlock the mutex, so that one of the waiting threads can proceed.
Why are all the remaining threads starving?
Code
#include <future>
#include <iostream>
#include <mutex>
#include <vector>

using namespace std;

std::mutex mtx; // mutex for critical section

int print_start(int i) {
    lock_guard<mutex> g(mtx);
    cout << "Started thread" << i << "(" << this_thread::get_id() << ") " << endl;
    return i;
}

int main() {
    vector<future<int>> futures;
    for (int i = 0; i < 10; ++i) {
        futures.push_back(async(print_start, i));
    }
    // retrieve and print the value stored in the future
    for (auto &f : futures) {
        lock_guard<mutex> g(mtx);
        cout << "Done thread" << f.get() << "(" << this_thread::get_id() << ")" << endl;
    }
    cin.get();
    return 0;
}
Output
Started thread0(352)
Started thread1(14944)
Started thread2(6404)
Started thread3(16884)
Done thread0(16024)
Done thread1(16024)
Done thread2(16024)
Done thread3(16024)
Your problem lies in the use of future::get:
Returns the value stored in the shared state (or throws its exception)
when the shared state is ready.
If the shared state is not yet ready (i.e., the provider has not yet
set its value or exception), the function blocks the calling thread
and waits until it is ready.
http://www.cplusplus.com/reference/future/future/get/
So if the thread behind the future hasn't run yet, the function blocks until that thread finishes. However, you take ownership of the mutex before calling future::get, so the thread you're waiting for will never be able to acquire the mutex itself.
This should fix your deadlock problem:
int value = f.get();
lock_guard<mutex> g(mtx);
cout << "Done thread" << value << "(" << this_thread::get_id() << ")" << endl;
You lock the mutex and then wait for one of the futures, which in turn requires a lock on the mutex itself. Simple rule: Don't wait with locked mutexes.
BTW: Locking output streams is not very effective, because it can easily be circumvented by code you don't even control. Rather than using those globals, give a stream to code that needs to output something (dependency injection) and then collect the data from that stream in a threadsafe way. Or use a logging library, because that's probably what you wanted to do anyway.
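As a sketch of that last suggestion (the names here are made up, not from the question): give each task its own stream and print the collected text from one place, so no mutex around std::cout is needed at all:

#include <future>
#include <iostream>
#include <sstream>
#include <thread>
#include <vector>

// Hypothetical worker: writes to the stream it is given instead of std::cout.
int print_start(int i, std::ostream& out) {
    out << "Started thread " << i << " (" << std::this_thread::get_id() << ")\n";
    return i;
}

int main() {
    std::vector<std::ostringstream> logs(10);
    std::vector<std::future<int>> futures;
    for (int i = 0; i < 10; ++i)
        futures.push_back(std::async(std::launch::async, print_start, i, std::ref(logs[i])));

    for (int i = 0; i < 10; ++i) {
        int value = futures[i].get();   // wait first; no mutex is held here
        std::cout << logs[i].str()
                  << "Done thread " << value << "\n";
    }
    return 0;
}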
It is good that the reason was spotted in the source. Often, however, the error is not so easy to locate, and the reason may differ as well. Fortunately, in the case of a deadlock you can use a debugger to investigate it.
I compiled and ran your example, then attached to it with gdb (gcc 4.9.2/Linux); here is the backtrace (noisy implementation details skipped):
#0 __lll_lock_wait ()
...
#5 0x0000000000403140 in std::lock_guard<std::mutex>::lock_guard (
this=0x7ffe74903320, __m=...) at /usr/include/c++/4.9/mutex:377
#6 0x0000000000402147 in print_start (i=0) at so_deadlock.cc:9
...
#23 0x0000000000409e69 in ....::_M_complete_async() (this=0xdd4020)
at /usr/include/c++/4.9/future:1498
#24 0x0000000000402af2 in std::__future_base::_State_baseV2::wait (
this=0xdd4020) at /usr/include/c++/4.9/future:321
#25 0x0000000000404713 in std::__basic_future<int>::_M_get_result (
this=0xdd47e0) at /usr/include/c++/4.9/future:621
#26 0x0000000000403c48 in std::future<int>::get (this=0xdd47e0)
at /usr/include/c++/4.9/future:700
#27 0x000000000040229b in main () at so_deadlock.cc:24
This is just what is explained in the other answers: the code in the locked section (so_deadlock.cc:24) calls future::get(), which in turn (by forcing the result) tries to acquire the lock again.
It might not be that simple in other cases (there are usually several threads), but it's all there.
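For reference, the commands used were roughly the following (a sketch; gdb on Linux is assumed and the process id is a placeholder):

$ gdb -p <pid-of-hung-process>    # attach to the running, deadlocked program
(gdb) info threads                # list all threads and what they are blocked on
(gdb) thread apply all bt         # print a backtrace for every thread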

Qt Concurrent run member function from another member function

I would like to launch a member function in a separate thread, calling it from another member function.
Maybe the code below is clearer.
There is a button which launches the counter in a thread and it works:
void MainWindow::on_pushButton_CountNoArgs_clicked()
{
    myCounter *counter = new myCounter;
    QFuture<void> future = QtConcurrent::run(counter, &myCounter::countUpToThousand);
}
MyCounter class member functions:
void myCounter::countUpToHundred()
{
    for(int i = 0; i<=100; i++)
    {
        qDebug() << "up to 100: " << i;
    }
}

void myCounter::countUpToThousand()
{
    for(int i = 0; i<=1000; i++)
    {
        qDebug() << "up to 1000: " << i;
        if (i == 500)
        {
            // here I want to launch myCounter::countUpToHundred() in another thread
        }
    }
}
Thanks in advance.
Assuming you want to run the two counters in parallel, you have three threads:
Thread 1: UI-Thread (or main thread)
Here runs on_pushButton_CountNoArgs_clicked(). You should not do heavy work in this function, because if you want to achieve 60 frames per second you only have about 16 ms for all the work. Starting a new thread to run countUpToThousand() is a good idea.
Thread 2: background thread (started by QtConcurrent, running countUpToThousand)
This runs in parallel to Thread 1, and you are working with the same instance of myCounter (i.e. the same place in memory) so be careful which member variables you read and write.
Thread 3: background thread (started by QtConcurrent, running countUpToHundred)
Start it using (as hank pointed out):
void myCounter::countUpToThousand()
{
    for(int i = 0; i<=1000; i++)
    {
        qDebug() << "up to 1000: " << i;
        if (i == 500)
        {
            QtConcurrent::run(this, &myCounter::countUpToHundred);
        }
    }
}
This will run in parallel to Thread 1 and Thread 2.
Now you might get crazy output results like 988\n99\n when one counter is at 999 and the other is at 88 because Thread 2 and Thread 3 will be printing to console at the same time and don't care about what the other thread is doing.
Also note that you must not delete counter before Thread 2 and Thread 3 are done, because if you do, they'll still try to access that memory and your application will probably crash.
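One way to guard against that (a sketch, not tested): keep the QFuture and wait on it before deleting the counter. In a real UI you would normally use a QFutureWatcher instead of blocking the UI thread, and the nested QtConcurrent::run call above would need its own future kept and waited on as well:

void MainWindow::on_pushButton_CountNoArgs_clicked()
{
    myCounter *counter = new myCounter;
    QFuture<void> future = QtConcurrent::run(counter, &myCounter::countUpToThousand);
    future.waitForFinished();   // blocks the UI thread; shown only for illustration
    delete counter;             // now safe: no background thread still uses it
}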

How do I make threads run sequentially instead of concurrently?

For example, I want each thread to not start running until the previous one has completed. Is there a flag, something like thread.isRunning()?
#include <iostream>
#include <vector>
#include <thread>

using namespace std;

void hello() {
    cout << "thread id: " << this_thread::get_id() << endl;
}

int main() {
    vector<thread> threads;
    for (int i = 0; i < 5; ++i)
        threads.push_back(thread(hello));
    for (thread& thr : threads)
        thr.join();
    cin.get();
    return 0;
}
I know that the threads are meant to run concurrently, but what if I want to control the order?
There is no thread.isRunning(). You need some synchronization primitive to do it.
Consider std::condition_variable for example.
One approachable way is to use std::async. As std::async is currently defined, the associated state of an operation launched by std::async can cause the returned std::future's destructor to block until the operation is complete. This can limit composability and result in code that appears to run in parallel but in reality runs sequentially:
{
    std::async(std::launch::async, []{ hello(); });
    std::async(std::launch::async, []{ hello(); }); // does not run until hello() completes
}
If we only need the second thread to start after the first one has completed, is a thread really needed at all?
As a simple solution, you can set a global flag (preferably std::atomic<bool>, to avoid a data race), set it at the end of the first thread, and have the second thread check it before doing its work, as sketched below.
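A minimal sketch of that flag idea (std::atomic is used so the flag itself is not a data race; the busy-wait is wasteful and only for illustration, the condition-variable answer further down is nicer):

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<bool> first_done{false};

int main() {
    std::thread t1([] {
        std::cout << "first\n";
        first_done = true;       // signal that the first thread is finished
    });
    std::thread t2([] {
        while (!first_done) {}   // spin until the first thread is done
        std::cout << "second\n";
    });
    t1.join();
    t2.join();
    return 0;
}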
You can't simply control the order by saying "First thread 1, then thread 2, ..."; you will need to make use of synchronization (i.e. std::mutex and condition variables such as std::condition_variable_any).
You can create events so as to block one thread until a certain event happens.
See cppreference for an overview of the threading mechanisms in C++11.
You will need to use a semaphore or a lock.
If you initialize the semaphore to the value 0:
call wait after thread.start() and call signal/release at the end of the thread's execution function (e.g. the run function in Java, an OnExit function, etc.).
This way the main thread keeps waiting until the thread in the loop has completed its execution.
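A minimal sketch of that idea, assuming C++20 for std::binary_semaphore (with older standards you would emulate it with a mutex and condition variable):

#include <iostream>
#include <semaphore>
#include <thread>

std::binary_semaphore done{0};   // starts at 0, so acquire() blocks until release()

void hello() {
    std::cout << "thread id: " << std::this_thread::get_id() << '\n';
    done.release();              // signal: this thread has finished its work
}

int main() {
    for (int i = 0; i < 5; ++i) {
        std::thread t(hello);
        done.acquire();          // wait for the signal before starting the next thread
        t.join();
    }
    return 0;
}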
Task-based parallelism can achieve this, but C++ does not currently offer a task model as part of its threading libraries. If you have TBB or PPL you can use their task-based facilities.
I think you can achieve this by using std::mutex and std::condition_variable from C++11. To make the threads run sequentially, an array of booleans is used: when a thread is done doing its work, it writes true at its index of the array.
For example:
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <thread>

using namespace std;

mutex mtx;
condition_variable cv;
bool ids[10] = { false };

void shared_method(int id) {
    unique_lock<mutex> lock(mtx);
    if (id != 0) {
        // Wait until the previous thread has set its flag.
        while (!ids[id - 1]) {
            cv.wait(lock);
        }
    }
    int delay = rand() % 4;
    cout << "Thread " << id << " will finish in " << delay << " seconds." << endl;
    this_thread::sleep_for(chrono::seconds(delay));
    ids[id] = true;   // mark this thread as done
    cv.notify_all();  // wake up the thread waiting on this flag
}

void test_condition_variable() {
    thread threads[10];
    for (int i = 0; i < 10; ++i) {
        threads[i] = thread(shared_method, i);
    }
    for (thread &t : threads) {
        t.join();
    }
}
Output:
Thread 0 will finish in 3 seconds.
Thread 1 will finish in 1 seconds.
Thread 2 will finish in 1 seconds.
Thread 3 will finish in 2 seconds.
Thread 4 will finish in 2 seconds.
Thread 5 will finish in 0 seconds.
Thread 6 will finish in 0 seconds.
Thread 7 will finish in 2 seconds.
Thread 8 will finish in 3 seconds.
Thread 9 will finish in 1 seconds.

Exiting two concurrent queues with three threads in c++

I'm having problems quitting my multithreaded, multi-queue C++ program. The diagram at http://i.stack.imgur.com/JGhXs.png shows the queue and thread structure.
In short, I have three threads and two concurrent queues. The second_handler (second_thread) pops from the first queue and pushes to the second queue. All seems to work fine, until I want to quit the program by hitting a keyboard key. I get this error:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl >'
what(): boost::lock_error
Aborted
Here is my code:
main
int main() {
    startMultiThreading();
    cout << "I" << endl;
}
startMultiThreading
void startMultiThreading() {
    boost::thread_group someVar_workers;
    boost::thread_group someOtherVar_workers;
    concurrent_queue<someVar*> someVar_queue(&someVar_workers);
    concurrent_queue<someOtherVar*> someOtherVar_queue(&someOtherVar_workers);

    boost::thread *first_thread = new boost::thread(first_handler, &someVar_queue);
    boost::thread *second_thread = new boost::thread(second_handler, &someVar_queue, &someOtherVar_queue);
    boost::thread *third_thread = new boost::thread(third_handler, &someOtherVar_queue);

    someVar_workers.add_thread(first_thread);
    someVar_workers.add_thread(second_thread);
    someOtherVar_workers.add_thread(second_thread);
    someOtherVar_workers.add_thread(third_thread);

    while (true) {
        if (thread_should_exit) {
            cout << "threads should be killed" << endl;
            while (!someVar_queue.empty()) {
                usleep(1000);
            }
            someVar_workers.remove_thread(second_thread);
            while (!someOtherVar_queue.empty()) {
                usleep(1000);
            }
            someOtherVar_queue.cancel();
            someVar_workers.join_all();
            someOtherVar_workers.remove_thread(second_thread);
            someOtherVar_workers.join_all();
            break;
        }
        usleep(10000);
    }
    cout << "H" << endl;
}
What I would like is for the program to finish both queues and then terminate normally. I would expect to see "I" printed before the program terminates. Here is the actual output:
End of first_handler
threads should be
second_handler is canceled
End of second_handler
H
terminate called after throwing an instance of 'concurrent_queue<someOtherVar*>::Canceled'
Aborted
Press [Enter] to close the terminal ...
What am I doing wrong when closing the threads and the queues?
Thank you
First, see the comment from KillianDS: your example is too long.
The other thing is: never call a destructor directly!
The destructor is special, and the language guarantees it is called at the end of the variable's scope. If you call it manually, it will get called a second time, which most probably leads to undefined behaviour.
See also: Calling destructor manually
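A minimal illustration of why that matters (a hypothetical type, not from the question):

#include <iostream>

struct Holder {
    int* p = new int(42);
    ~Holder() { delete p; std::cout << "destructor ran\n"; }
};

int main() {
    Holder h;
    h.~Holder();   // explicit call: p is deleted here
    // When h goes out of scope, the destructor runs a second time and
    // deletes p again: a double free, i.e. undefined behaviour.
    return 0;
}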