Program crash on boost::thread_group - c++

I'm facing an issue with boost::thread_group.
My software executes some image processing on images acquired from multiple cameras and so far, the single-thread/serial execution of operations are completed successfully and I've tested all the single functions using google-test framework, just to avoid the possibilities of some code-mistakes or crashes of algorithms.
When i enable the multithread processing, where i feed my processing function with different data (the data are not shared between the threads, but i want to process 4 images in parallel, to speed up the execution), i receive after a while an error of segmentation fault and the program quits. All the try/catch blocks are not avoiding this crash.
Is there any hint/way to process in parallel different data and store them on disk?
I'll write down the snippet of code which causes the error:
boost::thread_group processingThreads;
for(unsigned int i = 0; i < images.size(); ++i)
{
processingThreads.create_thread(boost::bind(processingFunction, i, param, backgrounds[i][0], images[i][0]));
}
processingThreads.join_all();
where images and backgrounds are a std::vector> (one vector for each camera).
How can I deal with parallel executions of the same function?
Is Boost suitable for this goal?
EDIT
processingFunction is structured as following (shortly, because it's a long function)
bool processingThread(const unsigned int threadID, const ProjectConfiguration &param, Images &background, Images &cloud)
{
try{
if(!loadImage(cloud))
return false;
///Then: resize image, compute background mask, segment it.
return true;
catch(...){ return false; }
}
Thank you in advance
Mike

Related

"Project.exe has triggered a breakpoint" after implementing multithreading

I have a Visual Studio project that worked fine until I tried to implement multithreading. The project acquires images from a GigE camera, and after acquiring 10 images, a video is made from the acquired images.
The program flow was such that the program didn't acquire images when it was making the video. I wanted to change this, so I created another thread that makes the videos from the images. What I wanted is that the program will acquire images continuously, after 10 images are acquired, another thread runs in parallel that will make the video. This will continue until I stop the program, 10 images are acquired, video from these 10 images is made in parallel while the next 10 images are acquired and so on.
I haven't created threads before so I followed the tutorial on this website. Similar to the website, I created a thread for the function that saves the video. The function that creates the video takes the 10 images as a vector argument. I execute join on this thread just before the line where my main function terminates.
For clarity, here's pseudo-code for what I've implemented:
#include ...
#include <thread>
using namespace std;
thread threads[1];
vector<Image> images;
void thread_method(vector<Image> & images){
// Save vector of images to video
// Clear the vector of images
}
int main(int argc, char* argv[]){
// Some stuff
while(TRUE)
{
for (int i = 0; i < 10; i++)
{
//Acquire Image
//Put image pointer in images vector named images
}
threads[0] = thread(thread_method, images)
}
// stuff
threads[0].join();
cout << endl << "Done! Press Enter to exit..." << endl;
getchar();
return 0;
}
When I run the project now, a message pops up saying that the Project.exe has triggered a breakpoint. The project breaks in report_runtime_error.cpp in static bool __cdecl issue_debug_notification(wchar_t const* const message) throw().
I'm printing some cout messages on the console to help me understand what's going on. What happens is that the program acquires 10 images, then the thread for saving the video starts running. As there are 10 images, 10 images have to be appended to the video. The message that says Project.exe has triggered a breakpoint pops up after the second time 10 images are acquired, at this point the parallel thread has only appended 6 images from the first acquired set of images to the video.
The output contains multiple lines of thread XXXX has exited with code 0, after that the output says
Debug Error!
Program: ...Path\Program.exe
abort() has been called
(Press Retry to debug the application)
Program.exe has triggered a breakpoint.
I can't explain all this in a comment. I'm dropping this here because it looks like OP is heading in some bad directions and I'd like to head him off before the cliff. Caleth has caught the big bang and provided a solution for avoiding it, but that bang is only a symptom of OP's and the solution with detach is somewhat questionable.
using namespace std;
Why is "using namespace std" considered bad practice?
thread threads[1];
An array 1 is pretty much pointless. If we don't know how many threads there will be, use a vector. Plus there is no good reason for this to be a global variable.
vector<Image> images;
Again, no good reason for this to be global and many many reasons for it NOT to be.
void thread_method(vector<Image> & images){
Pass by reference can save you some copying, but A) you can't copy a reference and threads copy the parameters. OK, so use a pointer or std::ref. You can copy those. But you generally don't want to. Problem 1: Multiple threads using the same data? Better be read only or protected from concurrent modification. This includes the thread generating the vector. 2. Is the reference still valid?
// Save vector of images to video
// Clear the vector of images
}
int main(int argc, char* argv[]){
// Some stuff
while(TRUE)
{
for (int i = 0; i < 10; i++)
{
//Acquire Image
//Put image pointer in images vector named images
}
threads[0] = thread(thread_method, images)
Bad for reasons Caleth covered. Plus images keeps growing. The first thread, even if copied, has ten elements. The second has the first ten plus another ten. This is weird, and probably not what OP wants. References or pointers to this vector are fatal. The vector would be resized while other threads were using it, invalidating the old datastore and making it impossible to safely iterate.
}
// stuff
threads[0].join();
Wrong for reasons covered by Caleth
cout << endl << "Done! Press Enter to exit..." << endl;
getchar();
return 0;
}
The solution to joining on the threads is the same as just about every Stack Overflow question that doesn't resolve to "Use a std::string": Use a std::vector.
#include <iostream>
#include <vector>
#include <thread>
void thread_method(std::vector<int> images){
std::cout << images[0] << '\n'; // output something so we can see work being done.
// we may or may not see all of the numbers in order depending on how
// the threads are scheduled.
}
int main() // not using arguments, leave them out.
{
int count = 0; // just to have something to show
// no need for threads to be global.
std::vector<std::thread> threads; // using vector so we can store multiple threads
// Some stuff
while(count < 10) // better-defined terminating condition for testing purposes
{
// every thread gets its own vector. No chance of collisions or duplicated
// effort
std::vector<int> images; // don't have Image, so stubbing it out with int
for (int i = 0; i < 10; i++)
{
images.push_back(count);
}
// create and store thread.
threads.emplace_back(thread_method,std::move(images));
count ++;
}
// stuff
for (std::thread &temp: threads) // wait for all threads to exit
{
temp.join();
}
// std::endl is expensive. It's a linefeed and s stream flush, so save it for
// when you really need to get a message out immediately
std::cout << "\nDone! Press Enter to exit..." << std::endl;
char temp;
std::cin >>temp; // sticking with standard librarly all the way through
return 0;
}
To better explain
threads.emplace_back(thread_method,std::move(images));
this created a thread inside threads (emplace_back) that will call thread_method with a copy of images. Odds are good that the compiler would have recognized that this was the end of the road for this particular instance of images and eliminated the copying, but if not, std::move should give it the hint.
You are overwriting your one thread in the while loop. If it's still running, the program is aborted. You have to join or detach each thread value.
You could instead do
#include // ...
#include <thread>
// pass by value, as it's potentially outliving the loop
void thread_method(std::vector<Image> images){
// Save vector of images to video
}
int main(int argc, char* argv[]){
// Some stuff
while(TRUE)
{
std::vector<Image> images; // new vector each time round
for (int i = 0; i < 10; i++)
{
//Acquire Image
//Put image pointer in images vector named images
}
// std::thread::thread will forward this move
std::thread(thread_method, std::move(images)).detach(); // not join
}
// stuff
// this is somewhat of a lie now, we have to wait for the threads too
std::cout << std::endl << "Done! Press Enter to exit..." << std::endl;
std::getchar();
return 0;
}

c++ concurrency issue with scaling thread runtime

I have a program that performs the same function on a large array. I break the array into equal chunks and pass them to threads. Currently the threads perform the function and return what they are supposed to, BUT the more threads I add the longer each thread takes to run. Which totally negates the purpose of concurrency. I have tried with std::thread and std::async both with the same result. In the images below the amount of data processed by all child threads and the main thread are the same size (main has 6 more points), but what main runs in ~ 12 seconds the child threads take ~12 x the number of threads as if they were running asynchronously. But they all start at the same time, and if I output from each thread they are running concurrently. Does this have something to do with how they are being joined? I have tried everything I can think of, any help/advice is much appreciated! In the sample code main doesn't run the function until after child threads finish, if I put the join after the main runs it still doesn't run until the child threads finish. Below you can see the runtimes when run with 3 and 5 threads. These times are on a downscaled dataset for testing.
void foo(char* arg1, long arg2, std::promise<std::vector<std::vector<std::vector<std::vector<std::vector<long>>>>>> & ftrV) {
std::vector<std::vector<std::vector<std::vector<std::vector<long>>>>> Grid;
// does stuff....
// fills in "Grid"
ftrV.set_value(Grid);
}
int main(){
int thnmb = 3; // # of threads
std::vector<long> buffers; // fill in buffers
std::vector<char*> pointers; //fill in pointers
std::vector<std::promise<std::vector<std::vector<std::vector<std::vector<std::vector<long>>>>>>> PV(thnmb); // vector of promise grids
std::vector<std::future<std::vector<std::vector<std::vector<std::vector<std::vector<long>>>>>>> FV(thnmb); // vector of futures grids
std::vector<std::thread> th(thnmb); // vector of threads
std::vector<std::vector<std::vector<std::vector<std::vector<std::vector<long>>>>>> vt1(thnmb); // vector to store thread grids
for (int i = 0; i < thnmb; i++) {
th[i] = std::thread(&foo, pointers[i], buffers[i], std::ref(PV[i]));
}
for (int i = 0; i < thnmb; i++) {
FV[i] = PV[i].get_future();
}
for (int i = 0; i < thnmb; i++) {
vt1[i] = FV[i].get();
}
for (int i = 0; i < thnmb; i++) {
th[i].join();
}
// main performs same function as foo here
// combine data
// do other stuff..
return(0);
}
It's hard to give a definitive answer without knowing what foo does, but you're probably running into memory access issues. Each access to your 5 dimension array will require 5 memory lookups, and it only takes 2 or 3 threads with memory access to saturate what a typical system can deliver.
main should perform it's foo work after creating the threads but before getting the value of the promises.
And foo should probably end with ftrV.set_value(std::move(Grid)) so that a copy of that array won't have to be made.

Boost Thread_Group in a loop is very slow

I wanted to use threading to run check multiple images in a vector at the same time. Here is the code
boost::thread_group tGroup;
for (int line = 0;line < sourceImageData.size(); line++) {
for (int pixel = 0;pixel < sourceImageData[line].size();pixel++) {
for (int im = 0;im < m_images.size();im++) {
tGroup.create_thread(boost::bind(&ClassX::ClassXFunction, this, line, pixel, im));
}
tGroup.join_all();
}
}
This creates the thread group and loops thru lines of pixel data and each pixel and then multiple images. Its a weird project but anyway I bind the thread to a method in the same instance of the class this code is in so "this" is used. This runs through a population of about 20 images, binding each thread as it goes and then when it is done looping the join_all function takes effect when the threads are done. Then it goes to the next pixel and starts over again.
I'v tested running 50 threads at the same time with this simple program
void run(int index) {
for (int i = 0;i < 100;i++) {
std::cout << "Index : " <<index<<" "<<i << std::endl;
}
}
int main() {
boost::thread_group tGroup;
for (int i = 0;i < 50;i++){
tGroup.create_thread(boost::bind(run, i));
}
tGroup.join_all();
int done;
std::cin >> done;
return 0;
}
This works very quickly. Even though the method the threads are bound to in the previous program is more complicated it shouldn't be as slow as it is. It takes like 4 seconds for one loop of sourceImageData (line) to complete. I'm new to boost threading so I don't know if something is blatantly wrong with the nested loops or otherwise. Any insight is appreciated.
The answer is simple. Don't start that many threads. Consider starting as many threads as you have logical CPU cores. Starting threads is very expensive.
Certainly never start a thread just to do one tiny job. Keep the threads and give them lots of (small) tasks using a task queue.
See here for a good example where the number of threads was similarly the issue: boost thread throwing exception "thread_resource_error: resource temporarily unavailable"
In this case I'd think you can gain a lot of performance by increasing the size of each task (don't create one per pixel, but per scan-line for example)
I believe the difference here is in when you decide to join the threads.
In the first piece of code, you join the threads at every pixel of the supposed source image. In the second piece of code, you only join the threads once at the very end.
Thread synchronization is expensive and often a bottleneck for parallel programs because you are basically pausing execution of any new threads until ALL threads that need to be synchronized, which in this case is all the threads that are active, are done running.
If the iterations of the innermost loop(the one with im) are not dependent on each other, I would suggest you join the threads after the entire outermost loop is done.

Multithread performance drops down after a few operations

I encountered this weird bug in a c++ multithread program on linux. The multithreaded part basically executes a loop. One single iteration first loads a sift file containing some features. And then it queries these features against a tree. Since I have a lot of images, I used multiple threads to do this querying. Here is the code snippets.
struct MultiMatchParam
{
int thread_id;
float *scores;
double *scores_d;
int *perm;
size_t db_image_num;
std::vector<std::string> *query_filenames;
int start_id;
int num_query;
int dim;
VocabTree *tree;
FILE *file;
};
// multi-thread will do normalization anyway
void MultiMatch(MultiMatchParam &param)
{
// Clear scores
for(size_t t = param.start_id; t < param.start_id + param.num_query; t++)
{
for (size_t i = 0; i < param.db_image_num; i++)
param.scores[i] = 0.0;
DTYPE *keys;
int num_keys;
keys = ReadKeys_sfm((*param.query_filenames)[t].c_str(), param.dim, num_keys);
int normalize = true;
double mag = param.tree->MultiScoreQueryKeys(num_keys, normalize, keys, param.scores);
delete [] keys;
}
}
I run this on a 8-core cpu. At first it runs perfectly and the cpu usage is nearly 100% on all 8 cores. After each thread has queried several images (about 20 images), all of a sudden the performance (cpu usage) drops drastically, down to about 30% across all eight cores.
I doubt the key to this bug is concerned with this line of code.
double mag = param.tree->MultiScoreQueryKeys(num_keys, normalize, keys, param.scores);
Since if I replace it with another costly operations (e.g., a large for-loop containing sqrt). The cpu usage is always nearly 100%. This MultiScoreQueryKeys function does a complex operation on a tree. Since all eight cores may read the same tree (no write operation to this tree), I wonder whether the read operation has some kind of blocking effect. But it shouldn't have this effect because I don't have write operations in this function. Also the operations in the loop are basically the same. If it were to block the cpu usage, it would happen in the first few iterations. If you need to see the details of this function or other part of this project, please let me know.
Use std::async() instead of zeta::SimpleLock lock

c++ pthread limit number of thread

I tried to use pthread to do some task faster. I have thousands files (in args) to process and i want to create just a small number of thread many times.
Here's my code :
void callThread(){
int nbt = 0;
pthread_t *vp = (pthread_t*)malloc(sizeof(pthread_t)*NBTHREAD);
for(int i=0;i<args.size();i+=NBTHREAD){
for(int j=0;j<NBTHREAD;j++){
if(i+j<args.size()){
pthread_create(&vp[j],NULL,calcul,&args[i+j]);
nbt++;
}
}
for(int k=0;k<nbt;k++){
if(pthread_join(vp[k], NULL)){
cout<<"ERROR pthread_join()"<<endl;
}
}
}
}
It returns error, i don't know if it's a good way to solve my problem. All the resources are in args (vector of struct) and are independants.
Thanks for help.
You're better off making a thread pool with as many threads as the number of cores the cpu has. Then feed the tasks to this pool and let it do its job. You should take a look at this blog post right here for a great example of how to go about creating such thread pool.
A couple of tips that are not mentioned in that post:
Use std::thread::hardware_concurrency() to get the number of cores.
Figure out a way how to store the tasks, hint: std::packaged_task or something along
those lines wrapped in a class so you can track things such as when a task is done, or implement task.join().
Also, github with the code of his implementation plus some extra stuff such as std::future support can be found here.
You can use a semaphore to limit the number of parallel threads, here is a pseudo code:
Semaphore S = MAX_THREADS_AT_A_TIME // Initial semaphore value
declare handle_array[NUM_ITERS];
for(i=0 to NUM_ITERS)
{
wait-while(S<=0);
Acquire-Semaphore; // S--
handle_array[i] = Run-Thread(MyThread);
}
for(i=0 to NUM_ITERS)
{
Join_thread(handle_array[i])
Close_handle(handle_array[i])
}
MyThread()
{
mutex.lock
critical-section
mutex.unlock
release-semaphore // S++
}