Using boost::thread in a planet simulator - c++

I am making a simulator of planets in space, and the issue is that I cannot simulate more than ~100 planets, because the simulation slows down dramatically as the number of planets grows. I thought threads could solve this, since I am probably not experienced enough to use the graphics card for the calculations.
My program has two functions: one calculates the gravitational forces between planets, and the other checks for collisions. I implemented threading so that the gravitational forces are calculated in one thread and the collisions are checked in another.
The problem is that the simulation isn't running any faster than without threads. Maybe I'm implementing them wrong?
int main()
{
    int numOfPlanets;
    cout << "Enter the maximum number of planets to generate: ";
    cin >> numOfPlanets;
    App.Create(sf::VideoMode(1366, 740), "SFML Galaxy Simulator");
    App.Clear(sf::Color(20,20,20));
    generateRandomPlanets(500, 500, numOfPlanets);
    //createPlanet(planets, sf::Vector2f(500,500), sf::Vector2f(0,0), 5, 500);
    thread thread_1;
    thread thread_2;
    while(App.IsOpened())
    {
        sf::Event Event;
        while (App.GetEvent(Event))
        {
            if (Event.Type == sf::Event::Closed)
                App.Close();
        }
        App.Clear(sf::Color(20,20,20));
        thread_1 = thread(checkCollision);
        thread_2 = thread(calculateForce);
        thread_1.join();
        thread_2.join();
        updatePlanets();
        App.Display();
    }
    thread_2.join();
    thread_1.join();
    return 0;
}

thread_1 = thread(checkCollision);
thread_2 = thread(calculateForce);
thread_1.join();
thread_2.join();
updatePlanets();
This launches two new threads to do some work in parallel, then blocks waiting for them to finish, then afterwards runs updatePlanets. You probably want:
thread_1 = thread(checkCollision);
thread_2 = thread(calculateForce);
updatePlanets();
thread_1.join();
thread_2.join();
This will run the three functions in parallel.
Also, this is an error at the end of main:
thread_2.join();
thread_1.join();
return 0;
You've already joined the threads; you can't join them again.
There's actually no point in declaring thread_1 and thread_2 outside the loop and reusing them; you could just declare them inside the loop:
thread thread_1(checkCollision);
thread thread_2(calculateForce);
updatePlanets();
thread_1.join();
thread_2.join();
Also be aware that if updatePlanets() throws an exception, your program will terminate: the thread_2 destructor runs while that thread is still joinable, and that calls terminate(). This may be OK in this program, but it is something to bear in mind.
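A minimal sketch of one way to keep such an exception from terminating the program, shown with std::thread: wrap each worker in a small RAII guard that joins in its destructor, so both threads are joined while the exception propagates. JoinGuard is a hypothetical helper (not part of the standard library or Boost), and checkCollision, calculateForce and updatePlanets are stand-ins, with updatePlanets deliberately throwing.

```cpp
#include <stdexcept>
#include <thread>
#include <utility>

// RAII wrapper (hypothetical helper): joins the thread in its destructor, so
// an exception on the main thread never destroys a still-joinable std::thread
// (which would call std::terminate()).
class JoinGuard {
public:
    explicit JoinGuard(std::thread t) : t_(std::move(t)) {}
    ~JoinGuard() {
        if (t_.joinable()) t_.join();
    }
    JoinGuard(const JoinGuard&) = delete;
    JoinGuard& operator=(const JoinGuard&) = delete;

private:
    std::thread t_;
};

// Hypothetical stand-ins for the simulator's real functions.
int collisionsChecked = 0;
int forcesCalculated = 0;
void checkCollision() { ++collisionsChecked; }
void calculateForce() { ++forcesCalculated; }
void updatePlanets() { throw std::runtime_error("update failed"); }

// One simulation step; returns false if updatePlanets() threw.
bool simulationStep() {
    try {
        JoinGuard collisions{std::thread(checkCollision)};
        JoinGuard forces{std::thread(calculateForce)};
        updatePlanets();  // throws; both guards still join while unwinding
    } catch (const std::runtime_error&) {
        return false;     // both workers have already been joined here
    }
    return true;
}
```

In C++20, std::jthread gives the same join-on-destruction behavior without a custom class.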

I did some more research after posting the question and found that the main performance problem in my simulation was the time complexity of the algorithm that calculates the gravitational force on each planet from all the other planets, which is O(n^2).
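To see why threads alone did not help: the direct method evaluates every pair of planets, roughly n² interactions, so doubling the planet count about quadruples the work, and threads can at best divide that by the number of cores. The sketch below is an illustration, not the asker's code: it splits the direct O(n^2) sum across threads by output index. Body, Accel, computeAccelerations, the gravitational constant G and the softening term eps are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

struct Body {
    double x, y;  // position
    double mass;
};

struct Accel {
    double ax = 0.0, ay = 0.0;
};

// Direct-sum gravity: O(n^2) work, split across threads by output index.
// Each thread reads every body but writes only acc[begin..end), so no locks
// are needed. G and the softening term eps are illustrative assumptions.
std::vector<Accel> computeAccelerations(const std::vector<Body>& bodies,
                                        unsigned numThreads,
                                        double G = 1.0, double eps = 1e-3) {
    const std::size_t n = bodies.size();
    std::vector<Accel> acc(n);
    numThreads = std::max(1u, numThreads);

    auto work = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                if (i == j) continue;
                double dx = bodies[j].x - bodies[i].x;
                double dy = bodies[j].y - bodies[i].y;
                double r2 = dx * dx + dy * dy + eps * eps;
                double invR3 = 1.0 / (r2 * std::sqrt(r2));
                acc[i].ax += G * bodies[j].mass * dx * invR3;
                acc[i].ay += G * bodies[j].mass * dy * invR3;
            }
        }
    };

    std::vector<std::thread> threads;
    const std::size_t chunk = (n + numThreads - 1) / numThreads;
    for (std::size_t begin = 0; begin < n; begin += chunk)
        threads.emplace_back(work, begin, std::min(n, begin + chunk));
    for (auto& t : threads) t.join();
    return acc;
}
```

Because each thread only writes its own slice of acc, no mutex is needed, and the result does not depend on the thread count. The total work is still n², which is the real bottleneck.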
I found that one of the best approaches to this is the Barnes-Hut algorithm for n-body simulation, with a time complexity of O(n log n). It divides all the planets into the nodes of a quadtree and then calculates forces from the center of mass of each node, treating a sufficiently distant node as a single body.
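A compact 2D Barnes-Hut sketch follows, under stated assumptions: bodies have positive masses and distinct positions, the opening criterion is the common s/d < theta (cell width over distance to the cell's center of mass), and all names (Body, Accel, Node, barnesHut) are hypothetical. With theta = 0 it degenerates to the exact pairwise sum, which is handy for testing; values around 0.5 trade a little accuracy for far fewer interactions.

```cpp
#include <algorithm>
#include <cmath>
#include <memory>
#include <vector>

struct Body { double x, y, mass; };
struct Accel { double ax = 0.0, ay = 0.0; };

// Quadtree cell covering the square [cx - half, cx + half) x [cy - half, cy + half).
struct Node {
    static constexpr int kMaxDepth = 64;    // guard against endless splitting
    double cx, cy, half;
    double mass = 0.0, mx = 0.0, my = 0.0;  // total mass, mass-weighted x and y sums
    int count = 0;                          // number of bodies inside this cell
    int body = -1;                          // body index while this is a one-body leaf
    std::unique_ptr<Node> child[4];

    Node(double x, double y, double h) : cx(x), cy(y), half(h) {}
    bool leaf() const { return !child[0]; }
    int quadrant(double x, double y) const { return (x >= cx) + 2 * (y >= cy); }

    void insert(const std::vector<Body>& b, int i, int depth = 0) {
        ++count;
        mass += b[i].mass;
        mx += b[i].mass * b[i].x;
        my += b[i].mass * b[i].y;
        if (count == 1) { body = i; return; }  // empty leaf: keep the body here
        if (depth >= kMaxDepth) return;        // coincident bodies: just aggregate
        if (leaf()) {                          // occupied leaf: split, push old body down
            double h = half / 2;
            for (int q = 0; q < 4; ++q)
                child[q] = std::make_unique<Node>(cx + ((q & 1) ? h : -h),
                                                  cy + ((q & 2) ? h : -h), h);
            child[quadrant(b[body].x, b[body].y)]->insert(b, body, depth + 1);
            body = -1;
        }
        child[quadrant(b[i].x, b[i].y)]->insert(b, i, depth + 1);
    }

    // Adds the acceleration on body i. A cell is treated as one point mass when
    // it is a leaf or when (cell width) / (distance to its center of mass) < theta.
    // With theta below ~0.7, a cell containing body i is never approximated.
    void accumulate(const std::vector<Body>& b, int i, double theta, double G,
                    double eps, Accel& a) const {
        if (count == 0 || (leaf() && body == i)) return;
        double dx = mx / mass - b[i].x;
        double dy = my / mass - b[i].y;
        double r2 = dx * dx + dy * dy + eps * eps;
        double r = std::sqrt(r2);
        if (leaf() || 2 * half < theta * r) {
            double f = G * mass / (r2 * r);
            a.ax += f * dx;
            a.ay += f * dy;
            return;
        }
        for (const auto& c : child) c->accumulate(b, i, theta, G, eps, a);
    }
};

std::vector<Accel> barnesHut(const std::vector<Body>& b, double theta = 0.5,
                             double G = 1.0, double eps = 0.0) {
    std::vector<Accel> acc(b.size());
    if (b.empty()) return acc;
    double minX = b[0].x, maxX = b[0].x, minY = b[0].y, maxY = b[0].y;
    for (const Body& p : b) {
        minX = std::min(minX, p.x); maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y); maxY = std::max(maxY, p.y);
    }
    // Root square enclosing every body, slightly enlarged so the maxima fall inside.
    double half = std::max(maxX - minX, maxY - minY) / 2 + 1e-9;
    Node root((minX + maxX) / 2, (minY + maxY) / 2, half);
    for (int i = 0; i < static_cast<int>(b.size()); ++i) root.insert(b, i);
    for (int i = 0; i < static_cast<int>(b.size()); ++i)
        root.accumulate(b, i, theta, G, eps, acc[i]);
    return acc;
}
```

Once the tree is built, accumulate() only reads it, so the per-body loop in barnesHut() can be split across threads by giving each thread a contiguous range of body indices, each writing only its own entries of acc.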
So, putting all of this together, combining the Barnes-Hut algorithm with threading is the best way to approach this problem.

Related

Is there a way to kill and clean up a c++ program if it deadlocks/waits forever

So I built a program that should be released to production soon, but I'm worried that if all threads end up locked or waiting forever, the pipeline will be compromised. I'm pretty sure I designed it so this won't happen, but if it does, I'd like to kill all the threads and produce a boilerplate output. My first idea was to write a thread that monitors the iterations of all the other threads and kills them if no iteration occurs for 5 seconds, but this doesn't seem to work, and there's also the problem that all the threads are in some random state of execution:
void deadlock_monitor() {
    while(true) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        int64_t time_diff = gnut_GetMicroTime() - last_thread_iter;
        if(((time_diff/1000) > 5000) && !processing_completed) {
            exit(1);
        }
        if(processing_completed) {
            return;
        }
    }
    return;
}
Is there a best practice to deal with this, or is ensuring there are no race conditions all I can do?
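There is no safe way to kill an individual std::thread from the outside, so a common fallback is a watchdog that ends the whole process. A sketch under assumptions: workers call heartbeat() on every iteration, the main thread calls finish() when processing completes, and Watchdog is a hypothetical class, not a library type. Also, if last_thread_iter and processing_completed in the question are plain (non-atomic) globals written by other threads, reading them from the monitor thread is a data race; the sketch uses an atomic and a mutex instead.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Watchdog sketch (hypothetical class). Workers call heartbeat() on every
// iteration; the main thread calls finish() when processing completes.
// monitor() blocks until finish() is called (returns true) or no heartbeat
// has arrived for `timeout` (returns false).
class Watchdog {
public:
    using Clock = std::chrono::steady_clock;

    explicit Watchdog(Clock::duration timeout)
        : timeout_(timeout), last_(Clock::now().time_since_epoch().count()) {}

    void heartbeat() {
        last_.store(Clock::now().time_since_epoch().count(), std::memory_order_relaxed);
    }

    void finish() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
    }

    bool monitor(Clock::duration pollInterval) {
        std::unique_lock<std::mutex> lock(m_);
        // wait_for returns the predicate's value: true once finish() was called.
        while (!cv_.wait_for(lock, pollInterval, [this] { return done_; })) {
            Clock::time_point last{Clock::duration(last_.load(std::memory_order_relaxed))};
            if (Clock::now() - last > timeout_) return false;  // stalled
        }
        return true;
    }

private:
    Clock::duration timeout_;
    std::atomic<Clock::rep> last_;  // time of the latest heartbeat, in clock ticks
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};
```

If monitor() returns false, write the boilerplate output, flush it, and call std::_Exit(1) (from <cstdlib>) rather than exit(): exit() runs static destructors and atexit handlers, which can block on locks still held by the stuck threads, while std::_Exit ends the process immediately.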

Let threads execute until a specific point, wait for the rest to get there, then execute the remaining code sequentially

I have some code that needs to benchmark multiple algorithms, but before they can be benchmarked they need to be prepared. I would like to do the preparation multi-threaded, but the benchmarking needs to be sequential. Normally I would create threads for the preparation, wait for them to finish with join, and let the main thread do the benchmarking. However, the preparation and benchmarking are done in a separate process after a fork, because sometimes they take too long (so there is also a timer process, created by another fork, which kills the worker process after x seconds). And the preparation and benchmarking have to be done in the same process, otherwise the benchmarking does not work. So I was wondering: if I make a thread for every algorithm, is there a way to let them run concurrently until a certain point, have each one wait until the others reach that point, and then let them do the rest of the work sequentially?
Here is the code that would be executed in a thread:
void prepareAndBenchmark(algorithm) {
    //The timer process that stops the worker after x seconds
    pid_t timeout_pid = fork();
    if (timeout_pid == 0) {
        sleep(x);
        _exit(0);
    }
    //The actual work
    pid_t worker_pid = fork();
    if (worker_pid == 0) {
        //Concurrently:
        prepare(algorithm);
        //Concurrently up until this point
        //At this point all the threads should run sequentially one after the other:
        double result = benchmark(algorithm);
        exit(0);
    }
    int status;
    pid_t exited_pid = wait(&status);
    if (exited_pid == worker_pid) {
        kill(timeout_pid, SIGKILL);
        if(status == 0) {
            //I would use pipes to get the result of the benchmark.
        } else {
            //Something went wrong
        }
    } else {
        //It took too long.
        kill(worker_pid, SIGKILL);
    }
    wait(NULL);
}
I have also read that forking in threads might cause problems; would that be a problem in this code?
I think I could use a mutex to have only one thread benchmarking at a time, but I don't want a thread benchmarking while others are still preparing.
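One way to get "prepare concurrently, then benchmark one at a time" inside a single process, sketched without fork: each thread prepares, waits at a one-shot latch until every thread has finished preparing, and then benchmarks while holding a shared mutex. Latch and prepareThenBenchmark are hypothetical names (C++20's std::latch provides the same counter-and-wait idea), and the mutex guarantees only one benchmark at a time, not a particular order.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Minimal one-shot latch: the last thread to arrive releases everyone.
class Latch {
public:
    explicit Latch(std::size_t count) : count_(count) {}
    void arriveAndWait() {
        std::unique_lock<std::mutex> lock(m_);
        if (--count_ == 0)
            cv_.notify_all();
        else
            cv_.wait(lock, [this] { return count_ == 0; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
};

// Runs prepare(i) for all i concurrently, waits until every preparation is
// done, then runs benchmark(i) in each thread while holding a shared mutex,
// so only one benchmark runs at a time (in no guaranteed order).
void prepareThenBenchmark(std::size_t n,
                          const std::function<void(std::size_t)>& prepare,
                          const std::function<void(std::size_t)>& benchmark) {
    Latch allPrepared(n);
    std::mutex benchMutex;
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back([&, i] {
            prepare(i);                   // concurrent phase
            allPrepared.arriveAndWait();  // nobody benchmarks until all are prepared
            std::lock_guard<std::mutex> lock(benchMutex);
            benchmark(i);                 // sequential phase
        });
    }
    for (auto& t : threads) t.join();
}
```

This assumes all the threads live inside one worker process (created by a single fork before any threads start), rather than each thread forking its own children. Separately, plain wait() reaps any child of the process, so when several threads fork, one thread can collect another thread's child; waitpid() on the specific pid avoids that.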

Incomplete multi-threading RayTracer taking twice as much time as expected

I am making a multithreaded ray tracer, and, as the title says, it takes twice as long to execute as the single-threaded version. Obviously the goal is to cut the render time in half; for now, however, I am just running the ray-tracing method twice, once in each thread, so the same image is rendered twice. Still, since the threads can run in parallel, I would not expect a meaningful increase in execution time, yet it roughly doubles.
This has to be related to my multithreading setup; I think it's related to the fact that I create the threads as joinable. So I am going to explain what I am doing and post the related code, to see if someone can confirm whether that's the issue.
I create two threads and set them as joinable. I create a RayTracer that allocates enough memory to store the image pixels (this is done in the constructor). Then a two-iteration loop sends the relevant info to each thread, such as the thread id and the address of the RayTracer instance.
Then pthread_create calls run_thread, whose purpose is to call the ray tracer's draw method, where the work is done. The last statement in the draw method is pthread_exit(NULL); (the only multithreading-specific code in it). Then another loop joins the threads. Finally, I write the file in a small loop, close the file, and delete the pointers to the array used to store the image in the draw method.
I may not need join now, since I am not doing "real" multithreaded ray tracing, just rendering the image twice. But as soon as I start alternating between image pixels (i.e., thread0 renders and stores pixel0, thread1 renders and stores pixel1, thread0 renders and stores pixel2, thread1 renders and stores pixel3, etc.), I think I will need it to be able to write the pixels to the file in the correct order.
Is that correct? Do I really need to use join with my approach (or with any other)? If I do, how can I make the threads run concurrently, without waiting for each other to complete? Or is the problem totally unrelated to join?
pthread_t threads [2];
thread_data td_array [2];
pthread_attr_t attr;
void *status;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
TGAManager tgaManager ("z.tga",true);
if (tgaManager.isFileOpen()) {
    tgaManager.writeHeadersData (image);
    RayTracer rt (image.getHeight() * image.getWidth());
    int rc;
    for (int i=0; i<2; i++) {
        //cout << "main() : creating thread, " << i << endl;
        td_array[i].thread_id=i;
        td_array[i].rt_ptr = &rt;
        td_array[i].img_ptr = &image;
        td_array[i].scene_ptr = &scene;
        //cout << "td_array.thread_index: " << td_array[i].thread_id << endl;
        rc = pthread_create (&threads[i], NULL, RayTracer::run_thread, &td_array[i]);
    }
    if (rc) {
        cout << "Error:unable to create thread," << rc << endl;
        exit(-1);
    }
    pthread_attr_destroy(&attr);
    for (int i=0; i<2; i++ ) {
        rc = pthread_join(threads[i], &status);
        if (rc) {
            cout << "Error:unable to join," << rc << endl;
            exit(-1);
        }
    }
    //tgaManager.writeImage (rt,image.getSize());
    for (int i=0; i<image.getWidth() * image.getHeight(); i++) {
        cout << i << endl;
        tgaManager.file_writer.put (rt.b[i]);
        tgaManager.file_writer.put (rt.c[i]);
        tgaManager.file_writer.put (rt.d[i]);
    }
    tgaManager.closeFile(1);
    rt.deleteImgPtr ();
}
You do want to join() the threads, because if you don't, you have several problems:
How do you know when the threads have finished executing? You don't want to start writing out the resulting image only to find that it wasn't fully calculated at the moment you wrote it out.
How do you know when it is safe to tear down any data structures that the threads might be accessing? For example, your RayTracer object is on the stack, and (AFAICT) your threads are writing into its pixel array. If your function returns before the threads have exited, there is a very good chance that the threads will sometimes end up writing into a RayTracer object that no longer exists, which will corrupt the stack by overwriting whatever other objects happen to occupy those same locations after your function returned.
So you definitely need to join() your threads; you don't need to explicitly declare them as PTHREAD_CREATE_JOINABLE, though, since that attribute is already set by default anyway.
Joining the threads should not cause the threads to slow down, as long as both threads are created and running before you call join() on any of them (which appears to be the case in your posted code).
As for why you are seeing a slowdown with two threads, that's hard to say since a slowdown could be coming from a number of places. Some possibilities:
Something in your ray-tracing code is locking a mutex, such that for much of the ray-tracing run, only one of the two threads is allowed to execute at a time anyway.
Both threads are writing to the same memory locations at around the same time, and that is causing cache-contention which slows down the execution of both threads.
My suggestion would be to set your threads so that thread #1 renders only the top half of the image, and thread #2 renders only the bottom half of the image; that way when they write their output they will be writing to different sections of memory.
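A sketch of that split, written with std::thread rather than the question's pthreads but with the same create-then-join structure: each thread renders a contiguous band of rows (with two threads, the top and bottom halves), so the threads write to separate regions of the pixel buffer. renderInBands and shadePixel are hypothetical; shadePixel stands in for the real per-pixel ray tracing, and numThreads is assumed to be at least 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-pixel shader standing in for the real ray tracer.
std::uint8_t shadePixel(int x, int y) {
    return static_cast<std::uint8_t>((x * 7 + y * 13) & 0xFF);
}

// Renders a width x height image with numThreads threads. Thread t gets a
// contiguous band of rows, so threads write to disjoint regions of memory
// instead of interleaving pixel by pixel.
std::vector<std::uint8_t> renderInBands(int width, int height, int numThreads) {
    std::vector<std::uint8_t> pixels(static_cast<std::size_t>(width) * height);
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        int rowBegin = height * t / numThreads;
        int rowEnd = height * (t + 1) / numThreads;
        threads.emplace_back([&pixels, width, rowBegin, rowEnd] {
            for (int y = rowBegin; y < rowEnd; ++y)
                for (int x = 0; x < width; ++x)
                    pixels[static_cast<std::size_t>(y) * width + x] = shadePixel(x, y);
        });
    }
    for (auto& th : threads) th.join();  // the image is complete only after every join
    return pixels;
}
```

After the joins, the buffer can be written to the file in plain row order, regardless of which thread produced which band.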
If that doesn't help, you might temporarily replace the rendering code with something simpler (e.g. a "renderer" that just sets pixels to random values) to see if you can see a speedup with that. If so, then there might be something in your RayTracer's implementation that isn't multithreading-friendly.

Boost Thread_Group in a loop is very slow

I wanted to use threading to check multiple images in a vector at the same time. Here is the code:
boost::thread_group tGroup;
for (int line = 0;line < sourceImageData.size(); line++) {
    for (int pixel = 0;pixel < sourceImageData[line].size();pixel++) {
        for (int im = 0;im < m_images.size();im++) {
            tGroup.create_thread(boost::bind(&ClassX::ClassXFunction, this, line, pixel, im));
        }
        tGroup.join_all();
    }
}
This creates the thread group and loops through the lines of pixel data, each pixel, and then multiple images. It's a weird project, but anyway, I bind each thread to a method of the same class instance this code is in, which is why "this" is used. It runs through a population of about 20 images, binding a thread to each one, and once that inner loop finishes, join_all waits until those threads are done. Then it moves on to the next pixel and starts over.
I've tested running 50 threads at the same time with this simple program:
void run(int index) {
    for (int i = 0;i < 100;i++) {
        std::cout << "Index : " <<index<<" "<<i << std::endl;
    }
}
int main() {
    boost::thread_group tGroup;
    for (int i = 0;i < 50;i++){
        tGroup.create_thread(boost::bind(run, i));
    }
    tGroup.join_all();
    int done;
    std::cin >> done;
    return 0;
}
This works very quickly. Even though the method the threads are bound to in the previous program is more complicated, it shouldn't be as slow as it is: one iteration of the outer loop over sourceImageData (one line) takes about 4 seconds. I'm new to Boost threading, so I don't know if something is blatantly wrong with the nested loops or otherwise. Any insight is appreciated.
The answer is simple. Don't start that many threads. Consider starting as many threads as you have logical CPU cores. Starting threads is very expensive.
Certainly never start a thread just to do one tiny job. Keep the threads and give them lots of (small) tasks using a task queue.
See here for a good example where the number of threads was similarly the issue: boost thread throwing exception "thread_resource_error: resource temporarily unavailable"
In this case I'd think you can gain a lot of performance by increasing the size of each task (don't create one per pixel, but one per scan-line, for example).
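A minimal sketch of the task-queue idea, assuming plain std::thread rather than Boost: a fixed number of long-lived workers pull tasks from a queue, and wait() blocks until all submitted tasks are done. ThreadPool is a hypothetical class written for this example, not a Boost or standard type.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool (hypothetical): long-lived workers pull tasks from a
// queue, so no thread is created per pixel. wait() blocks until every
// submitted task has finished.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n = std::thread::hardware_concurrency()) {
        if (n == 0) n = 2;  // hardware_concurrency() may return 0 if unknown
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        taskReady_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain the queue first
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
            ++pending_;
        }
        taskReady_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        allDone_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                taskReady_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
            std::lock_guard<std::mutex> lock(m_);
            if (--pending_ == 0) allDone_.notify_all();
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable taskReady_, allDone_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
};
```

For the question's loops, one would submit one task per line of sourceImageData (each task looping over that line's pixels and over m_images) and call wait() once after the outer loop, instead of creating and joining threads for every pixel.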
I believe the difference here is in when you decide to join the threads.
In the first piece of code, you join the threads at every pixel of the supposed source image. In the second piece of code, you only join the threads once at the very end.
Thread synchronization is expensive and often a bottleneck for parallel programs, because you are basically pausing the creation of any new threads until ALL the threads being synchronized (in this case, every active thread) have finished running.
If the iterations of the innermost loop (the one with im) are not dependent on each other, I would suggest you join the threads after the entire outermost loop is done.

Pausing in OpenGL successively

void keyPress(unsigned char key,int x,int y){
    int i;
    switch(key){
        case 'f':
            i = 3;
            while(i--){
                x_pos += 3;
                sleep(100);
                glutPostRedisplay();
            }
    }
}
Above is a code snippet written in C++ using the GLUT library on Windows 7.
This function takes a character key and the mouse coordinates x, y, and when the f key is pressed, it performs a translation along the x direction in 3 successive steps. Between each step the program should sleep for 100 ms.
We want to move a robot and have it pause after each step forward.
We are facing a problem in making the program sleep between the 3 steps. What is the problem in the above code snippet?
Disclaimer: The answer of jozxyqk seems better to me. This answer solves the problem in a dirty way.
You are misusing glutPostRedisplay, as stated in this answer. The problem is that glutPostRedisplay only marks the current window as needing to be redisplayed; the redisplay happens once control returns to glutMainLoop. That happens only once, hence only one sleep seems to work.
In fact all three sleeps work, but you get only one redraw after 300 ms.
To solve this, you have to find another way of redrawing the scene.
while(i--){
    x_pos += 3;
    sleep(100);
    yourDrawFunction();
}
Assuming that you are working on a UNIX system.
sleep for 100 ms
sleep(100);
The problem here is that you are sleeping for 100 seconds, since you are probably using the sleep function from the <unistd.h> header, which declares sleep() as:
extern unsigned int sleep (unsigned int __seconds);
What you want is probably something like
usleep(100000); //sleeps for 100000 microseconds == 100 ms
I believe the issue with your code is that your sleep is blocking GLUT's main loop. The call stack might look something like this:
main() -> glutMainLoop() -> keyPress() -> sleep()
#but can't get to this...
main() -> glutMainLoop() -> display()
Until keyPress() returns, glut's main loop cannot continue to render the next frame. It's waiting for the function to return. All glutPostRedisplay() does is say "hey, something's changed so the image is stale and we need to redraw the next time the main loop iterates". It doesn't actually call display().
You'll have to structure your code such that the main loop can continue as normal, but still include a delay between drawing. For example:
In keyPress(), set a moving = true state. Let the function return.
In the idle() function, call sleep() if moving or maybe if you have moved last time (really you might want to look into calculating elapsed time and do the timing yourself so you don't block the entire program)
Again in idle() increase x_pos and decrease your move count, let the function return, glut will draw, then call idle again and you can sleep/move again.
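A sketch of the same non-blocking structure, with one swap: instead of sleeping in idle() as described above, it uses GLUT's glutTimerFunc(msecs, callback, value), which schedules a callback after a delay without blocking the main loop. StepAnimator is a hypothetical helper, x_pos is assumed to be a float, and the GLUT wiring is shown only in comments so the stepping logic compiles on its own.

```cpp
// Non-blocking stepping sketch. keyPress() only starts the animation; a
// timer callback advances one step, requests a redraw, and re-arms itself,
// so control returns to glutMainLoop() between steps and each step is drawn.
//
// Assumed GLUT wiring (not compiled here):
//   void onTimer(int) {
//       bool more = animator.tick();               // move one step
//       glutPostRedisplay();                       // redraw the new position
//       if (more) glutTimerFunc(100, onTimer, 0);  // next step in 100 ms
//   }
//   // in keyPress():
//   case 'f': if (animator.start(3)) glutTimerFunc(100, onTimer, 0); break;

struct StepAnimator {
    float* xPos;         // the position being animated (x_pos in the question)
    float stepSize;      // units moved per step
    int stepsLeft = 0;

    StepAnimator(float* x, float step) : xPos(x), stepSize(step) {}

    // Begins a move of `steps` steps; returns false if one is already running.
    bool start(int steps) {
        if (stepsLeft > 0) return false;
        stepsLeft = steps;
        return stepsLeft > 0;
    }

    // Advances one step. Returns true if more steps remain (re-arm the timer).
    bool tick() {
        if (stepsLeft <= 0) return false;
        *xPos += stepSize;
        --stepsLeft;
        return stepsLeft > 0;
    }
};
```

Because each timer callback returns immediately, GLUT gets a chance to call display() between the three steps, which is exactly what the sleep-inside-keyPress() version prevented.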