Incomplete multi-threading RayTracer taking twice as much time as expected


I am making a multithreaded ray tracer and, as the title says, it is taking twice as long to execute as the single-threaded version. The eventual goal is to cut the render time in half, but for now I simply run the ray-tracing method twice, once on each thread, so the same image is rendered twice. Since the threads can run in parallel, I would not expect a meaningful increase in execution time, yet it roughly doubles.
This has to be related to my multithreading setup. I think it's related to the fact that I create the threads as joinable. So I am going to explain what I am doing and post the relevant code, to see if someone can confirm whether that's the issue.
I create two threads and set them as joinable. I create a RayTracer that allocates enough memory to store the image pixels (this is done in the constructor). Then I run a two-iteration loop that fills in the relevant info for each thread, such as the thread id and the address of the RayTracer instance.
Then pthread_create calls run_thread, whose purpose is to call the ray tracer's draw method, where the work is done. In the draw method, I have a
pthread_exit (NULL);
as the last statement (the only multithreading-related thing in it). Then I run another loop to join the threads. Finally I start writing the file in a small loop, close the file, and delete the pointers related to the arrays used to store the image in the draw method.
I may not need join now, since I am not doing "real" multithreaded ray tracing yet, just rendering the image twice. But as soon as I start alternating between the image pixels (i.e. thread0 renders and stores pixel0, thread1 renders and stores pixel1, thread0 renders and stores pixel2, thread1 renders and stores pixel3, etc.) I think I will need it in order to write the pixels to the file in the correct order.
Is that correct? Do I really need to use join here with my method (or with any other)? If I do, how can I make the threads run concurrently, without one waiting for the other to complete? Or is the problem totally unrelated to join?
pthread_t threads [2];
thread_data td_array [2];
pthread_attr_t attr;
void *status;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
TGAManager tgaManager ("z.tga",true);
if (tgaManager.isFileOpen()) {
tgaManager.writeHeadersData (image);
RayTracer rt (image.getHeight() * image.getWidth());
int rc;
for (int i=0; i<2; i++) {
//cout << "main() : creating thread, " << i << endl;
td_array[i].thread_id=i;
td_array[i].rt_ptr = &rt;
td_array[i].img_ptr = &image;
td_array[i].scene_ptr = &scene;
//cout << "td_array.thread_index: " << td_array[i].thread_id << endl;
rc = pthread_create (&threads[i], NULL, RayTracer::run_thread, &td_array[i]);
}
if (rc) {
cout << "Error:unable to create thread," << rc << endl;
exit(-1);
}
pthread_attr_destroy(&attr);
for (int i=0; i<2; i++ ) {
rc = pthread_join(threads[i], &status);
if (rc) {
cout << "Error:unable to join," << rc << endl;
exit(-1);
}
}
//tgaManager.writeImage (rt,image.getSize());
for (int i=0; i<image.getWidth() * image.getHeight(); i++) {
cout << i << endl;
tgaManager.file_writer.put (rt.b[i]);
tgaManager.file_writer.put (rt.c[i]);
tgaManager.file_writer.put (rt.d[i]);
}
tgaManager.closeFile(1);
rt.deleteImgPtr ();
}

You do want to join() the threads, because if you don't, you have several problems:
How do you know when the threads have finished executing? You don't want to start writing out the resulting image only to find that it wasn't fully calculated at the moment you wrote it out.
How do you know when it is safe to tear down any data structures that the threads might be accessing? For example, your RayTracer object is on the stack, and (AFAICT) your threads are writing into its pixel array. If your main function returns before the threads have exited, there is a very good chance that the threads will sometimes end up writing into a RayTracer object that no longer exists, which will corrupt the stack by overwriting whatever other objects might exist (by happenstance) at those same locations after your function returned.
So you definitely need to join() your threads; you don't need to explicitly declare them as PTHREAD_CREATE_JOINABLE, though, since that attribute is already set by default anyway.
Joining the threads should not cause the threads to slow down, as long as both threads are created and running before you call join() on any of them (which appears to be the case in your posted code).
As for why you are seeing a slowdown with two threads, that's hard to say since a slowdown could be coming from a number of places. Some possibilities:
Something in your ray-tracing code is locking a mutex, such that for much of the ray-tracing run, only one of the two threads is allowed to execute at a time anyway.
Both threads are writing to the same memory locations at around the same time, and that is causing cache-contention which slows down the execution of both threads.
My suggestion would be to set your threads so that thread #1 renders only the top half of the image, and thread #2 renders only the bottom half of the image; that way when they write their output they will be writing to different sections of memory.
If that doesn't help, you might temporarily replace the rendering code with something simpler (e.g. a "renderer" that just sets pixels to random values) to see if you can see a speedup with that. If so, then there might be something in your RayTracer's implementation that isn't multithreading-friendly.
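The half-and-half split suggested above could be sketched roughly as follows. This is only an illustration, not the asker's RayTracer: shade, render_rows and render_parallel are hypothetical names, shade is a dummy stand-in for the real per-pixel work, and std::thread is used instead of raw pthreads for brevity. Each thread writes only to its own contiguous band of rows, and the caller joins every thread before touching the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-pixel ray-tracing work.
std::uint8_t shade(std::size_t x, std::size_t y) {
    return static_cast<std::uint8_t>((x * 31 + y * 17) % 256);
}

// Renders rows [row_begin, row_end) into the shared pixel buffer.
// Different threads get disjoint row ranges, so they never write the same byte.
void render_rows(std::vector<std::uint8_t>& pixels, std::size_t width,
                 std::size_t row_begin, std::size_t row_end) {
    for (std::size_t y = row_begin; y < row_end; ++y)
        for (std::size_t x = 0; x < width; ++x)
            pixels[y * width + x] = shade(x, y);
}

// Splits the image into contiguous bands of rows, one band per thread.
std::vector<std::uint8_t> render_parallel(std::size_t width, std::size_t height,
                                          std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<std::uint8_t> pixels(width * height);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        std::size_t begin = height * t / num_threads;
        std::size_t end = height * (t + 1) / num_threads;
        workers.emplace_back(render_rows, std::ref(pixels), width, begin, end);
    }
    // All threads are already running here; joining only waits for them to finish.
    for (std::thread& w : workers) w.join();
    return pixels;
}
```

With two threads, render_parallel(width, height, 2) gives the top half to one thread and the bottom half to the other, and the returned buffer is already in file order, so it can be written out in a single loop after the joins.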

Related

How to stop one thread from two parallel threads?

I have a program in which we can monitor 2 objects at same time.
myThread = new thread (thred1, id);
vec.push_back (myThread);
In the thred1 function, I use a Boolean function to read the stored values from a different vector, and the threads run in parallel like this:
element found 2 -- hj
HUMIDITY-1681692777 DISPLAYED IN RH
element found 1 -- hj
TEMPERATURE--1714636915 IN DEGREE CELSIUS
This keeps on running as that is what my program should do.
I have a case where I need to get an ID from the user and stop that particular thread, while the other one keeps running until I stop it. Can someone help me with that?
void thred1 (int id)
{
bool err = false;
while (stopThread == false)
{
for (size_t i = 0; i < v.size (); i++)
{
if (id == v[i]->id)
{
cout << "element found " << v[i]->id << " -- " << v[i]->name << endl;
v[i]->Read ();
this_thread::sleep_for (chrono::seconds (4));
err = true;
break;
}
}
if (!err)
{
cout << "element not found" << endl;
break;
}
}
}

Suspension
1. Assuming you want to suspend the monitor thread only temporarily (i.e. while making changes), you can just use a mutex. Lock it before accessing the shared vector and unlock it when you're done, ensuring that only one thread can access the data at a time.
2. You can actively suspend the thread using OS support, such as SuspendThread on Windows, and call ResumeThread when it's ready to continue.
Termination
1. You could use an event for each monitor thread; naming it after the ID would work. At each iteration, the monitor checks for the termination event and ends the thread if it is set.
2. Pass some variable to each thread, store them in a map with the thread handle being the key, and similar to the previous option just check the value for each iteration.
3. Store all threads in a map with the handle as key, terminating it directly with OS support.
Honestly, there are a ton of ways to do this; the best implementation depends on why exactly you want to stop the monitor thread. Any sort of synchronization object like a mutex should be fine if you're reading from one thread and writing from another. Otherwise, just storing all threads with the internal ID as key and the thread as the value should be fine for terminating monitor threads on demand.
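One way the "variable per thread, stored in a map" idea could look is sketched below. It is only a sketch: MonitorRegistry and Monitor are hypothetical names, the map is keyed by the monitor ID rather than a thread handle, and the loop body is a placeholder for the real Read() call. The registry itself is assumed to be used from a single controlling thread (e.g. the one reading IDs from the user).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <map>
#include <memory>
#include <thread>

// Hypothetical registry: one stop flag and one thread per monitored id.
class MonitorRegistry {
public:
    // Starts a monitor thread for `id`; returns false if one already exists.
    bool start(int id) {
        if (monitors_.count(id) != 0) return false;
        auto& slot = monitors_[id];
        slot = std::make_unique<Monitor>();
        Monitor* raw = slot.get();
        raw->worker = std::thread([raw] {
            // The flag is checked once per iteration, as the answer suggests.
            while (!raw->stop.load()) {
                // Placeholder for the real work, e.g. v[i]->Read().
                std::this_thread::sleep_for(std::chrono::milliseconds(5));
            }
        });
        return true;
    }

    // Asks the thread for `id` to finish, waits for it, and forgets it.
    bool stop(int id) {
        auto it = monitors_.find(id);
        if (it == monitors_.end()) return false;
        assert(it->second->worker.joinable());
        it->second->stop.store(true);
        it->second->worker.join();
        monitors_.erase(it);
        return true;
    }

    bool running(int id) const { return monitors_.count(id) != 0; }

    // Stops and joins whatever is still running.
    ~MonitorRegistry() {
        for (auto& entry : monitors_) {
            entry.second->stop.store(true);
            entry.second->worker.join();
        }
    }

private:
    struct Monitor {
        std::atomic<bool> stop{false};
        std::thread worker;
    };
    std::map<int, std::unique_ptr<Monitor>> monitors_;
};
```

Calling stop(1) ends only the thread registered under ID 1; the thread for ID 2 keeps running until stop(2) is called or the registry is destroyed. Note that a thread sleeping for several seconds will only notice the flag after it wakes up.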

Synchronising main thread and worker thread

In Qt, from the main (GUI) thread I am creating a worker thread to perform an operation that accesses a resource shared by both threads. On a certain action in the GUI, the main thread has to manipulate the resource. I tried using a QMutex to lock that resource, but it is continuously used by the worker thread. How can I notify the main thread about this?
I tried using QWaitCondition, but it crashed the application.
Is there any other option to notify and achieve synchronisation between the threads?
The code snippet is attached.
void WorkerThread::IncrementCounter()
{
qDebug() << "In Worker Thread IncrementCounter function" << endl;
while(stop == false)
{
mutex.lock();
for(int i = 0; i < 100; i++)
{
for(int j = 0; j < 100; j++)
{
counter++;
}
}
qDebug() << counter;
mutex.unlock();
}
qDebug() << "In Worker Thread Aborting " << endl;
}
//Manipulating the counter value by main thread.
void WorkerThread::setCounter(int value)
{
waitCondition.wait(&mutex);
counter = value;
waitCondition.notify_one();
}

You are using the wait condition completely wrong.
I urge you to read up on mutexes and conditions, and maybe look at some examples.
wait() will block execution until either notify_one() or notify_all() is called somewhere. Which of course cannot happen in your code.
You cannot wait() a condition on one line and then expect the next two lines to ever be called if they contain the only wake up calls.
What you want is to wait() in one thread and notify_XXX() in another.
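A minimal sketch of "wait in one thread, notify in another" is shown below. It deliberately uses std::mutex and std::condition_variable from the standard library rather than Qt, so it is an analogy, not a drop-in fix; SharedCounter, set, wait_for_value and demo are hypothetical names. The same shape applies to QMutex and QWaitCondition, where, as far as I know, wait() additionally requires that the calling thread already holds the mutex, which the posted setCounter does not do.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state, standing in for the question's `counter`.
struct SharedCounter {
    std::mutex mutex;
    std::condition_variable changed;
    int value = 0;
    bool ready = false;

    // Called by the notifying thread: update under the lock, then notify.
    void set(int v) {
        {
            std::lock_guard<std::mutex> lock(mutex);
            value = v;
            ready = true;
        }
        changed.notify_one();
    }

    // Called by a different thread: blocks until set() has run.
    int wait_for_value() {
        std::unique_lock<std::mutex> lock(mutex);
        // The predicate guards against spurious wakeups and against
        // set() having been called before this thread started waiting.
        changed.wait(lock, [this] { return ready; });
        return value;
    }
};

// Runs the waiter on a second thread and the notifier on the calling thread.
int demo(int v) {
    SharedCounter shared;
    int seen = -1;
    std::thread waiter([&] { seen = shared.wait_for_value(); });
    shared.set(v);
    waiter.join();
    assert(seen == v);
    return seen;
}
```

The key point is that the wait and the notify live in different functions that run on different threads, and both touch the shared state only while holding the mutex.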

You could use shared memory from within the same process. Each thread could lock it before writing it, like this:
QSharedMemory *shared=new QSharedMemory("Test Shared Memory");
if(shared->create(1,QSharedMemory::ReadWrite))
{
shared->lock();
// Copy some data to it
char *to = (char*)shared->data();
const char *from = &dataBuffer;
memcpy(to, from, dataSize);
shared->unlock();
}
You should also lock it for reading. If you want strings, reading them can be easier than writing them if they are zero-terminated. Convert with .toLatin1() to get a zero-terminated string whose size you can query. You might get shared read access for multiple threads with shared->attach(), but that's more for reading the shared memory of a different process.
You might just use this instead of mutexes. I think if you try to lock it while something else already has it locked, it will just block until the other process unlocks it.

"Project.exe has triggered a breakpoint" after implementing multithreading

I have a Visual Studio project that worked fine until I tried to implement multithreading. The project acquires images from a GigE camera, and after acquiring 10 images, a video is made from the acquired images.
The program flow was such that the program didn't acquire images when it was making the video. I wanted to change this, so I created another thread that makes the videos from the images. What I wanted is that the program will acquire images continuously, after 10 images are acquired, another thread runs in parallel that will make the video. This will continue until I stop the program, 10 images are acquired, video from these 10 images is made in parallel while the next 10 images are acquired and so on.
I haven't created threads before so I followed the tutorial on this website. Similar to the website, I created a thread for the function that saves the video. The function that creates the video takes the 10 images as a vector argument. I execute join on this thread just before the line where my main function terminates.
For clarity, here's pseudo-code for what I've implemented:
#include ...
#include <thread>
using namespace std;
thread threads[1];
vector<Image> images;
void thread_method(vector<Image> & images){
// Save vector of images to video
// Clear the vector of images
}
int main(int argc, char* argv[]){
// Some stuff
while(TRUE)
{
for (int i = 0; i < 10; i++)
{
//Acquire Image
//Put image pointer in images vector named images
}
threads[0] = thread(thread_method, images)
}
// stuff
threads[0].join();
cout << endl << "Done! Press Enter to exit..." << endl;
getchar();
return 0;
}
When I run the project now, a message pops up saying that the Project.exe has triggered a breakpoint. The project breaks in report_runtime_error.cpp in static bool __cdecl issue_debug_notification(wchar_t const* const message) throw().
I'm printing some cout messages on the console to help me understand what's going on. What happens is that the program acquires 10 images, then the thread for saving the video starts running. As there are 10 images, 10 images have to be appended to the video. The message that says Project.exe has triggered a breakpoint pops up after the second time 10 images are acquired, at this point the parallel thread has only appended 6 images from the first acquired set of images to the video.
The output contains multiple lines of thread XXXX has exited with code 0, after that the output says
Debug Error!
Program: ...Path\Program.exe
abort() has been called
(Press Retry to debug the application)
Program.exe has triggered a breakpoint.

I can't explain all this in a comment. I'm dropping this here because it looks like OP is heading in some bad directions and I'd like to head him off before the cliff. Caleth has caught the big bang and provided a solution for avoiding it, but that bang is only a symptom of OP's deeper problems, and the solution with detach is somewhat questionable.
using namespace std;
Why is "using namespace std" considered bad practice?
thread threads[1];
An array of one element is pretty much pointless. If we don't know how many threads there will be, use a vector. Plus there is no good reason for this to be a global variable.
vector<Image> images;
Again, no good reason for this to be global and many many reasons for it NOT to be.
void thread_method(vector<Image> & images){
Pass by reference can save you some copying, but you can't copy a reference, and std::thread copies its parameters. OK, so use a pointer or std::ref; you can copy those. But you generally don't want to. Problem 1: multiple threads using the same data? It had better be read-only or protected from concurrent modification, and that includes the thread generating the vector. Problem 2: is the reference still valid?
// Save vector of images to video
// Clear the vector of images
}
int main(int argc, char* argv[]){
// Some stuff
while(TRUE)
{
for (int i = 0; i < 10; i++)
{
//Acquire Image
//Put image pointer in images vector named images
}
threads[0] = thread(thread_method, images)
Bad for reasons Caleth covered. Plus images keeps growing. The first thread, even if copied, has ten elements. The second has the first ten plus another ten. This is weird, and probably not what OP wants. References or pointers to this vector are fatal. The vector would be resized while other threads were using it, invalidating the old datastore and making it impossible to safely iterate.
}
// stuff
threads[0].join();
Wrong for reasons covered by Caleth
cout << endl << "Done! Press Enter to exit..." << endl;
getchar();
return 0;
}
The solution to joining on the threads is the same as just about every Stack Overflow question that doesn't resolve to "Use a std::string": Use a std::vector.
#include <iostream>
#include <vector>
#include <thread>
void thread_method(std::vector<int> images){
std::cout << images[0] << '\n'; // output something so we can see work being done.
// we may or may not see all of the numbers in order depending on how
// the threads are scheduled.
}
int main() // not using arguments, leave them out.
{
int count = 0; // just to have something to show
// no need for threads to be global.
std::vector<std::thread> threads; // using vector so we can store multiple threads
// Some stuff
while(count < 10) // better-defined terminating condition for testing purposes
{
// every thread gets its own vector. No chance of collisions or duplicated
// effort
std::vector<int> images; // don't have Image, so stubbing it out with int
for (int i = 0; i < 10; i++)
{
images.push_back(count);
}
// create and store thread.
threads.emplace_back(thread_method,std::move(images));
count ++;
}
// stuff
for (std::thread &temp: threads) // wait for all threads to exit
{
temp.join();
}
// std::endl is expensive. It's a linefeed and a stream flush, so save it for
// when you really need to get a message out immediately
std::cout << "\nDone! Press Enter to exit..." << std::endl;
char temp;
std::cin >>temp; // sticking with standard library all the way through
return 0;
}
To better explain
threads.emplace_back(thread_method,std::move(images));
This creates a thread inside threads (emplace_back) that will call thread_method with its own vector of images. std::thread copies its arguments into the new thread's storage; std::move lets the vector be moved there instead of copied, which is safe because this particular instance of images is not used again afterwards.

You are overwriting your one thread in the while loop. If it's still running, the program is aborted. You have to join or detach each thread value.
You could instead do
#include // ...
#include <thread>
// pass by value, as it's potentially outliving the loop
void thread_method(std::vector<Image> images){
// Save vector of images to video
}
int main(int argc, char* argv[]){
// Some stuff
while(TRUE)
{
std::vector<Image> images; // new vector each time round
for (int i = 0; i < 10; i++)
{
//Acquire Image
//Put image pointer in images vector named images
}
// std::thread::thread will forward this move
std::thread(thread_method, std::move(images)).detach(); // not join
}
// stuff
// this is somewhat of a lie now, we have to wait for the threads too
std::cout << std::endl << "Done! Press Enter to exit..." << std::endl;
std::getchar();
return 0;
}

Boost Thread_Group in a loop is very slow

I wanted to use threading to check multiple images in a vector at the same time. Here is the code
boost::thread_group tGroup;
for (int line = 0;line < sourceImageData.size(); line++) {
for (int pixel = 0;pixel < sourceImageData[line].size();pixel++) {
for (int im = 0;im < m_images.size();im++) {
tGroup.create_thread(boost::bind(&ClassX::ClassXFunction, this, line, pixel, im));
}
tGroup.join_all();
}
}
This creates the thread group and loops through the lines of pixel data, then each pixel, and then multiple images. It's a weird project, but anyway: I bind the thread to a method in the same instance of the class this code is in, so "this" is used. This runs through a population of about 20 images, creating a thread for each as it goes; when the loop is done, join_all waits until the threads have finished. Then it goes to the next pixel and starts over again.
I've tested running 50 threads at the same time with this simple program
void run(int index) {
for (int i = 0;i < 100;i++) {
std::cout << "Index : " <<index<<" "<<i << std::endl;
}
}
int main() {
boost::thread_group tGroup;
for (int i = 0;i < 50;i++){
tGroup.create_thread(boost::bind(run, i));
}
tGroup.join_all();
int done;
std::cin >> done;
return 0;
}
This works very quickly. Even though the method the threads are bound to in the previous program is more complicated it shouldn't be as slow as it is. It takes like 4 seconds for one loop of sourceImageData (line) to complete. I'm new to boost threading so I don't know if something is blatantly wrong with the nested loops or otherwise. Any insight is appreciated.

The answer is simple. Don't start that many threads. Consider starting as many threads as you have logical CPU cores. Starting threads is very expensive.
Certainly never start a thread just to do one tiny job. Keep the threads and give them lots of (small) tasks using a task queue.
See here for a good example where the number of threads was similarly the issue: boost thread throwing exception "thread_resource_error: resource temporarily unavailable"
In this case I'd think you can gain a lot of performance by increasing the size of each task (don't create one per pixel, but per scan-line for example)

I believe the difference here is in when you decide to join the threads.
In the first piece of code, you join the threads at every pixel of the supposed source image. In the second piece of code, you only join the threads once at the very end.
Thread synchronization is expensive and often a bottleneck for parallel programs because you are basically pausing execution of any new threads until ALL threads that need to be synchronized, which in this case is all the threads that are active, are done running.
If the iterations of the innermost loop(the one with im) are not dependent on each other, I would suggest you join the threads after the entire outermost loop is done.

Seg faults with pthreads_mutex

I am implementing a particle interaction simulator in pthreads, and I keep getting segmentation faults in my pthreads code. The fault occurs in the following loop, which each thread runs at the end of each timestep in my thread_routine:
for (int i = first; i < last; i++)
{
get_id(particles[i], box_id);
pthread_mutex_lock(&locks[box_id.x + box_no * box_id.y]);
//cout << box_id.x << "," << box_id.y << "," << thread_id << "l" << endl;
box[box_id.x][box_id.y].push_back(&particles[i]);
//cout << box_id.x << box_id.y << endl;
pthread_mutex_unlock(&locks[box_id.x + box_no * box_id.y]);
}
The strange thing is that if I uncomment one (it doesn't matter which one) or both of the couts, the program runs as expected, with no errors occurring (but this obviously kills performance, and isn't an elegant solution), giving correct output.
box is a globally declared
vector < vector < vector < particle_t*> > > box
which represents a decomposition of my (square) domain into boxes.
When the loop starts, box[i][j].size() has been set to zero for all i, j, and the loop is supposed to put particles back into the box-structure (the get_id function gives correct results, I've checked)
The array pthread_mutex_t locks is declared as a global
pthread_mutex_t * locks,
and the size is set by thread 0 and the locks initialized by thread 0 before the other threads are created:
locks = (pthread_mutex_t *) malloc( box_no*box_no * sizeof( pthread_mutex_t ) );
for (int i = 0; i < box_no*box_no; i++)
{
pthread_mutex_init(&locks[i],NULL);
}
Do you have any idea of what could cause this? The code also runs if the number of processors is set to 1, and it seems like the more processors I run on, the earlier the seg fault occurs (it has run through the entire simulation once on two processors, but this seems to be an exception)
Thanks

This is only an educated guess, but based on the problem going away if you use one lock for all the boxes: push_back has to allocate memory, which it does via the std::allocator template. I don't think allocator is guaranteed to be thread-safe and I don't think it's guaranteed to be partitioned, one for each vector, either. (The underlying operator new is thread-safe, but allocator usually does block-slicing tricks to amortize operator new's cost.)
Is it practical for you to use reserve to preallocate space for all your vectors ahead of time, using some conservative estimate of how many particles are going to wind up in each box? That's the first thing I'd try.
The other thing I'd try is using one lock for all the boxes, which we know works, but moving the lock/unlock operations outside the for loop so that each thread gets to stash all its items at once. That might actually be faster than what you're trying to do -- less lock thrashing.
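The two suggestions above (reserve ahead of time, and one lock taken outside the loop) might be combined along these lines. This is a simplified sketch, not the asker's simulator: particle_t is reduced to two coordinates assumed to lie in [0, 1), the box index is computed inline instead of via get_id, and make_grid, bin_slice and bin_all are hypothetical names. std::mutex and std::thread are used in place of the raw pthreads calls.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct particle_t { double x, y; };  // simplified stand-in; coordinates in [0, 1)

using Grid = std::vector<std::vector<std::vector<particle_t*>>>;

// Builds a box_no x box_no grid and preallocates space in every box.
Grid make_grid(std::size_t box_no, std::size_t reserve_per_box) {
    Grid box(box_no, std::vector<std::vector<particle_t*>>(box_no));
    for (auto& column : box)
        for (auto& cell : column) cell.reserve(reserve_per_box);
    return box;
}

// Bins the slice [first, last) while holding the single global lock once.
void bin_slice(std::vector<particle_t>& particles, std::size_t first,
               std::size_t last, Grid& box, std::mutex& box_lock) {
    const std::size_t box_no = box.size();
    std::lock_guard<std::mutex> guard(box_lock);  // taken once, outside the loop
    for (std::size_t i = first; i < last; ++i) {
        std::size_t bx = static_cast<std::size_t>(particles[i].x * box_no);
        std::size_t by = static_cast<std::size_t>(particles[i].y * box_no);
        assert(bx < box_no && by < box_no);  // catches ids that run off the grid
        box[bx][by].push_back(&particles[i]);
    }
}

// Bins all particles using num_threads threads; returns the number stored.
std::size_t bin_all(std::vector<particle_t>& particles, Grid& box,
                    std::size_t num_threads) {
    assert(num_threads > 0);
    std::mutex box_lock;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        std::size_t first = particles.size() * t / num_threads;
        std::size_t last = particles.size() * (t + 1) / num_threads;
        workers.emplace_back(bin_slice, std::ref(particles), first, last,
                             std::ref(box), std::ref(box_lock));
    }
    for (std::thread& w : workers) w.join();
    std::size_t total = 0;
    for (const auto& column : box)
        for (const auto& cell : column) total += cell.size();
    return total;
}
```

With a single lock the binning step itself runs one thread at a time, so this trades parallelism in that step for far fewer lock operations. The bounds assertion is there because an out-of-range box id would produce exactly the kind of crash described in the question.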

Are the box and box[i] vectors initialized properly? You only say the innermost set of vectors are set. Otherwise it looks like box_id's x or y component is wrong and running off the end of one of your arrays.
What part of the loop is it crashing on?
