Why is my file-loading thread not parallelized with the main thread?


My program does file loading and memcpy'ing in the background while the screen is meant to be updated interactively. The idea is to have async loading of files the program will soon need so that they are ready to be used when the main thread needs them. However, the loads/copies don't seem to happen in parallel with the main thread. The main thread pauses during the loading and will often wait for all loads (can be up to 8 at once) to finish before the next iteration of the main thread's main loop.
I'm using Win32, so I'm using _beginthread for creating the file-loading/copying thread.
The worker thread function:
void fileLoadThreadFunc(void *arglist)
{
    while(true)
    {
        // s_mutex keeps the list from being updated by the main thread
        s_mutex.lock(); // uses WaitForSingleObject INFINITE
        // s_filesToLoad is a list added to from the main thread
        while (s_filesToLoad.size() == 0)
        {
            s_mutex.unlock();
            Sleep(10);
            s_mutex.lock();
        }
        loadObj *obj = s_filesToLoad[0];
        s_filesToLoad.erase(s_filesToLoad.begin());
        s_mutex.unlock();
        obj->loadFileAndMemcpy();
    }
}
main thread startup:
_beginthread(fileLoadThreadFunc, 0, NULL);
code in a class that the main thread uses to "kick" the thread for loading a file:
// I used the commented code to see if main thread was ever blocking
// but the PRINT never printed, so it looks like it never was waiting on the worker
//while(!s_mutex.lock(false))
//{
// PRINT(L"blocked! ");
//}
s_mutex.lock();
s_filesToLoad.push_back(this);
s_mutex.unlock();
Some more notes based on comments:
The loadFileAndMemcpy() function in the worker thread loads via Win32 ReadFile function - does this cause the main thread to block?
I reduced the worker thread priority to either THREAD_PRIORITY_BELOW_NORMAL or THREAD_PRIORITY_LOWEST, and that helps a bit, but when I move the mouse around to see how slowly it moves while the worker thread is working, the mouse "jumps" a bit (without lowering the priority, it was MUCH worse).
I am running on a Core 2 Duo, so I wouldn't expect to see any mouse lag at all.
Mutex code doesn't seem to be an issue since the "blocked!" never printed in my test code above.
I bumped the sleep up to 100ms, but even 1000ms doesn't seem to help as far as the mouse lag goes.
Data being loaded is tiny - 20k .png images (but they are 2048x2048).. they are small size since this is just test data, one single color in the image, so real data will be much larger.

You will have to show the code for the main thread to indicate how it is notified that a file is loaded. Most likely the blocking issue is there. This is really a good case for using asynchronous I/O instead of threads if you can work it into your main loop. If nothing else, you really need to use conditions or events: one to tell the file reader thread that there is work to do, and another to signal the main thread that a file has been loaded.
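Edit: Alright, so this is a game, and you're polling to see if the file is done loading as part of the rendering loop. Here's what I would try: use ReadFileEx to initiate an overlapped read. This won't block. Then in your main loop you can check if the read is done by using one of the alertable Wait functions (SleepEx or WaitForSingleObjectEx, for example) with a zero timeout; ReadFileEx reports completion through a completion routine, which only runs while the thread is in an alertable wait. This won't block either.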
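The "start the read, then poll with a zero timeout" pattern can also be sketched in portable C++11. This is not the Win32 overlapped-I/O API the answer names; it swaps in std::async and std::future as a stand-in, and loadFile, isReady and renderUntilLoaded are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real loader: pretends to read a file
// and returns the path's characters as its "contents".
std::vector<char> loadFile(const std::string& path) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20)); // simulated disk latency
    return std::vector<char>(path.begin(), path.end());
}

// Zero-timeout check: reports whether the result is ready without blocking.
template <typename T>
bool isReady(const std::future<T>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Simplified main loop: keeps "rendering" frames until the load completes
// and returns how many frames were rendered in the meantime.
int renderUntilLoaded(const std::string& path, std::vector<char>& out) {
    std::future<std::vector<char>> pending =
        std::async(std::launch::async, loadFile, path);
    assert(pending.valid());
    int frames = 0;
    while (!isReady(pending)) {
        ++frames; // a real game would update and draw here
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    out = pending.get(); // returns immediately: the result is already there
    return frames;
}
```

The main loop never waits on the loader; it only asks "done yet?" once per frame, which is the property the answer is after.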

Not sure on your specific problem but you really should mutex-protect the size call as well.
void fileLoadThreadFunc(void *arglist) {
    while (true) {
        s_mutex.lock();
        while (s_filesToLoad.size() == 0) {
            s_mutex.unlock();
            Sleep(10);
            s_mutex.lock();
        }
        loadObj *obj = s_filesToLoad[0];
        s_filesToLoad.erase(s_filesToLoad.begin());
        s_mutex.unlock();
        obj->loadFileAndMemcpy();
    }
}
Now, examining your specific problem, I can see nothing wrong with the code you've provided. The main thread and file loader thread should quite happily run side-by-side if that mutex is the only contention between them.
I say that because there may be other points of contention, such as in the standard library, that your sample code doesn't show.

I'd write that loop this way, with a single lock/unlock pair per iteration so there is less that can get out of step:
void fileLoadThreadFunc(void *arglist)
{
    while(true)
    {
        loadObj *obj = NULL;
        // protect all access to the vector
        s_mutex.lock();
        if(s_filesToLoad.size() != 0)
        {
            obj = s_filesToLoad[0];
            s_filesToLoad.erase(s_filesToLoad.begin());
        }
        s_mutex.unlock();
        if( obj != NULL )
            obj->loadFileAndMemcpy();
        else
            Sleep(10);
    }
}
MSDN on Synchronization

If you can consider open-source options, Java has a blocking queue [link], as does Python [link]. This would reduce your code to the following (queue here is bound to load_fcn, i.e. using a closure):
def load_fcn():
    while True:
        queue.get().loadFileAndMemcpy()
threading.Thread(target=load_fcn).start()
Even though you're maybe not supposed to use them, Python 3.0 threads have a _stop() function and Python 2.0 threads have a _Thread__stop function. You could also write a None value to the queue and check for it in load_fcn().
Also, search Stack Overflow for "[python] gui" and "[subjective] [gui] [java]" if you wish.
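For readers staying in C++, the same blocking-queue idea can be sketched with a mutex and a condition variable, using a null pointer as the "stop" value in the way the None suggestion describes. BlockingQueue, LoadJob, loadWorker and loadAll are hypothetical names for this illustration only, and LoadJob::load() merely stands in for loadFileAndMemcpy().

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue: pop() sleeps on a condition variable until an
// item arrives, so the worker needs no Sleep() polling loop.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};

// Hypothetical load job; real code would call loadFileAndMemcpy() instead.
struct LoadJob {
    bool loaded = false;
    void load() { loaded = true; }
};

// Worker loop: a null pointer is the "stop" sentinel.
void loadWorker(BlockingQueue<LoadJob*>& queue) {
    while (LoadJob* job = queue.pop()) {
        job->load();
    }
}

// Queues every job, then the sentinel, and waits for the worker to exit.
void loadAll(std::vector<LoadJob>& jobs) {
    BlockingQueue<LoadJob*> queue;
    std::thread worker(loadWorker, std::ref(queue));
    for (LoadJob& job : jobs) queue.push(&job);
    queue.push(nullptr);
    worker.join();
}
```

The lock is held only while the deque is touched, never during the load itself, so the producer is delayed for at most a push or a pop.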

Based on the information present at this point, my guess would be that something in the handler for the file loading is interacting with your main loop. I do not know the libraries involved, but based on your description the file handler does something along the following lines:
Load raw binary data for a 20k file
Interpret the 20k as a PNG file
Load into a structure representing a 2048x2048 pixel image
The following possibilities come to mind regarding the libraries you use to achieve these steps:
Could it be that the memory allocation for the uncompressed image data is holding a lock that the main thread needs for any drawing / interactive operations it performs?
Could it be that a call that is responsible for translating the PNG data into pixels actually holds a low-level game library lock that adversely interacts with your main thread?
The best way to get some more information would be to try and model the activity of your file loader handler without using the current code in it... write a routine yourself that allocates the right size of memory block and performs some processing that turns 20k of source data into a structure the size of the target block... then add further logic to it one bit at a time until you narrow down when performance crashes to isolate the culprit.

I think that your problem lies with access to the s_filesToLoad object.
As I see it, this object is locked by your worker thread whenever it checks the list (which is every 10 ms according to your code) and by your main thread whenever it tries to update the list. This probably means that your main thread is waiting for a while to access it and/or the OS has to sort out any race situations that may occur.
I would suggest that you either start up a worker thread just to load a file as and when you want it, setting a semaphore (or even a bool value) to show when it has completed, or use _beginthreadex to create a suspended thread for each file and then synchronise them so that, as each one completes, the next in line is resumed.
If you want a thread to run permanently in the background erasing and loading files, then you could always have it process its own message queue and use Windows messaging to pass data back and forth. This saves a lot of heartache regarding thread locking and race condition avoidance.

Related

Qt: How to make one thread wait for a temporary roadblock, and temporarily increase another thread's priority to remove the roadblock?

I have two threads:
GUI, which does the typical GUI stuff and manages a bunch of flags that affect the Processing thread
Processing, which handles realtime data on a 30Hz period forever
There are lots of examples of how to have one thread wait for another to finish, but none for how to make a temporary roadblock without killing the thread.
There's a function in my GUI thread that contains this:
Scene* scene = getSceneToFadeFrom();
scene->setSelected(false);
///TODO: wait until (!scene->processing)
fadeFrom = scene->dmx;
and one in my Processing thread that contains this while looping through a QList:
if(scene->getSelected())
{
    scene->processing = true;
    scene->run(); //updates scene->dmx
    scene->processing = false;
}
If this were an embedded project on bare metal, I could use the global interrupt enable flag in place of scene->processing (invert the logic) and be done, which dedicates the entire CPU to that task at the expense of all others.
But because this is a desktop project with an operating system to play nice with, how can I achieve the same effect within this project? Basically, pause the GUI thread at that point until scene->processing == false (which it might be already) and guarantee that the Processing thread is actually running while the GUI thread waits for it.

And here's what I came up with. It was actually an XY problem. I'm surprised that I didn't think of this right away because I had already done something similar for deleting a Scene:
GUI thread:
//(sceneToReplace != 0) means there's something for Processing to do
sceneToReplace = getSceneToFadeFrom();
if(sceneToReplace)
{
    sceneToReplace->setSelected(false);
}
Processing thread, same class:
if(sceneToReplace)
{
    fadeFrom = sceneToReplace->dmx;
    sceneToReplace = 0;
}
and I don't even need the processing flag anymore!
fadeFrom gets set a little later than in the original version, but it's not actually needed until then anyway.
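One caveat with this hand-off is that a plain pointer written by one thread and read by another is formally a data race in standard C++. A minimal sketch of the same idea with std::atomic follows; Scene and Fader here are simplified, hypothetical stand-ins rather than the poster's real classes, and only the pointer hand-off itself is modelled.

```cpp
#include <atomic>
#include <cassert>

// Simplified stand-in for the poster's Scene class (assumption).
struct Scene {
    int dmx = 0;                       // written only by the Processing thread
    std::atomic<bool> selected{true};  // toggled by the GUI thread
};

class Fader {
public:
    // GUI thread: deselect the scene and publish it for Processing.
    void requestFadeFrom(Scene* scene) {
        if (scene) {
            scene->selected.store(false);
            sceneToReplace_.store(scene);
        }
    }
    // Processing thread, once per cycle: takes the pending scene, if any.
    // exchange() reads and clears the pointer in one atomic step.
    bool processPending() {
        Scene* scene = sceneToReplace_.exchange(nullptr);
        if (!scene) return false;
        fadeFrom_ = scene->dmx;
        hasFadeFrom_ = true;
        return true;
    }
    // Processing thread: the captured value; valid once a scene was taken.
    int fadeFrom() const {
        assert(hasFadeFrom_);
        return fadeFrom_;
    }
private:
    std::atomic<Scene*> sceneToReplace_{nullptr};
    int fadeFrom_ = 0;
    bool hasFadeFrom_ = false;
};
```

Because dmx is both written and read on the Processing thread, only the pointer and the selected flag need to be atomic in this sketch.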

Ending a function in C++

I have a C++ function called write() that is supposed to write a large number of bytes to the user's HDD. This would normally take more than 30 seconds, so I want to give the user the ability to abort the operation from a GUI (created with Qt).
So I have the button generate_button. When this button is clicked(), I want to have something to end the function, wherever its progress is.
I thought about threads, but I am not sure. Can you advise me, please?

I would probably use a thread. It should be quite simple to check a variable to see if the operation has been canceled.
Use a mutex to lock access to your cancel variable. That will make sure it is read and written in a proper way for multiple threads. Another option, if you are using C++11, is to use an atomic variable.
Break your large write into blocks of smaller size. 8 to 64 kilobytes should work. After writing each block check your cancel variable and if set, exit the thread.

Place the code that actually does the writing in a worker thread. Have a shared variable (one that is either atomic, or protected by a mutex). Have the worker thread check its value each iteration. If the user presses the "Abort" button, set the value for the variable.
You should use threads if this is a long running operation.
Since you are using C++11, std::atomic<bool> would probably serve you well.
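A minimal sketch of the block-wise write with a cancel flag described in the answers above. The names writeCancellable and writeToMemory are hypothetical, the 64 KB default is simply the upper end of the range suggested earlier, and a real program would run writeCancellable on a worker thread while the GUI's "Abort" handler sets the flag.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes `data` to `out` in blocks and checks `cancel` before each block.
// Returns the number of bytes actually written.
std::size_t writeCancellable(std::ostream& out,
                             const std::vector<char>& data,
                             const std::atomic<bool>& cancel,
                             std::size_t blockSize = 64 * 1024) {
    assert(blockSize > 0);
    std::size_t written = 0;
    while (written < data.size() && !cancel.load()) {
        const std::size_t n = std::min(blockSize, data.size() - written);
        out.write(data.data() + written, static_cast<std::streamsize>(n));
        if (!out) break; // stop on an I/O error
        written += n;
    }
    return written;
}

// Convenience for trying the function out without touching the disk.
std::string writeToMemory(const std::vector<char>& data,
                          const std::atomic<bool>& cancel) {
    std::ostringstream out;
    writeCancellable(out, data, cancel);
    return out.str();
}
```

The worst-case delay between pressing "Abort" and the write stopping is the time it takes to write one block, which is why the block size should stay modest.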

Using a thread guarantees that you will have a responsive GUI, but there is a learning curve to using a thread in this manner.
A threadless way to do this is to keep the routine that writes to the hard drive in the GUI thread, but have it regularly give the GUI time to stay responsive.
QObject::connect(my_cancel_button, SIGNAL(clicked()), file_writer, SLOT(setCanceled()));
// open file for writing
QFile file("filename.txt");
file.open(/* ... */);
while(still_have_data_to_write && !canceled)
{
    write( <1 MB of data> ); // or some other denomination of data
    qApp->processEvents(); // allows the gui to respond to events such as clicks on buttons
    // update a progress bar... using a counter as a ratio of the total file size
    emit updateProgressBar(count++);
}
if( canceled )
{
    file.close();
    // delete the partial file using QDir
}
Hope that helps.

How to handle signal flood from another thread in Qt

tl;dr: I have a QThread which sends a signal to the main thread whenever new data is available for processing. The main thread then acquires, processes and displays the data. The data arrives more often than the main thread is able to process it, resulting in a frozen GUI and eventually a stack overflow (yay!).
Details
My application acquires frames from a camera for processing and displaying. The camera notifies when the new frame is available through a windows event. I have a thread which periodically checks for these events and notifies the main thread when new frame is available for grabs:
void Worker::run()
{
    running_ = true;
    while (running_)
    {
        if (WaitForSingleObject(nextColorFrameEvent, 0) == WAIT_OBJECT_0)
            emit signalColorFrame();
        usleep(15);
    }
}
signalColorFrame is connected to a slot in Camera class which gets the frame from camera, does some processing and sends it to MainWindow which draws it to the screen.
void Camera::onNewColorFrame()
{
    getFrameFromCamera();
    processFrame();
    drawFrame();
}
Now if that method completes before the next frame is available, everything works fine. As the processing gets more complex, though, the Camera class receives new signals before it is done processing the previous frame.
My solution is to block signals from the worker thread for the duration of the processing and force the event loop to run in between with QCoreApplication::processEvents():
void Camera::onNewColorFrame()
{
    worker_->blockSignals(true);
    getFrameFromCamera();
    processFrame();
    drawFrame();
    QCoreApplication::processEvents(); // this is essential for the GUI to remain responsive
    worker_->blockSignals(false);
}
Does that look like a good way of doing it? Can someone suggest a better solution?

I think that before you solve the technical side, you should think about the design side of your application. There are several ways your problem can be solved, but first you should decide what to do with the frames that you don't have time to process in the main thread. Are you going to skip them, or save them for later processing? In the latter case you should realise that the processing queue must still have a size limit, so either way you have to decide what to do with 'out of bound' data.
In such cases I would personally prefer to introduce an intermediate container that holds the received data: your camera processing thread just notifies the collector that data has been received, and the collector decides whether to store or skip it. The main loop, as soon as it has time, accesses the collector through something like fetchNext() or fetchAll(), depending on what you need, and does the actual processing.
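A minimal sketch of such a collector, under the assumption that dropping the oldest frame is an acceptable policy when the buffer is full (skipping the newest frame would be an equally valid choice). Frame and FrameCollector are hypothetical names; fetchNext() and fetchAll() follow the names used above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical frame type; a real one would carry pixel data.
struct Frame { int sequence; };

// Bounded collector: the producer never blocks, and when the buffer is
// full the oldest frame is dropped to make room for the new one.
class FrameCollector {
public:
    explicit FrameCollector(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    // Called from the camera thread.
    void store(Frame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.size() == capacity_) {
            frames_.pop_front();
            ++dropped_;
        }
        frames_.push_back(frame);
    }
    // Called from the main thread; returns false when nothing is pending.
    bool fetchNext(Frame& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) return false;
        out = frames_.front();
        frames_.pop_front();
        return true;
    }
    // Called from the main thread; drains everything that is pending.
    std::vector<Frame> fetchAll() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<Frame> all(frames_.begin(), frames_.end());
        frames_.clear();
        return all;
    }
    // Number of frames discarded so far.
    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return dropped_;
    }
private:
    mutable std::mutex mutex_;
    std::deque<Frame> frames_;
    std::size_t capacity_;
    std::size_t dropped_ = 0;
};
```

With a capacity of 1 this degenerates into "always process the latest frame", which is often what a live preview wants.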

C++11 threads, run on main thread

I am doing some development trying out C++11 threads. I would like to run some code in an asynchronous thread, and when that code is done I would like to run other code on the main thread, but only when it's done!
This is because the things I would like to run async involve loading OpenGL stuff, and it's a bit tricky with OpenGL contexts when using threads; as far as I know, the rule is pretty much "don't use the same context in different threads".
However, I would like to create a loader thread which loads collada files. The time-consuming part here really is parsing the file and setting up the data, and all of that I could (technically) do in a separate thread and then just do the OpenGL-specific tasks on the main thread. (This is my initial thought and I might just be going at it the wrong way.)
So I am thinking that I could detach one thread async to load up the collada file and fill out the data, and then once it's finished I'll invoke code on the main thread to bind buffers, set up shaders and so on. I can do it without threads, but it would be pretty smooth to load new data in the background without GL freaking out.
So I'll try to line up the steps I want to do:
Main thread goes around doing what it does...
Someone asks for a new mesh to be loaded
The mesh is initialized by creating an async thread and loading the collada data inside it
Meanwhile the main thread goes on doing its stuff
once the collada loading is done the async thread informs the main thread that it wishes to do additional loading (i.e. setup buffers, and so on) ON main thread.
the setting up finishes and the mesh adds itself to a render queue
I do have all of it working synchronously, and what I want is a way to do some things once the detached async thread finishes.
Any ideas or, of course, constructive criticism of my thinking here :P are greeted with a warm welcome! I might be thinking about it all the wrong way. I've been thinking about doing something like an observer pattern, but I don't really know how best to tackle it. I wouldn't mind threading the OpenGL stuff, but it seems a bit like asking for trouble.

If I understood your use case correctly, then I think the std::async() function, started with the std::launch::async policy to make sure the operation is really started on another thread, does just what you want:
// Your function to be executed asynchronously
bool load_meshes();
// You can provide additional arguments if load_meshes accepts arguments
auto f = std::async(std::launch::async, load_meshes);
// Here, the main thread can just do what it has to do...
// ...and when it's finished, it synchronizes with the operation
// and retrieve its result (if any)
bool res = f.get(); // res will hold the return value of load_meshes,
                    // or this will throw an exception if one was
                    // thrown inside load_meshes()
if (res)
{
    // ... and then it will go on doing the remaining stuff on the main thread
}
One tricky thing to be aware of here, is that you should always assign the return value of std::async() to some variable (of type std::future<T>), where T is the type returned by load_meshes(). Failing to do so will cause the main thread to wait until load_meshes() is finished (thus, it's as if the function was invoked in the main thread).
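Note that f.get() blocks the main thread if the load has not finished yet. If the main loop has to keep running, one possible variation (not part of the answer above) is to let the loader thread post its follow-up work to a queue that the main thread drains once per iteration. MainThreadTasks, Mesh and loadMeshAsync are hypothetical names, and upload() only stands in for the OpenGL calls that must stay on the main thread.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Queue of closures that other threads post and the main thread runs.
class MainThreadTasks {
public:
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }
    // Called once per main-loop iteration; never waits for work.
    // Returns how many tasks were run.
    int runPending() {
        std::queue<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(ready, tasks_);
        }
        int count = 0;
        while (!ready.empty()) {
            ready.front()(); // runs on the calling (main) thread
            ready.pop();
            ++count;
        }
        return count;
    }
private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Hypothetical mesh: parse() is plain CPU work that is safe off-thread,
// upload() stands in for buffer and shader setup on the GL thread.
struct Mesh {
    std::vector<float> vertices;
    bool uploaded = false;
    void parse()  { vertices.assign(3, 1.0f); }
    void upload() { assert(!vertices.empty()); uploaded = true; }
};

// Starts a loader thread that parses, then posts the upload step back.
// The caller must keep `mesh` and `mainTasks` alive and join the thread.
std::thread loadMeshAsync(Mesh& mesh, MainThreadTasks& mainTasks) {
    return std::thread([&mesh, &mainTasks] {
        mesh.parse();
        mainTasks.post([&mesh] { mesh.upload(); });
    });
}
```

The tasks are swapped out under the lock and run outside it, so a slow upload never delays a loader that is trying to post.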

How does the message loop use threads?

I'm somewhat confused and wondering if I've been misinformed, in a separate post I was told "New threads are only created when you make them explicitly. C++ programs are by default single threaded." When I open my program that doesn't explicitly create new threads in ollydbg I noticed multiple times that there are often 2 threads running. I wanted to understand how the message loop works without stopping up execution, the explanation I got was very insufficient at explaining how it works.
Does the message loop create a new thread or does it take up the main thread? If it takes the main thread does it do so after everything else has been executed regardless of code order? If it doesn't do this but still takes up the main thread does it spawn a new thread so that the program can execute instead of getting stuck in the message loop?
EDIT: Solved most of my questions with experimentation. The message loop occupies the main thread and any code after the code:
while (GetMessage (&messages, NULL, 0, 0))
{
    TranslateMessage(&messages);
    DispatchMessage(&messages);
}
return messages.wParam;
Will not execute unless something special is done to cause it to execute because the program is stuck in the message loop. Putting an infinite loop in a window procedure that gets executed causes the program to crash. I still don't understand the mystery of the multiple threads when in olly to the degree I would prefer though.

Perhaps the place to start is to realize that "the message loop" isn't a thing as such; it's really just something that a thread does.
Threads in windows generally fall into one of two categories: those that own UI, and those that do background work (eg network operations).
A simple UI app typically has just one thread, which is a UI thread. For the UI to work, the thread needs to wait until there's some input to handle (mouse click, keyboard input, etc), handle the input (eg. update the state and redraw the window), and then go back to waiting for more input. This whole act of "wait for input, process it, repeat" is the message loop. (Also worth mentioning at this stage is the message queue: each thread has its own input queue which stores up the input messages for a thread; and the act of a thread "waiting for input" is really about checking if there's anything in the queue, and if not, waiting till there is.) In win32 speak, if a thread is actively processing input this way, it's also said to be "pumping messages".
A typical simple windows app's mainline code will first do basic initialization, create the main window, and then do the wait-for-input-and-process-it message loop. It does this usually until the user closes the main window, at which point the thread exits the loop, and carries on executing the code that comes afterwards, which is usually cleanup code.
A common architecture in windows apps is to have a main UI thread - usually this is the main thread - and it creates and owns all the UI, and has a message loop that dispatches messages for all of the UI that the thread created. If an app needs to do something that could potentially block, such as reading from a socket, a worker thread is often used for that purpose: you don't want the UI thread to block (eg. while waiting for input from a socket), as it wouldn't be processing input during that time and the UI would end up being unresponsive.
You could write an app that had more than one UI thread in it - and each thread that creates windows would then need its own message loop - but it's a fairly advanced technique and not all that useful for most basic apps.
The other threads you are seeing are likely some sort of helper threads that are created by Windows to do background tasks; and for the most part, you can ignore them. If you initialize COM, for example, windows may end up creating a worker thread to manage some COM internal stuff, and it may also create some invisible HWNDs too.
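To make the "wait for input, process it, repeat" cycle concrete, here is a toy, single-threaded model of a message queue and loop in portable C++. It is not the Win32 API: ToyUiThread, Message and MsgKind are hypothetical names, and where the real GetMessage would block on an empty queue, this sketch simply stops.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Toy message kinds; Quit plays the role that WM_QUIT plays in Win32.
enum class MsgKind { Click, KeyPress, Quit };
struct Message { MsgKind kind; int param; };

// Toy model of one thread's message queue plus its message loop.
class ToyUiThread {
public:
    void postMessage(Message m) { queue_.push(m); }

    // Rough analogue of GetMessage: returns false once Quit is retrieved.
    bool getMessage(Message& out) {
        if (queue_.empty()) return false; // a real UI thread would block here
        out = queue_.front();
        queue_.pop();
        return out.kind != MsgKind::Quit;
    }
    // Rough analogue of DispatchMessage calling a window procedure.
    void dispatch(const Message& m) {
        assert(m.kind != MsgKind::Quit);
        log_.push_back(m.kind == MsgKind::Click ? "click" : "key");
    }
    // The loop itself: nothing after it runs until Quit arrives.
    // Returns the Quit message's parameter as the exit code.
    int run() {
        Message m{MsgKind::Quit, 0};
        while (getMessage(m)) dispatch(m);
        return m.param;
    }
    const std::vector<std::string>& log() const { return log_; }
private:
    std::queue<Message> queue_;
    std::vector<std::string> log_;
};
```

Messages posted after Quit are never dispatched, which mirrors why code placed after the loop only runs once the loop has ended.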

Typically the thread that starts the program only runs the message loop, taking up the main thread. Anything not part of handling messages or updating the UI is typically done by other threads. The additional thread that you see even if your application doesn't create any threads could be created by a library or the operating system. Windows will create threads inside your process to handle things like dispatching events to your message loop.
