Multithreading and heap corruption - c++

So I just started trying out multithreaded programming for the first time, and I've run into this heap corruption problem. Basically the program will run for some random length of time (as short as 2 seconds, as long as 200) before crashing and spitting out a heap corruption error. Everything I've read on the subject says it's a very hard thing to diagnose, since what triggers the error often has little to do with what actually causes it. As such, I remain stumped.
I haven't been formally taught multithreading however, so I was mostly programming off of what I understood of the concept, and my code may be completely wrong. So here's a basic rundown of what I'm trying to do and how the program currently tries to handle it:
I'm writing code for a simple game that involves drawing several parallaxing layers of background. These levels are very large (e.g. 20000x5000 pixels), so obviously trying to load 3 layers of images that size is not feasible, if not impossible. So currently the images are split up into 500x500 tiles, and the code keeps in memory only the images it immediately needs to display. Any images it has loaded that it no longer needs are removed from memory. However, in a single thread, this causes the program to hang significantly while waiting for each image to load before continuing.
This is where multithreading seemed logical to me. I wanted the program to do the loading it needed to do, without affecting the smoothness of the game, as long as the image was loaded by the time it was actually needed. So here is how I have it organized:
1.) All the data for where the images should go, and any data associated with them, is stored in one multidimensional array, but initially no image data is loaded. Each frame, the code checks each position in the array and tests whether the spot where the image should go is within some radius of the player.
2.) If it is, it flags this spot as needing to be loaded. A pointer to where the image should be loaded into is push_back()'d onto a vector.
3.) The second thread is started once the level begins. This thread is initially passed a pointer to the aforementioned vector.
4.) This thread is put into an infinite while loop (which by itself sounds wrong) that only terminates when the thread is terminated. This loop continuously checks whether there are any elements in the vector. If there are, it grabs the 0th element, loads the image data into that pointer, then .erase()'s the element from the vector.
That's pretty much a rundown of how it works. My uneducated assumption is that the 2 threads collide at some point, trying to write and delete in the same space at once or something. Given that I'm new to this, I'm certain this method is terrible to some embarrassing degree, so I'm eager to hear what I should improve upon.
EDIT: Adding source code upon request:
class ImageLoadQueue
{
private:
    ImageHandle* image;
    std::string path;
    int frameWidth, frameHeight, numOfFrames;

public:
    ImageLoadQueue();
    ImageLoadQueue(ImageHandle* a, std::string b, int c, int d, int e = 1) { setData(a, b, c, d, e); }

    void setData(ImageHandle* a, std::string b, int c, int d, int e = 1)
    {
        image = a;
        path = b;
        frameWidth = c;
        frameHeight = d;
        numOfFrames = e;
    }

    void loadThisImage() { image->loadImage(path, frameWidth, frameHeight, numOfFrames, numOfFrames); }
};
class ImageLoadThread : public sf::Thread
{
private:
    std::vector<ImageLoadQueue*>* images;

public:
    ImageLoadThread() { }
    ImageLoadThread(std::vector<ImageLoadQueue*>* a) { linkVector(a); }

    void linkVector(std::vector<ImageLoadQueue*>* a) { images = a; }

    virtual void Run()
    {
        while (1==1)
        {
            if (!images->empty())
            {
                (*images)[0]->loadThisImage();
                images->erase(images->begin());
            }
        }
    }
};
class LevelArt
{
private:
    int levelWidth, levelHeight, startX, startY, numOfLayers;
    float widthScale, heightScale, widthOfSegs, heightOfSegs;
    float* parallaxFactor;
    ImageHandle** levelImages;
    int** frame;
    int** numOfFrames;
    bool* tileLayer;
    bool** isLoaded;
    Animation** animData;
    std::string** imagePath;
    std::vector<ImageLoadQueue*> imageQueue;
    ImageLoadThread imageThread;

public:
    LevelArt(void);
    LevelArt(std::string);
    ~LevelArt(void);
    void loadData(std::string);
    void drawLevel(sf::RenderWindow*, float, float);
    void scaleLevel(float, float);
    void forceDraw(sf::RenderWindow*);
    void wipeLevel();
    void initialLoad();
    int getLevelWidth() { return levelWidth; }
    int getLevelHeight() { return levelHeight; }
    int getTotalWidth() { return widthOfSegs*levelWidth; }
    int getTotalHeight() { return heightOfSegs*levelHeight; }
    int getStartX() { return startX; }
    int getStartY() { return startY; }
};
That's most of the relevant threading code, in this header. Within the levelArt.cpp file there are 3 nested for loops that iterate through all the levelArt data stored, testing whether each piece is close enough to the player to be displayed, at which point it calls:
imageQueue.push_back(new ImageLoadQueue(&levelImages[i][(j*levelWidth)+k], imagePath[i][(j*levelWidth)+k], widthOfSegs, heightOfSegs, numOfFrames[i][(j*levelWidth)+k]));
i, j, k being the for-loop iterators.

This seems like a reasonable use of multithreading. The key idea (in other words, the main place you'll have problems if you do it wrong) is that you have to be careful about data that is used by more than one thread.
You have two places where you have such data:
The vector (which, by the way, should probably be a queue)
The array where you return the data
One way to arrange things - by no means the only one - would be to wrap each of these into its own class (e.g., a class that has a member variable of the vector). Don't allow any direct access to the vector, only through methods on the class. Then synchronize the methods, for example using a mutex or whatever the appropriate synchronization object is. Note that you're synchronizing access to the object, not just the individual methods. So it's not enough to put a mutex in the "read from queue" method; you need a common mutex in the "read from queue" and "write to queue" methods so that no one is doing one while the other occurs. (Also note I'm using the term mutex; that may be a very wrong thing to use depending on your platform and the exact situation. I would likely use a semaphore and a critical section on Windows.)
Synchronization will make the program thread-safe. That's different than making the program efficient. To do that, you probably want a semaphore that represents the number of items in the queue, and have your "load data thread" wait on that semaphore, rather than doing a while loop.
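To make that concrete, here is a minimal sketch of such a synchronized queue using standard C++ primitives (std::mutex and std::condition_variable standing in for the platform's critical section and semaphore; the class and method names are invented for illustration):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// A minimal thread-safe queue: the mutex guards the underlying container,
// and the condition variable lets the consumer sleep until work arrives
// instead of spinning in a busy loop.
template <typename T>
class SyncQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            queue_.push(std::move(item));
        }
        cv_.notify_one();  // wake the loader thread
    }

    // Blocks until an item is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        return item;
    }

private:
    std::queue<T> queue_;
    std::mutex mtx_;
    std::condition_variable cv_;
};
```

The loader thread would call pop() in its loop and block while the queue is empty, instead of spinning; the game thread calls push() when it flags a tile for loading.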

Related

Trying to control multithreaded access to array using std::atomic

I'm trying to control multithreaded access to a vector of data which is fixed in size, so threads will wait until their current position in it has been filled before trying to use it, or will fill it themselves if no-one else has yet. (But ensure no-one is waiting around if their position is already filled, or no-one has done it yet)
However, I am struggling to understand a good way to do this, especially involving std::atomic. I'm just not very familiar with C++ multithreading concepts aside from basic std::thread usage.
Here is a very rough example of the problem:
class myClass
{
    struct Data
    {
        int res1;
    };

    std::vector<Data*> myData;

    int foo(unsigned long position)
    {
        if (!myData[position])
        {
            bar(myData[position]);
        }
        // Do something with the data
        return 5 * myData[position]->res1;
    }

    void bar(Data*& data)
    {
        data = new Data;
        // Do a whole bunch of calculations and so-on here
        data->res1 = 42;
    }
};
Now imagine if foo() is being called multi-threaded, and multiple threads may (or may not) have the same position at once. If that happens, there's a chance that a thread may, between when the Data was created and when bar() is finished, try to actually use the data.
So, what are the options?
1: Make a std::mutex for every position in myData. What if there are 10,000 elements in myData? That's 10,000 std::mutexes, not great.
2: Put a lock_guard around it like this:
std::mutex myMutex;
{
const std::lock_guard<std::mutex> lock(myMutex);
if (!myData[position])
{
bar(myData[position]);
}
}
While this works, it also means if different threads are working in different positions, they wait needlessly, wasting all of the threading advantage.
3: Use a vector of chars and a spinlock as a poor man's mutex? Here's what that might look like:
static std::vector<char> positionInProgress;
static std::vector<char> positionComplete;

class myClass
{
    struct Data
    {
        int res1;
    };

    std::vector<Data*> myData;

    int foo(unsigned long position)
    {
        if (positionInProgress[position])
        {
            while (positionInProgress[position])
            {
                ; // do nothing, just wait until it is done
            }
        }
        else
        {
            if (!positionComplete[position])
            {
                // Fill the data and prevent anyone from using it until it is complete
                positionInProgress[position] = true;
                bar(myData[position]);
                positionInProgress[position] = false;
                positionComplete[position] = true;
            }
        }
        // Do something with the data
        return 5 * myData[position]->res1;
    }

    void bar(Data*& data) // must be a reference, as above, or the assignment is lost
    {
        data = new Data;
        // Do a whole bunch of calculations and so-on here
        data->res1 = 42;
    }
};
This seems to work, but none of the test or set operations are atomic, so I have a feeling I'm just getting lucky.
4: What about std::atomic and std::atomic_flag? Well, there are a few problems.
std::atomic_flag doesn't have a way to test without setting in C++11... which makes this kind of difficult.
std::atomic is not movable or copy-constructible, so I cannot make a vector of them (I do not know the number of positions during construction of myClass).
Conclusion:
This is the simplest example I can think of that (likely) compiles and demonstrates my real problem. In reality, myData is a 2-dimensional vector implemented as a special hand-rolled solution, Data itself is a vector of pointers to more complex data types, the data isn't simply returned, etc. This is the best I could come up with.
The biggest problem you're likely to have is that a vector itself is not thread-safe, so you can't do ANY operation that might change the vector (invalidate references to elements of the vector) while another thread might be accessing it, such as resize or push_back. However, if your vector is effectively "fixed" (you set the size prior to ever spawning threads and thereafter only ever access elements using at or operator[] and never ever modify the vector itself), you can get away with using a vector of atomic objects. In this case you could have:
std::vector<std::atomic<Data*>> myData;
and your code to setup and use an element could look like:
if (!myData[position].load()) {
    Data* tmp = new Data;
    Data* expected = nullptr;  // compare_exchange_strong takes the expected value by reference
    if (!myData[position].compare_exchange_strong(expected, tmp)) {
        // some other thread did the setup first; discard our copy
        delete tmp;
    }
}
myData[position].load()->bar();
Of course you still need to make sure that the operations done on members of Data in bar are themselves thread-safe, as you can get multiple threads calling bar on the same Data instance here.
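For reference, the claim-or-delete pattern above can be written out as a small self-contained sketch (Data, the element count, and foo are invented stand-ins for the question's code; note again the named nullptr variable required by compare_exchange_strong):

```cpp
#include <atomic>
#include <vector>

struct Data {
    int res1 = 42;
};

// The vector's size is fixed before any threads start; afterwards only the
// atomic elements are mutated, so the vector itself needs no locking.
// vector(n) value-initializes the atomics, so every slot starts as nullptr.
std::vector<std::atomic<Data*>> myData(100);

int foo(unsigned long position) {
    if (!myData[position].load()) {
        Data* tmp = new Data;
        Data* expected = nullptr;  // compare_exchange needs an lvalue
        if (!myData[position].compare_exchange_strong(expected, tmp)) {
            delete tmp;  // another thread installed its Data first
        }
    }
    // Whichever thread won, the slot now holds exactly one Data.
    return 5 * myData[position].load()->res1;
}
```

Whichever thread wins the race, every caller ends up reading the same installed Data, and the loser's allocation is cleanly discarded.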

Not executing all writes before end of program

For a school project we were tasked with writing a ray tracer. I chose to use C++ since it's the language I'm most comfortable with, but I've been getting some weird artifacts.
Please keep in mind that we are still in the first few lessons of the class, so right now we are limited to checking whether or not a ray hits a certain object.
When my raytracer finishes quickly (less than 1 second spent on actual ray tracing) I've noticed that not all hits get registered in my "framebuffer".
To illustrate, here are two examples (images not reproduced here):
In the first image, you can clearly see that there are horizontal artifacts.
The second image contains a vertical artifact.
I was wondering if anyone could help me to figure out why this is happening?
I should mention that my application is multi-threaded, the multithreaded portion of the code looks like this:
Stats RayTracer::runParallel(const std::vector<Math::ivec2>& pixelList, const Math::vec3& eyePos, const Math::vec3& screenCenter, long numThreads) noexcept
{
    //...
    for (int i = 0; i < threads.size(); i++)
    {
        threads[i] = std::thread(&RayTracer::run, this, splitPixels[i], eyePos, screenCenter);
    }
    for (std::thread& thread : threads)
    {
        thread.join();
    }
    //...
}
The RayTracer::run method accesses the framebuffer as follows:
Stats RayTracer::run(const std::vector<Math::ivec2>& pixelList, const Math::vec3& eyePos, const Math::vec3& screenCenter) noexcept
{
    this->frameBuffer.clear(RayTracer::CLEAR_COLOUR);
    // ...
    for (const Math::ivec2& pixel : pixelList)
    {
        // ...
        for (const std::shared_ptr<Objects::Object>& object : this->objects)
        {
            std::optional<Objects::Hit> hit = object->hit(ray, pixelPos);
            if (hit)
            {
                // ...
                if (dist < minDist)
                {
                    std::lock_guard lock(this->frameBufferMutex);
                    // ...
                    this->frameBuffer(pixel.y, pixel.x) = hit->getColor();
                }
            }
        }
    }
    // ...
}
This is the operator() for the framebuffer class
class FrameBuffer
{
private:
    PixelBuffer buffer;

public:
    // ...
    Color& operator()(int row, int col) noexcept
    {
        return this->buffer(row, col);
    }
    // ...
};
which makes use of the PixelBuffer's operator():
class PixelBuffer
{
private:
    int mRows;
    int mCols;
    Color* mBuffer;

public:
    // ...
    Color& operator()(int row, int col) noexcept
    {
        return this->mBuffer[this->flattenIndex(row, col)];
    }
    // ...
};
I didn't bother to use any synchronization primitives because each thread gets assigned a certain subset of pixels from the complete image. The thread casts a ray for each of its assigned pixels and writes the resultant color back to the color buffer in that pixel's slot. This means that, while all my threads are concurrently accessing (and writing to) the same object, they don't write to the same memory locations.
After some initial testing, using a std::lock_guard to protect the shared framebuffer seems to help, but it's not a perfect solution, artifacts still occur (although much less common).
It should be noted that the way I divide pixels between threads determines the direction of the artifacts. If I give each thread a set of rows the artifacts will be horizontal lines, if I give each thread a set of columns, the artifacts will be vertical lines.
Another interesting observation is that when I trace more complex objects (these take anywhere between 30 seconds and 2 minutes) the artifacts are extremely rare (I've seen one once in my hundreds to thousands of traces so far).
I can't help but feel like this is a problem related to multithreading, but I don't really understand why std::lock_guard wouldn't completely solve the problem.
Edit: After suggestions by Jeremy Friesner I ran the raytracer about 10 times on a single thread, without any issues, so the problem does indeed appear to be a race condition.
I solved the problem thanks to Jeremy Friesner.
As you can see in the code, every thread calls framebuffer.clear() separately (without locking the mutex!). This means that thread A, which was started first, might have already hit 5-10 pixels by the time thread B clears the framebuffer. This would erase thread A's already-written pixels.
By moving the framebuffer.clear() call to the beginning of the runParallel() method I was able to solve the issue.
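A stripped-down sketch of the fixed structure (all names invented): the buffer is cleared once before the workers start, and each thread then writes only its own disjoint slice, so the writes themselves need no locking.

```cpp
#include <thread>
#include <vector>

// Clear once, then spawn the workers. Each thread owns one row of the
// buffer, so no two threads ever touch the same element.
std::vector<int> renderParallel(int rows, int cols) {
    std::vector<int> frameBuffer(rows * cols, 0);  // the single "clear"

    std::vector<std::thread> threads;
    for (int r = 0; r < rows; ++r) {
        threads.emplace_back([&frameBuffer, r, cols] {
            for (int c = 0; c < cols; ++c)
                frameBuffer[r * cols + c] = r + 1;  // disjoint slice per thread
        });
    }
    for (std::thread& t : threads)
        t.join();
    return frameBuffer;
}
```

Because the clear happens before any thread is spawned, no thread can wipe out another thread's finished pixels.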

Minimize lock contention c++ std::map

I have an std::map<int, Object*> ObjectMap. Now I need to update the map, and updates can happen via multiple threads. So we lock the map for updates. But every update leads to a lengthy computation and hence to lock contention.
Let's consider a following scenario.
class Order // Subject
{
    double _a, _b, _c;
    std::vector<Customer*> _customers;

public:
    void notify(int a, int b, int c)
    {
        // update all customers via for loop. assume a for loop and iterator i
        _customers[i]->updateCustomer(a, b, c);
    }
};
class SomeNetworkClass
{
private:
    std::map<int, Order*> _orders;

public:
    void updateOrder(int orderId, int a, int b, int c)
    {
        // lock the map
        Order* order = _orders[orderId];
        order->notify(a, b, c);
        // release the lock
    }
};
class Customer
{
public:
    void updateCustomer(int a, int b, int c)
    {
        // some lengthy function. just for this example.
        // assume printing a, b and c multiple times
    }
};
Every Customer is also updated with some computation involved.
Now this is a trivial Observer pattern. But with a large number of observers and a huge calculation in each observer, it is a killer for this design. The lock contention goes up in my code. I assume this is a common practical problem and that people use smarter approaches, and I am looking for those smarter approaches. I hope I am a little clearer this time.
Thanks
Shiv
Since update is happening on the element of the map, and does not take the map as an argument, I assume the map is unchanging.
I visualise the structure as a chain of Objects for each map id. Now if chain contains distinct entries (and update doesn't access any elements outside its chain, or any global elements) you can get away with adding a lock to the root element of each chain.
However if objects down the chain are potentially shared then you have a more difficult problem. In that case adding a lock to each object should be enough. You can show that if the chains behave correctly (each node has one child, but children can be shared) then locks must be acquired in a consistent order, meaning there is no chance of a deadlock.
If there is other sharing between chains, then the chance of encountering deadlocks is large.
Assuming you have case 2, then your code will look roughly like this
class Object
{
    Object* next;
    Lock l;
    Data d;

    void update(Data d_new)
    {
        l.lock();
        d = d_new;
        next->update(d_new);
        l.unlock();
    }
};
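In C++11 the same idea is usually written with std::mutex and an RAII guard, so the lock is released even if an update throws. A hedged sketch with invented member types (int standing in for Data):

```cpp
#include <mutex>

// Per-object locking: each node in the chain owns its own mutex, so
// updates on unrelated chains never contend with each other. Locks are
// taken parent-first down the chain, giving a consistent order and
// therefore no possibility of deadlock.
struct Object {
    Object* next = nullptr;
    std::mutex m;
    int data = 0;

    void update(int newData) {
        std::lock_guard<std::mutex> lock(m);  // released on scope exit
        data = newData;
        if (next)
            next->update(newData);  // child's lock acquired while parent's is held
    }
};
```

Holding the parent's lock across the child's update is the hand-over-hand variant's conservative cousin; it is simple and safe, at the cost of keeping the whole chain locked for the duration of one update.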

How and what data must be synced in multithreaded c++

I built a little application which has a render thread and some worker threads for tasks which can be done alongside the rendering, e.g. uploading files to some server. Now in those worker threads I use different objects to store feedback information and share these with the render thread, which reads them for output purposes. So render = output, worker = input. Those shared objects are int, float, bool, STL string and STL list.
I had this running for a few months and all was fine, except for 2 random crashes during output, but I have now learned about thread syncing. I read that int, bool, etc. do not require syncing, and I think that makes sense, but when I look at string and list I fear potential crashes if 2 threads attempt to read/write the same object at the same time. Basically I expect one thread to change the size of the string while the other uses the outdated size to loop through its characters and then reads from unallocated memory. This evening I want to build a little test scenario with 2 threads writing/reading the same object in a loop, but I was hoping to get some ideas here as well.
I was reading about the CriticalSection in Win32 and thought it might be worth a try. Yet I am unsure what the best way would be to implement it. If I put it at the start and at the end of every read/write function, it feels like time is wasted. And if I wrap EnterCriticalSection and LeaveCriticalSection in Set and Get functions for each object I want synced across the threads, it is a lot of administration.
I think I must crawl through more references.
Okay, I am still not sure how to proceed. I was studying the links provided by StackedCrooked but still have no picture of how to do this.
I copied/modified this together now and have no idea how to continue or what to do. Does someone have ideas?
class CSync
{
public:
    CSync()
        : m_isEnter(false)
    { InitializeCriticalSection(&m_CriticalSection); }

    ~CSync()
    { DeleteCriticalSection(&m_CriticalSection); }

    bool TryEnter()
    {
        m_isEnter = TryEnterCriticalSection(&m_CriticalSection)==0 ? false : true;
        return m_isEnter;
    }

    void Enter()
    {
        if(!m_isEnter)
        {
            EnterCriticalSection(&m_CriticalSection);
            m_isEnter = true;
        }
    }

    void Leave()
    {
        if(m_isEnter)
        {
            LeaveCriticalSection(&m_CriticalSection);
            m_isEnter = false;
        }
    }

private:
    CRITICAL_SECTION m_CriticalSection;
    bool m_isEnter;
};
/* not needed
class CLockGuard
{
public:
    CLockGuard(CSync& refSync) : m_refSync(refSync) { Lock(); }
    ~CLockGuard() { Unlock(); }

private:
    CSync& m_refSync;

    CLockGuard(const CLockGuard& refcSource);
    CLockGuard& operator=(const CLockGuard& refcSource);

    void Lock() { m_refSync.Enter(); }
    void Unlock() { m_refSync.Leave(); }
};*/
template<class T> class Call_proxy
{
public:
    Call_proxy(T* pp, CSync& sync)
        : p(pp)
        , m_refSync(sync)
    {}

    ~Call_proxy() { m_refSync.Leave(); }

    T* operator->() { return p; }

private:
    T* p;
    CSync& m_refSync;
};

template<class T> class Wrap
{
public:
    Wrap(T* pp, CSync& sync)
        : p(pp)
        , m_refSync(sync)
    {}

    Call_proxy<T> operator->() { m_refSync.Enter(); return Call_proxy<T>(p, m_refSync); }

private:
    T* p;
    CSync& m_refSync;
};
int main()
{
    CSync sync;
    Wrap<string> safeVar(new string, sync);
    // safeVar what now?
    return 0;
}
Okay, so I was preparing a little test now to see if my attempts do any good, so first I created a setup that I believed would make the application crash...
But it does not crash!? Does that mean I need no syncing? What does the program need in order to actually crash? And if it does not crash, why do I even bother? It seems I am missing some point again. Any ideas?
string gl_str, str_test;

void thread1()
{
    while(true)
    {
        gl_str = "12345";
        str_test = gl_str;
    }
};

void thread2()
{
    while(true)
    {
        gl_str = "123456789";
        str_test = gl_str;
    }
};

CreateThread( NULL, 0, (LPTHREAD_START_ROUTINE)thread1, NULL, 0, NULL );
CreateThread( NULL, 0, (LPTHREAD_START_ROUTINE)thread2, NULL, 0, NULL );
Just added more stuff and now it crashes when calling clear(). Good.
void thread1()
{
    while(true)
    {
        gl_str = "12345";
        str_test = gl_str;
        gl_str.clear();
        gl_int = 124;
    }
};

void thread2()
{
    while(true)
    {
        gl_str = "123456789";
        str_test = gl_str;
        gl_str.clear();
        if(gl_str.empty())
            gl_str = "aaaaaaaaaaaaa";
        gl_int = 244;
        if(gl_int==124)
            gl_str.clear();
    }
};
The rule is simple: if the object can be modified in any thread, all accesses to it require synchronization. The type of the object doesn't matter: even bool or int require external synchronization of some sort (possibly by means of a special, system-dependent function, rather than with a lock). There are no exceptions, at least in C++. (If you're willing to use inline assembler, and understand the implications of fences and memory barriers, you may be able to avoid a lock.)
I read int, bool, etc do not require syncing
This is not true:
A thread may store a copy of the variable in a CPU register and keep using the old value even if the original variable has been modified by another thread.
Simple operations like i++ are not atomic.
The compiler may reorder reads and writes to the variable. This may cause synchronization issues in multithreaded scenarios.
See Lockless Programming Considerations for more details.
You should use mutexes to protect against race conditions. See this article for a quick introduction to the boost threading library.
First, you do need protection even for accessing the most primitive of data types.
If you have an int x somewhere, you can write
x += 42;
... but that will mean, at the lowest level: read the old value of x, calculate a new value, write the new value to the variable x. If two threads do that at about the same time, strange things will happen. You need a lock/critical section.
I'd recommend using the C++11 and related interfaces, or, if that is not available, the corresponding things from the boost::thread library. If that is not an option either, critical sections on Win32 and pthread_mutex_* for Unix.
NO, Don't Start Writing Multithreaded Programs Yet!
Let's talk about invariants first.
In a (hypothetical) well-defined program, every class has an invariant.
The invariant is some logical statement that is always true about an instance's state, i.e. about the values of all its member variables. If the invariant ever becomes false, the object is broken, corrupted, your program may crash, bad things have already happened. All your functions assume that the invariant is true when they are called, and they make sure that it is still true afterwards.
When a member function changes a member variable, the invariant might temporarily become false, but that is OK because the member function will make sure that everything "fits together" again before it exits.
You need a lock that protects the invariant - whenever you do something that might affect the invariant, take the lock and do not release it until you've made sure that the invariant is restored.
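A small sketch of what "a lock that protects the invariant" can look like in practice (the class is invented for illustration; its invariant is that the two balances always sum to the same total):

```cpp
#include <mutex>

// Invariant: from_ + to_ is constant. transfer() temporarily breaks it,
// so the whole operation runs under the lock and restores the invariant
// before releasing it; no other thread can ever observe the broken state.
class AccountPair {
public:
    AccountPair(int a, int b) : from_(a), to_(b) {}

    void transfer(int amount) {
        std::lock_guard<std::mutex> lock(m_);
        from_ -= amount;   // invariant broken here...
        to_ += amount;     // ...and restored here, still under the lock
    }

    int total() {
        std::lock_guard<std::mutex> lock(m_);
        return from_ + to_;
    }

private:
    std::mutex m_;
    int from_, to_;
};
```

Readers also take the lock, because even a read can otherwise land between the two writes and see the momentarily broken invariant.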

Passing new data to an asynchronous, threaded function that may still be using the old data

I am having some problem related to C/C++:
Suppose I have some class
class Demo
{
    int constant;

public:
    void setConstant(int value)
    {
        constant = value;
    }

    void submitTask()
    {
        // need to make a call to the C-based runtime system to submit a
        // task which will be executed "asynchronously"
        ::submitTask((void *)&constant); // the runtime's free function, not this member
    }
};
// runtime system will call this method when the task is executed
void func(void *arg)
{
    int constant = *((int *)arg);
    // Read this constant value but don't modify it here....
}
Now in my application, I do something like this:
int main()
{
    ...
    Demo objDemo;
    for(...)
    {
        objDemo.setConstant(<somevalue>);
        objDemo.submitTask();
    }
    ...
}
Now, hopefully, you see the problem: each task should read the value that was set immediately before its asynchronous call. Because the task calls are asynchronous, a task can read the wrong value, which sometimes results in unexpected behavior.
I don't want to enforce synchronous task execution just because of this constraint. The number of tasks created is not known in advance. I just need to pass this simple integer constant in an elegant way that works asynchronously. Obviously I cannot change the runtime's behavior (meaning that the signature of the method void func(void *arg) is fixed).
Thanks in advance.
If you don't want to wait for the C code to finish before you make the next call, then you can't reuse the same memory location over and over. Instead, create an array and pass those locations. For this code, I'm going to assume the for loop runs n times; n doesn't have to be known until it's time for the loop to run.
int* values = new int[n];
for (int i = 0; i < n; i++) {
    values[i] = <somevalue>;
    submitTask((void*)&values[i]);
}
At some later point when you're sure it's all done, then call
delete[] values;
Or, alternately, instead of an array of ints, create an array of Demo objects.
Demo* demo = new Demo[n];
for (int i = 0; i < n; i++) {
    demo[i].setConstant(<somevalue>);
    demo[i].submitTask();
}
But the first makes more sense to me, as the Demo object doesn't really seem to do anything worthwhile. However, you may have left out methods and members not relevant to the question, so that could change which option is best. Regardless, the point is that you need separate memory locations for separate values if you don't know when they're going to be used and don't want to wait.
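To show the per-task-storage idea end to end, here is a runnable sketch in which a stand-in submitTask merely records the pointers it is given (in the real program it would hand them to the C runtime; submitAll and the sample values are invented for the example):

```cpp
#include <vector>

// Records every argument pointer so we can verify afterwards that each
// task received its own distinct, still-valid slot.
std::vector<const int*> recordedArgs;

// Stand-in for the asynchronous C runtime call.
void submitTask(void* arg) {
    recordedArgs.push_back(static_cast<const int*>(arg));
}

// One heap slot per task; the caller frees the block only after all
// tasks are known to have finished.
std::vector<int>* submitAll(int n) {
    std::vector<int>* values = new std::vector<int>(n);
    for (int i = 0; i < n; ++i) {
        (*values)[i] = i * 10;       // stands in for <somevalue>
        submitTask(&(*values)[i]);   // a distinct address for every task
    }
    return values;
}
```

Because each submission points at its own element, a task that runs late still reads the value intended for it, not whatever the loop wrote last.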