Memory Corruption on c++ multithreaded application - c++

I am developing a multithreaded C++ application and have written a logging module for it. The logging module is a static class, which I call using Logger::Log(string file, string message); it fills a static queue with pair<string*,string*> entries. The queue itself is a queue<pair<string*,string*>*>. Everything is stored as a pointer because I was trying to avoid garbage collection, and I believe pointer variables need an explicit delete to free the memory.
Now, when one of the threads wants to log something, it calls the Log method, which in turn appends to the end of the queue.
Another thread runs through the queue, pops items and writes them to the designated file.
For some reason, some of the text written to the file is corrupted: I am losing part of the beginning or the end of the message.
For example, if I call Log("file", "this is my message"), inside the Log method I am prepending a timestamp, and creating a new string, because I thought the original string might be overwritten, but it still happens.
The problem is that in some situations, what is being written to the file is the timestamp, plus only the end of the message.
This is the full code of the Logger class:
#include "Logger.h"
queue<pair<string*, string*>*> Logger::messages;
boost::mutex Logger::loggerLock;
void Logger::CleanOldFiles(vector<string> files){
for (vector<string>::iterator it = files.begin(); it != files.end(); ++it) {
string filePath = boost::filesystem::current_path().string() + "\\" + *it;
int result = remove(filePath.c_str());
}
}
void Logger::Init() {
Logger::messages = queue<pair<string*, string*>*>();
boost::thread workerThread(Logger::Process);
//workerThread.start_thread();
}
void Logger::RawLog(string file, string message) {
loggerLock.lock();
string *f = new string(file);
string *m = new string(message + "\n");
messages.push(new pair<string*, string*>(f, m));
loggerLock.unlock();
}
void Logger::Log(string file, string message) {
loggerLock.lock();
string *f = new string(file);
string *m = new string(Functions::CurrentTime() + " (" + boost::lexical_cast<string>(boost::this_thread::get_id()) + "): " + message.c_str() + "\n");
messages.push(new pair<string*, string*>(f, m));
loggerLock.unlock();
}
void Logger::Process() {
while (true) {
if (Logger::messages.size() == 0) {
boost::this_thread::sleep(boost::posix_time::milliseconds(200));
continue;
}
loggerLock.lock();
pair<string*, string*> *entry = messages.front();
messages.pop();
loggerLock.unlock();
ofstream file(boost::filesystem::current_path().string() + "\\" + *(entry->first), ofstream::binary | ofstream::app);
file.write(entry->second->c_str(), entry->second->length());
file.close();
delete entry->first;
delete entry->second;
delete entry;
//cout << entry->second;
}
}
I hope I made myself clear enough...
I do not understand why this is happening, can anyone give me some hints on how to avoid this?
Thanks in advance.

Logger::Log must be made MT-safe; otherwise two or more threads can try to log something simultaneously. The simplest way to make it MT-safe is a mutex.

std::queue is not thread-safe. You need to lock access to all the shared objects or to use thread-safe queue implementation like TBB provides.

if (Logger::messages.size() == 0) {
Because messages isn't thread safe, you can't call any functions on it when you don't hold the lock. Also, you are still missing delete calls for the string*s.
You can always just do this:
void Logger::Process()
{
while (true)
{
// Wait until the queue has an entry, checking it only while holding the lock.
while (true)
{
loggerLock.lock();
if (!Logger::messages.empty())
break; // leave the inner loop with the lock still held
loggerLock.unlock();
boost::this_thread::sleep(boost::posix_time::milliseconds(200));
}
pair<string*, string*> *entry = messages.front();
messages.pop();
loggerLock.unlock();
// The entry is now owned by this thread, so the file I/O needs no lock.
ofstream file(boost::filesystem::current_path().string() + "\\" +
*(entry->first), ofstream::binary | ofstream::app);
file.write(entry->second->c_str(), entry->second->length());
file.close();
delete entry->first;
delete entry->second;
delete entry;
}
}
However, for future reference:
1) Don't use pointers this way. Just use a queue of pairs of strings. The pointers don't buy you anything.
2) Use some sane synchronization mechanism like a condition variable. Don't use sleep for synchronization.
3) Use scoped locks and RAII so you don't forget an unlock.
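The three recommendations above can be sketched together as follows. This is only a minimal illustration, not the asker's Logger: the class name MessageQueue and its members are hypothetical, and it uses std::mutex and std::condition_variable from the standard library in place of the Boost types in the question.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical sketch: (file, message) pairs are stored by value, so there
// is nothing to delete by hand.
class MessageQueue {
public:
    // Called by any logging thread.
    void push(std::string file, std::string message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);  // unlocked at scope exit
            items_.emplace(std::move(file), std::move(message));
        }
        ready_.notify_one();                           // wake the writer thread
    }

    // Called by the writer thread. Blocks until an entry is available or
    // close() has been called; returns false once closed and drained.
    bool pop(std::pair<std::string, std::string>& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

    // Lets the writer thread finish once the remaining entries are written.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::pair<std::string, std::string>> items_;
    bool closed_ = false;
};
```

A writer thread would then loop on while (queue.pop(entry)) and write entry.second to the file named by entry.first, with no sleep and no unlocked access to the queue.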

I apologize for the work I might have caused for all trying to help me, but I corrected the issue. This was actually no memory corruption issue, and my assumption made me search for something where there was no problem.
The issue was in the calling threads, in the way I was creating the log strings. I was concatenating strings the wrong way... damn those other programming languages ;)
I was doing something like Logger::Log("Message: " + integerVariable), which in fact was shifting the string to the right (at least I believe that is what it was doing). When I converted all those variables to strings, everything started working. Thank you for all your help anyway.
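For readers who hit the same symptom, here is a small sketch of what that expression does. Adding an integer to a string literal is pointer arithmetic on a const char*, so the result starts further into the literal instead of having the number appended. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

// What the buggy call effectively did: "literal" + n advances the pointer
// by n characters. Only valid for 0 <= n <= 9 here; a larger n reads past
// the end of the literal, which is undefined behavior.
std::string shiftedLiteral(int n) {
    const char* literal = "Message: ";  // 9 characters plus the terminator
    return std::string(literal + n);
}

// The intended behavior: convert the number first, then concatenate.
std::string concatenated(int n) {
    return "Message: " + std::to_string(n);
}
```

With n equal to 3, shiftedLiteral drops the first three characters of the literal, while concatenated yields the literal followed by the digit.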

Related

C++ - Pointer to local variable within the function

I know this can look like a rookie question already asked a thousand times. But I searched for the exact answer and I haven't found one...
I'm working on code that, to sum up, fills an XML document with different data.
I'm trying to optimize a part of it. The "naïve" code is the following:
xml << "<Node>";
for(auto& input : object.m_vec)
{
if(input == "Something")
{
xml << input;
}
}
xml << "</Node>";
for(auto& input : object.m_vec)
{
if(input == "SomethingElse")
{
xml << "<OtherNode>";
xml << input;
xml << "</OtherNode>";
break;
}
}
The important thing is that, while more than one input can fit in <Node></Node>, only one fits in <OtherNode></OtherNode> (explaining the break;), and it may not exist at all (explaining why the xml << statements are inside the if statement).
I think I could optimize it like this:
std::vector<std::string>* VecPointer;
xml << "<Node>";
for(auto& input : object.m_vec)
{
if(input == "Something")
{
xml << input;
}
else if(input == "SomethingElse")
{
VecPointer = &input;
}
}
xml << "</Node>";
if(!VecPointer->empty())
{
xml << "<OtherNode>"
<< *VecPointer
<< "</OtherNode>";
}
The point for me here is that there is no extra memory needed and no extra loop. But the pointer to the local variable bothers me. With my beginner's eyes I can't see a case where it can lead to something wrong.
Is this okay? Why? Do you see a better way to do it?
You need to make sure your comparison also checks that VecPointer has not already been set, since your original second loop only cares about the first value it comes across.
else if(!VecPointer && input == "SomethingElse")
Don't test ->empty(): that dereferences the pointer and asks whether the pointed-to object is empty. If there's nothing to point to in the first place, you're going to have a bad time at the -> stage of the statement. Instead, test the pointer itself, which requires initializing it to nullptr.
if(VecPointer)
Finally, you're using a Vector to save that one value from m_vec, which from other code I'm assuming is not a vector<vector<string>> but a vector<string> - in the latter case, your VecPointer should be std::string*
std::string* VecPointer = nullptr;
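Putting these three corrections together, a single-pass version might look like the sketch below. It assumes m_vec is a std::vector<std::string>; the function name buildXml and the use of std::ostringstream in place of the asker's xml stream are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One pass over the inputs, remembering the first "SomethingElse" through
// a pointer that starts out null.
std::string buildXml(const std::vector<std::string>& inputs) {
    std::ostringstream xml;
    const std::string* other = nullptr;  // null means "not found yet"

    xml << "<Node>";
    for (const auto& input : inputs) {
        if (input == "Something") {
            xml << input;
        } else if (!other && input == "SomethingElse") {
            other = &input;  // points into inputs, which outlives the loop
        }
    }
    xml << "</Node>";

    if (other) {  // test the pointer itself, not the pointee
        xml << "<OtherNode>" << *other << "</OtherNode>";
    }
    return xml.str();
}
```

The pointer is safe here because the loop variable is a reference to an element of the vector, not a copy, and the vector is not modified before the pointer is used.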
I'm trying to optimize a part of it.
...
Is this okay?
Maybe not! This may already be a poor use of your time. Are you sure that this is what's hurting your performance? Or that there's a performance problem at all?
Remember Don Knuth's old adage: Premature optimization is the root of all evil...
Do you see a better way to do it?
Consider profiling your program to see which parts actually take up the most time.
On an unrelated note, you could use standard library algorithms to simplify your (unoptimized) code. For example:
if (std::ranges::find(std::begin(object.m_vec), std::end(object.m_vec), "SomethingElse"s)
!= std::end(object.m_vec))
{
xml << "<OtherNode>" << whatever << "</OtherNode>";
}

Is the returned string from a recursive function going out of scope?

Giving as simple of a background context as I possibly can, which I don't think is necessary for what I'm trying to figure out at the moment, I'm trying to implement a graph representation via adjacency list, in my case being an unordered map that has a string key to a struct value that contains Vertex object pointers (the object that is identified by the key), and a vector of its dependencies. The goal is to output a critical path via a sort of DAG resolution algorithm.
So when I need to output a critical path, I'm trying to use a recursive solution I implemented. Basically it looks for a base case (if a job has no dependencies), return a print out of its id, start time and length. Otherwise, find the longest running (in terms of time length) job in its dependency list and call the function on that until you find a job with no dependencies. There can be more than one critical path, and I don't have to print out all of them.
MY QUESTION: I'm debugging this at the moment, and it has no problem printing out a job's properties when it's a base case. If it has to recurse, though, the string always comes back as empty (""). Is the recursive call making my string go out of scope by the time it comes back to the caller? Here is the code structure for it. All of the functions below are public members of the same Graph class.
string recurseDeps(unordered_map<string, Dependencies>& umcopy, string key) {
if (umcopy[key].deps.empty()) {
string depPath = " ";
string idarg, starg, larg, deparg;
idarg = key;
starg = " " + to_string(umcopy[key].jobatKey->getStart());
larg = " " + to_string(umcopy[key].jobatKey->getStart() + umcopy[key].jobatKey->getLength());
umcopy.erase(key);
return depPath + idarg + starg + larg;
}
else {
string lengthiestDep = umcopy[key].deps[0];
for (auto i = begin(umcopy[key].deps); i != end(umcopy[key].deps); i++) {
if (umcopy[*i].jobatKey->getLength() >
umcopy[lengthiestDep].jobatKey->getLength()) {
lengthiestDep = *i;
}
}
recurseDeps(umcopy, lengthiestDep);
}
}
string criticalPath(unordered_map<string, Dependencies>& um, vector<Vertex*> aj) {
unordered_map<string, Dependencies> alCopy = um;
string path = aj[0]->getId();
for (auto i = begin(aj); i != end(aj); i++) {
if (um[(*i)->getId()].jobatKey->getLength() >
um[path].jobatKey->getLength()) {
path = (*i)->getId();
}
}
return recurseDeps(alCopy, path);
}
Later on down in the class members, a function called readStream() calls the functions like so:
cout << time << criticalPath(adjList, activeJobs) << endl;
You're not returning the value when you recurse. You're making the recursive call, but discarding the value and just falling off the end of the function. You need to do:
return recurseDeps(umcopy, lengthiestDep);
First of all, to answer your question, since you return by value the string is copied so no need to worry about variables going out of scope.
Secondly, and a much bigger problem, not all paths of your recursive function actually return a value, which leads to undefined behavior. If your compiler doesn't already warn you about this, you should enable more warnings.
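As a stripped-down illustration of the same point, the sketch below uses a hypothetical, much simpler graph type than the one in the question (DepMap and followDeps are invented names). Both the base case and the recursive case return a value, so the result propagates back to the original caller. With GCC or Clang, -Wall includes -Wreturn-type, which flags a non-void function that can fall off its end.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical simplified graph: each job id maps to the ids it depends on.
using DepMap = std::unordered_map<std::string, std::vector<std::string>>;

// Follows the first dependency until a job with no dependencies is reached.
std::string followDeps(const DepMap& deps, const std::string& key) {
    auto it = deps.find(key);
    if (it == deps.end() || it->second.empty()) {
        return " " + key;                         // base case returns a value
    }
    return followDeps(deps, it->second.front());  // recursive case returns too
}
```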

QDir::remove() always causing a crash when called in specific SLOT

Every time I call QDir::removeRecursively(), my application crashes AFTER having correctly removed the folder containing the files.
Having done some testing I found out that it depends on how I call the function. This is my code:
Recorder::Recorder(ParentClass *parent): QObject(parent){
connect(this,SIGNAL(finishedRec(QString,QString)),this,SLOT(finishedRecording(QString,QString)));
}
void Recorder::start(){
if (!recording){
recording=true;
recThread.reset(new std::thread(&Recorder::recordThread, this));
}
}
void Recorder::stop(){
recording = false;
recThread->join(); recThread.reset();
}
void Recorder::recordThread(){
QString picDir;
QString filename;
//....
while(recording){
//writing frames to folder
}
emit finishedRec(picDir,filename);
}
void Recorder::finishedRecording(QString picDir, QString filename){
QProcess* proc = new QProcess();
vecProcess.push_back(proc);
vecString.push_back(picDir);
proc->start("C:\\ffmpeg.exe", QStringList() <<"-i"<< picDir << "-r"<< "30" << "-vcodec"<< "ffv1" << filename);
connect(proc,SIGNAL(finished(int)),this,SLOT(finishedProcess()));
}
void Recorder::finishedProcess(){
for (int i=0; i<vecProcess.size();i++){
if(vecProcess.at(i)->state()==QProcess::NotRunning){
delete vecProcess.at(i);
vecProcess.erase(vecProcess.begin() + i);
QString folderToRemove=vecString.at(i);
folderToRemove.chop(12);
qDebug() << folderToRemove;
QDir dir(folderToRemove);
dir.removeRecursively();
vecString.erase(vecString.begin() + i);
}
}
}
Only if I leave dir.removeRecursively() in, my application will always crash. Without it everything works as intended. Even deleting all the files with
QDir dir(path);
dir.setNameFilters(QStringList() << "*.*");
dir.setFilter(QDir::Files);
foreach(QString dirFile, dir.entryList()){
dir.remove(dirFile);
}
will cause a crash AFTER all the files were deleted.
I'm running my recordThread as a std::unique_ptr<std::thread>. I did try to run the thread as a QThread, but that gave me the exact same result. If dir.removeRecursively() is called, the program crashes after finishing the event in finishedProcess().
Calling removeRecursively() in a different event loop works. Why doesn't it work when using it in a SLOT like shown in my example?
vector erase effectively reduces the container size by the number of elements removed, which are destroyed.
// one suspect
vecProcess.erase(vecProcess.begin() + i);
// another suspect
vecString.erase(vecString.begin() + i);
And you call that in a loop where 'i' gets incremented? After an erase, the following elements shift down by one, so the element that moves into index i is skipped on the next iteration, and any index kept from before the erase no longer refers to the same element. I would just clear the entire container after the loop has finished, if possible, or use a list. And maybe you don't need to store pointers in those containers but values (?). Storing pointers to objects you allocated yourself obliges you to release them one by one; sometimes that is justified, but with C++11 and move semantics it is not always necessary.
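One common way to erase from a vector while walking it by index is to advance the index only when nothing was erased. The sketch below uses plain standard-library types as hypothetical stand-ins for vecProcess and vecString (a vector of "finished" flags and a vector of folder names), so it illustrates only the loop shape and is not a diagnosis of the crash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// done[i] says whether the entry at index i has finished; both vectors are
// kept in step, like the two parallel vectors in the question.
void removeFinished(std::vector<bool>& done, std::vector<std::string>& folders) {
    std::size_t i = 0;
    while (i < done.size()) {
        if (done[i]) {
            done.erase(done.begin() + i);
            folders.erase(folders.begin() + i);
            // Do not advance: the next element has moved into index i.
        } else {
            ++i;
        }
    }
}
```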

Seg Fault resulting from push_back call on vector (threads linux)

So what I'm trying to do is write a program that creates a series of child threads with pthread_create; each thread takes the parameter passed in and uses it to do more manipulation, and so on. The parameter I'm trying to pass in is a vector called reduce_args_. This is the header information for ReduceVector:
typedef vector<string> StringVector;
// a data structure to maintain info for the reduce task
struct ReduceArg
{
ReduceArg (void); // constructor
~ReduceArg (void); // destructor
pthread_t tid; // thread id of the reduce thread
StringVector files_to_reduce; // set of files for reduce task
};
// more typedefs
typedef vector<ReduceArg *> ReduceVector;
Now the issue comes when I call push_back here:
for(int i = 0; i < num_reduce_threads_ ; i++){
reduce_args_.push_back(phold);
int count = 0;
for(ShuffleSet::iterator it = shuffle_set_.begin(); it!=shuffle_set_.end(); ++it){
string line = *it;
string space = " ";
string file = line.substr(0, line.find(space)) + ".txt";
if (count < num_reduce_threads_){
cout << reduce_args_[i+1];
(reduce_args_[i+1] -> files_to_reduce)[count] = file;
//(reduce_args_[i+1] -> files_to_reduce).push_back(file);
}
count++;
//cout << ((reduce_args_.back())->files_to_reduce).back()<< endl;
}
}
Both of those push_back calls cause a seg fault. The shuffle set is just a set and is outputting strings, and as noted in the .h file, files_to_reduce is a string vector. So what I'm trying to do is access files_to_reduce and push_back a string onto it, but each time I get a seg fault. The reduce_args_ object is set up as below:
ReduceArg* plhold;
reduce_args_.push_back(plhold);
((reduce_args_.back()) -> files_to_reduce).push_back("hello");
for (int i = 0; i < this->num_reduce_threads_; ++i) {
// create a placeholder reduce argument and store it in our vector
(reduce_args_.push_back(plhold));
}
thanks for the help!!
This:
ReduceArg* plhold;
reduce_args_.push_back(plhold);
Unless you've hidden some important code, you're pushing an uninitialised pointer, so the next line will cause chaos.
Possibly you meant this?
ReduceArg* plhold(new ReduceArg);
..but I suspect you haven't properly thought about the object lifetimes and ownership of the object whose address you are storing in the vector.
In general, avoid pointers unless you know exactly what you're doing, and why. The code as posted doesn't need them, and I would recommend you just use something like this:
typedef vector<ReduceArg> ReduceVector;
....
reduce_args_.push_back(ReduceArg());
reduce_args_.back().files_to_reduce.push_back("hello");
for (int i = 0; i < num_reduce_threads_; ++i) {
// create a placeholder reduce argument and store it in our vector
reduce_args_.push_back(ReduceArg());
}

Debug assertion failed: Subscript out of range with std::vector

I'm trying to fix this problem which seems like I am accessing at an out of range index, but VS fails to stop where the error occurred leaving me confused about what's causing this.
The Error:
Debug Assertion Failed! Program: .... File: c:\program files\microsoft visual studio 10.0\vc\include\vector Line: 1440 Expression: String subscript out of range
What the program does:
There are two threads:
Thread 1:
The first thread looks (amongst other things) for changes in the current window using GetForegroundWindow(); the check happens not in a loop but when a WH_MOUSE_LL event is triggered. The data is split into structs of fixed size so that it can be sent to a server over TCP. The first thread records the data (window title) into a std::list in the current struct.
if(change_in_window)
{
GetWindowTextW(hActWin,wTitle,256);
std::wstring title(wTitle);
current_struct->titles.push_back(title);
}
Thread 2:
The second thread looks for structs not sent yet and puts their content into char buffers so that they can be sent over TCP. While I do not know exactly where the error is, judging from the type of error it has to do with either a string or a list, and this is the only code in my whole application that uses lists/strings (the rest are conventional arrays). Also, commenting out the if block, as mentioned in the code comments, stops the error from happening.
BOOL SendStruct(DATABLOCK data_block,bool sycn)
{
[..]
int _size = 0;
// Important note, when this if block is commented the error ceases to exist, so it has something to do with the following block
if(!data_block.titles.empty()) //check if std::list is empty
{
for (std::list<std::wstring>::iterator itr = data_block.titles.begin(); itr != data_block.titles.end() ; itr++) {
_size += (((*itr).size()+1) * 2);
} //calculate size required. Note the +1 is for an extra character between every title
wchar_t* wnd_wbuffer = new wchar_t[_size/2](); //allocate space
int _last = 0;
//loop through every string and every char of a string and write them down
for (std::list<std::wstring>::iterator itr = data_block.titles.begin(); itr != data_block.titles.end(); itr++)
{
for(unsigned int i = 0; i <= (itr->size()-1); i++)
{
wnd_wbuffer[i+_last] = (*itr)[i] ;
}
wnd_wbuffer[_last+itr->size()] = 0x00A6; // separator
_last += itr->size()+1;
}
unsigned char* wnd_buffer = new unsigned char[_size];
wnd_buffer = (unsigned char*)wnd_wbuffer;
h_io->header_w_size = _size;
h_io->header_io_wnd = 1;
Connect(mode,*header,conn,buffer_in_bytes,wnd_buffer,_size);
delete wnd_wbuffer;
}
else
[..]
return true;
}
My attempt at thread synchronization:
There is a pointer to the first data_block created (db_main)
pointer to the current data_block (db_cur)
//datablock format
typedef struct _DATABLOCK
{
[..]
int logs[512];
std::list<std::wstring> titles;
bool bPrsd; // has this datablock been sent true/false
bool bFull; // is logs[512] full true/false
[..]
struct _DATABLOCK *next;
} DATABLOCK;
//This is what thread 1 does when it needs to register a mouse press and it is called like this:
if(change_in_window)
{
GetWindowTextW(hActWin,wTitle,256);
std::wstring title(wTitle);
current_struct->titles.push_back(title);
}
RegisterMousePress(args);
[..]
//pseudo-code to simplify things , although original function does the exact same thing.
RegisterMousePress()
{
if(it_is_full)
{
db_cur->bFull= true;
if(does db_main exist)
{
db_main = new DATABLOCK;
db_main = db_cur;
db_main->next = NULL;
}
else
{
db_cur->next = new DATABLOCK;
db_cur = db_cur->next;
db_cur->next = NULL;
}
SetEvent(eProcessed); //tell thread 2 there is at least one datablock ready
}
else
{
write_to_it();
}
}
//this is the actual code and entry point of thread 2 and my attempt at synchronization
DWORD WINAPI InitQueueThread(void* Param)
{
DWORD rc;
DATABLOCK* k;
SockWClient writer;
k = db_main;
while(true)
{
rc=WaitForSingleObject(eProcessed,INFINITE);
if (rc== WAIT_OBJECT_0)
{
do
{
if(k->bPrsd)
{
continue;
}
else
{
if(!k)
{break;}
k->bPrsd = TRUE;
#ifdef DEBUG_NET
SendStruct(...);
#endif
}
if(k->next == NULL || k->next->bPrsd ==TRUE || !(k->next->bFull))
{
ResetEvent(eProcessed);
break;
}
} while (k = k->next); // next element after each loop
}
}
return 1;
}
Details:
Now something makes me believe that the error is not in there, because the substring error is very rare. I have only been able to reproduce it reliably by pressing Mouse_Down+Wnd+Tab to scroll through windows and keeping it pressed for some time (although it certainly happened in other cases as well). I avoid posting the whole code because it's a bit large and confusion is unavoidable. If the error is not here I will edit the post and add more code.
Thanks in advance
There does not appear to be any thread synchronization here. If one thread reads from the structure while the other writes, it might be read during initialization, with a non-empty list containing an empty string (or something invalid, in between).
If there isn't a mutex or semaphore outside the posted function, that is likely the problem.
All the size calculations appear to be valid for Windows, although I didn't attempt to run it… and <= … -1 instead of < in i <= (itr->size()-1) and 2 instead of sizeof (wchar_t) in new wchar_t[_size/2](); are a bit odd.
The problem with your code is that while thread 2 correctly waits for the data and thread 1 correctly signals that it is available, nothing prevents thread 1 from modifying the data while thread 2 is still processing it. The typical device used to solve this problem is the monitor pattern.
It consists of one mutex (used to protect the data, held any time you access it) and a condition variable (= Event in Windows terms), which conveys the information about new data to the consumer.
The producer would normally obtain the mutex, produce the data, release the mutex, then fire the event.
The consumer is trickier: it has to obtain the mutex, check whether new data has become available, and if not, wait for the Event using the SignalObjectAndWait function that temporarily releases the mutex; then it processes the newly acquired data and releases the mutex.
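A portable sketch of that monitor, using std::mutex and std::condition_variable from the C++ standard library rather than the Win32 mutex and Event objects the answer refers to, might look like this. The class name TitleMonitor and its members are hypothetical. The consumer moves the whole batch out while still holding the mutex, so the producer can no longer touch the data being processed.

```cpp
#include <cassert>
#include <condition_variable>
#include <list>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical monitor around the shared list of window titles.
class TitleMonitor {
public:
    // Producer (thread 1): take the mutex, add the data, release, then signal.
    void add(std::wstring title) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            titles_.push_back(std::move(title));
        }
        changed_.notify_one();
    }

    // Consumer (thread 2): wait until data exists, then take all of it.
    std::list<std::wstring> takeAll() {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait() releases the mutex while blocked and reacquires it before
        // returning, so the predicate is always checked under the lock.
        changed_.wait(lock, [this] { return !titles_.empty(); });
        std::list<std::wstring> batch;
        batch.swap(titles_);  // titles_ is now empty; batch is private to the caller
        return batch;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::list<std::wstring> titles_;
};
```

The returned batch can then be serialized and sent without holding the lock, which keeps the hook thread from being blocked during network I/O.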