Difficult to track SIGSEGV Segmentation fault in large program - c++

I apologise for posting a question that has been asked many times (I've just read 10 pages of them), but I can't find a solution.
I'm working on a multi-threaded graphics/audio program using OpenGL and PortAudio respectively. The audio thread uses a library I'm writing for audio processing objects. The SIGSEGV happens maybe 20% of the time (much less often when debugging) and occurs when resetting a large number of audio objects with new stream information (sample rate, vector size, etc.). The Code::Blocks debugger reports the fault as originating from a different place each time it happens.
This is the audio processing loop:
while(true){
    stream->tick();
    menuAudio.tick();
    {
        boost::mutex::scoped_lock lock(*mutex);
        if(channel->AuSwitch.resetAudio){
            uStreamInfo newStream(channel->AuSwitch.newSrate,
                                  channel->AuSwitch.newVSize, channel->AuSwitch.newChans);
            menuAudio.resetStream(&newStream);
            (*stream) = newStream;
            menuAudio.resetStream(stream);
            channel->AuSwitch.resetAudio = false;
        }
    }
}
It checks a flag set by the graphics thread telling it to reset the audio, then runs the resetStream function of the patch object, which is basically a container of audio objects that calls resetStream on each of them:
void uPatch::resetStream(uStreamInfo* newStream)
{
    for(unsigned i = 0; i < numObjects; ++i){
        /* This is where it reports the error: Program received signal SIGSEGV,
           Segmentation fault. Variables: i = 38, numObjects = 43 */
        objects[i]->resetStream(newStream);
    }
}
Sometimes the SIGSEGV is reported as originating from other locations, but because it faults so rarely under the debugger, this is the only one I could capture.
As there are so many objects, I won't post all of their reset code, but as an example:
void uSamplerBuffer::resetStream(uStreamInfo* newStream)
{
    audio.set(newStream, false);
    control.set(newStream, true);
    stream = newStream;
    incr = (double)buffer->sampleRate / (double)stream->sampleRate;
    index = 0;
}
Where the audio.set code is:
void uVector::set(uStreamInfo* newStream, bool controlVector)
{
    if(vector != NULL){
        for(unsigned i = 0; i < stream->channels; ++i)
            delete[] vector[i];
        delete vector;
    }
    if(controlVector)
        channels = 1;
    else
        channels = newStream->channels;
    vector = new float*[channels];
    for(unsigned i = 0; i < channels; ++i)
        vector[i] = new float[newStream->vectorSize];
    stream = newStream;
    this->flush();
}
My best guess would be a stack overflow, as it only really happens with a large number of objects, and each of them runs fine individually. That said, the audio stream itself runs fine and is driven in a similar way. Also, the loop calling objects[i]->resetStream(newStream); should unwind the stack after each member function returns, so I can't see why it would SIGSEGV.
Any observations/recommendations?
EDIT:
It was an incorrectly deleted memory issue. Application Verifier made it fault at the point of the error instead of the occasional faults identified as stemming from other locations. The problem was in the uVector stream-setting function: the class is intended to hold audio vectors as multidimensional arrays sized by stream->channels, with the option of single-dimensional arrays for control signals. When deleting in order to reallocate the memory, I accidentally made all uVectors, regardless of type, delete using stream->channels:
if(vector != NULL){
    for(unsigned i = 0; i < stream->channels; ++i)
        delete[] vector[i];
    delete vector;
}
Where it should have been:
if(vector != NULL){
    for(unsigned i = 0; i < this->channels; ++i)
        delete[] vector[i];
    delete vector;
}
So it was deleting memory it shouldn't have had access to, which corrupted the heap. I'm amazed the segfault didn't happen more often, though, as that seems like a serious issue.

If you can spare the memory, you can try a tool like Electric Fence (or DUMA, its descendant) to see whether it's an out-of-bounds write that you're performing.
Usually these kinds of segfaults (non-deterministic, only occurring sometimes) are relics of a previous buffer overflow somewhere.
You could also try Valgrind, which has the same effect as the two tools above, at the cost of slower execution.
Also, check the value of the bad address you're accessing when this happens: does it look valid? Sometimes the value alone is very informative about the bug you're encountering (typically: trying to access memory at 0x12, where 0x12 is the counter in a loop :)).
For stack overflows... I'd suggest increasing the stack size of the incriminated thread and seeing whether the bug still reproduces. If it doesn't after a good number of tries, you've found the problem.
As for Windows:
How to debug heap corruption errors?
Heap corruption under Win32; how to locate?
https://stackoverflow.com/search?q=windows+memory+corruption&submit=search

I think you just made it a Stack Overflow issue. :)
In all seriousness, bugs like these are usually the result of accessing objects at memory locations where they no longer exist. In your first code block, you create newStream on the stack, with a scope limited to the if statement it is part of. You then copy it to a dereferenced pointer (*stream). Is safe and correct assignment defined for the uStreamInfo class? If it isn't explicitly defined, the compiler quietly provides memberwise copy for assignment, which is fine for simple primitives like int and double, but not necessarily for dynamically allocated members. *stream might be left holding a pointer to memory allocated by newStream that was deallocated when newStream went out of scope. The data at that address will still be there, and for a moment will look correct, but being deallocated memory, it can be overwritten at any time, like just before a crash. :)
I recommend paying close attention to when objects are allocated and deallocated, and to which objects own which others. You can also take a divide-and-conquer approach: comment out most of the code and gradually re-enable it until the crashes start occurring again. The bug is likely in the most recently re-enabled code.

Related

C++ stack and heap corruption

I was recently reading about stack and heap corruption in C and C++. The author of the website demonstrates stack corruption using the example below.
#include <stdio.h>

int main(void)
{
    int b = 10;
    int a[3];
    a[0] = 1;
    a[1] = 2;
    a[2] = 3;
    printf(" b = %d \n", b);
    a[3] = 12; // oops, this is invalid; behaviour is undefined
    printf(" b = %d \n", b);
    printf("address of b   = %p\n", (void*)&b);
    printf("address of a[3]= %p\n", (void*)&a[3]);
    return 0;
}
I tested the above program with the Visual Studio 2010 compiler (VC++) and it gives me a runtime error that says:
stack around variable a was corrupted
Now my question: is the stack corrupted for the lifetime of the program, or only while the erroneous code is executing?
Similarly, I know that deleting the same pointer twice can do really bad things, like heap corruption.
The following code:
int* p=new int();
delete p;
delete p; // oops disaster here, undefined behaviour
When the above code fragment executes, VC++ reports a heap corruption error at runtime.
It is undefined behaviour. You cannot know what will happen if you do 'forbidden' things. You have no guarantee that your program will work.
You have to be careful with terminology here. Will the stack be "corrupted" for the remainder of your program's life? It may be; it may not be. In this instance you've only corrupted data within the current stack frame, so once you're out of that function call, in practice your "corruption" will have gone.
But that's not quite the whole story. Since you've overwritten a variable with bytes that aren't supposed to be there, what knock-on effects might that have on your program? The consequences of this memory corruption could feasibly be logically passed on to other function scopes, or even other computers if you're sending this data over a network connection and the data is no longer in the expected form. (Typically, your data protocol will have safety features built into it to detect and discard unexpected forms of data; but, that's up to you.)
The same is true of heap corruption. Any time you overwrite the bytes of something that is not supposed to be overwritten, and any time you do so with arbitrary or unknowable data, you run the risk of potentially catastrophic consequences that may logically last well beyond the lifetime of your program.
Within the scope of C++ as a language, this condition is summed up in a specific phrase: undefined behaviour. It states that you can't really rely on anything at all after you've corrupted your memory. Once you've invoked UB, all bets are off.
The one guarantee that you usually have in practice is that your OS will not allow you to directly overwrite any memory that does not belong to your program. That is, corrupting the memory of other processes or of the OS itself is very difficult. The memory model of modern OSs is deliberately designed that way in order to keep programs isolated and prevent this kind of damage from broken programs and/or viruses.
C++, like C, does not check for array boundary overflow or underflow. However, you can abstract this away: you can define an array class with an overloaded index operator (operator[]) that checks for out-of-bounds indices and acts accordingly. When you delete a pointer using delete ptr (where ptr was allocated through new), the space allocated earlier is returned to the heap, but the pointer itself still holds its old value and is left dangling. So it's good programming practice to set the pointer to NULL after delete, e.g.
int* p = new int();
...
if (p) {
    delete p;
    p = NULL;
}
// double deletion is prevented, and p is no longer dangling
if (p) {
    delete p;
    p = NULL;
}
However, stack or heap corruption, if it occurs at all, is confined to the program's address space; when the program terminates, normally or abnormally, all the memory it occupies is released back to the operating system.

C++ bad allocation from heap fragmentation?

I have the following problem. I run a loop 3000 times, and on every pass I allocate a byte buffer on the heap:
uint8_t* frameDest;
try {
    frameDest = new uint8_t[_numBytes * sizeof(uint8_t)];
    memcpy(frameDest, frameSource, _numBytes);
} catch (std::exception& e) {
    printf(e.what());
}
So the allocated frame serves as the destination for data from some frameSource. _numBytes equals 921600.
Next, in the same loop, frameDest is pushed into a std::vector of uint8_t* pointers (_frames_cache). This vector serves as a frame cache and is cleaned every X frames. With the current setup I clean the vector when more than 20 frames are in the cache. The method that cleans the cache is this:
void FreeCache()
{
    _frameCacheMutex.lock();
    try {
        int cacheSize = _frames_cache.size();
        for (int i = 0; i < cacheSize; ++i) {
            uint8_t* frm = _frames_cache.front();
            _frames_cache.erase(_frames_cache.begin());
            delete[] frm;
        }
    } catch (std::exception& e) {
        printf(e.what());
    }
    _frameCacheMutex.unlock();
}
The issue: a bad_alloc exception is thrown after ~2000+ frames in the first code block. I tested for memory leaks with Dr. Memory and found none. I also get no errors or exceptions on allocations/deallocations in other parts of the program. I have two instances of this code running in two separate threads, which means that during the lifetime of the program some 6000 allocations/deallocations are processed, 921600 bytes each. There are more heap allocations going on elsewhere in the app, but not at this frequency. I have read that modern compilers handle heap management in a pretty advanced way, and still, I suspect my issue has to do with memory fragmentation. I use the Visual C++ 2012 compiler (C++11), 32-bit, under a Windows 7 64-bit OS.
My question is: how likely is this to be a memory management problem, and should I write or use a custom heap allocation manager? If not, what could it be?
It's hard to tell whether this is a memory fragmentation problem. But you may want to consider allocating a fixed amount of memory for your frames as a ring buffer at the start, so that during runtime you won't run into any fragmentation (and you also save the time spent on memory (de)allocation). Not sure whether this works with your use case, of course.

C++ StackOverflowException initializing struct over 63992

"Process is terminated due to StackOverflowException" is the error I receive when I run the code below. If I change 63993 to 63992 or smaller there are no errors. I would like to initialize the structure to 100,000 or larger.
#include <Windows.h>
#include <vector>

using namespace std;

struct Point
{
    double x;
    double y;
};

int main()
{
    Point dxF4struct[63993]; // if < 63992, runs fine; over, stack overflow
    Point dxF4point;
    vector<Point> dxF4storage;

    for (int i = 0; i < 1000; i++) {
        dxF4point.x = i; // arbitrary values
        dxF4point.y = i;
        dxF4storage.push_back(dxF4point);
    }
    for (int i = 0; i < dxF4storage.size(); i++) {
        dxF4struct[i].x = dxF4storage.at(i).x;
        dxF4struct[i].y = dxF4storage.at(i).y;
    }
    Sleep(2000);
    return 0;
}
You are simply running out of stack space; it's not infinite, so you have to take care not to exhaust it.
Three obvious choices:
Use std::vector<Point>
Use a global variable.
Use dynamic allocation - e.g. Point *dxF4struct = new Point[64000]. Don't forget to call delete [] dxF4struct; at the end.
I listed the above in the order I consider preferable.
[Technically, before someone else points that out, yes, you can increase the stack, but that's really just moving the problem up a level somewhere else, and if you keep going at it and putting large structures on the stack, you will run out of stack eventually no matter how large you make the stack]
Increase the stack size. On Linux, you can use ulimit to query and set the stack size. On Windows, the stack size is part of the executable and can be set during compilation.
If you do not want to change the stack size, allocate the array on the heap using the new operator.
Well, you're getting a stack overflow, so the allocated stack is too small for this much data. You could probably tell your compiler to allocate more space for your executable, though just allocating it on the heap (std::vector, you're already using it) is what I would recommend.
Point dxF4struct[63993]; // if < 63992, runs fine, over, stack overflow
That line allocates all your Point structs on the stack. I'm not sure of the exact stack size, but the default is around 1 MB. Since your struct is 16 bytes and you're allocating 63993 of them, you have 16 bytes * 63993, which is right at the ~1 MB limit and causes a stack overflow (funny, posting about a stack overflow on Stack Overflow...).
So you can either tell your environment to allocate more stack space, or allocate the object on the heap.
If you allocate your Point array on the heap, you should be able to allocate 100,000 easily (assuming this isn't running on some embedded proc with less than 1Mb of memory)
Point *dxF4struct = new Point[63993];
As a commenter wrote, it's important to know that if you "new" memory on the heap, it's your responsibility to "delete" it. Since this uses array new[], you need the corresponding array delete[] operator. Modern C++ has smart pointers (e.g. std::unique_ptr<Point[]>) that help manage the lifetime of the array.

Can the cause of SIGSEGV be the low ram of the system?

My system's RAM is small: 1.5 GB. I have a C++ program that calls a specific method about 300 times. This method uses two maps (they are cleared every time), and I would like to know whether it is possible that in some of the calls of this method the stack overflows and the program fails. With small data (so the method is called 30 times) the program runs fine, but now it raises a SIGSEGV error. I have been trying to fix this for about 3 days with no luck; every solution I tried failed.
I found some causes of SIGSEGV below, but nothing helped:
What is SIGSEGV run time error in C++?
Ok, here is the code.
I have two instances, which contain some keyword features and their scores.
I want their Euclidean distance. That means I have to save all the keywords of each instance, then compute the diffs for the keywords of the first one against those of the second, and then the diffs for the remaining keywords of the second instance. What I want is, while iterating the first map, to be able to delete elements from the second. The following method is called multiple times, as we have two message collections and every message from the first is compared with every message from the second.
I have this code, but it suddenly stops, although I checked (with multiple couts I put in some places) that it works for some seconds.
Note that this is for a university task, so I cannot use boost and all those tricks. But I would like to know a way around the problem I am in.
float KNNClassifier::distance(const Instance& inst1, const Instance& inst2) {
    map<string,unsigned> feat1;
    map<string,unsigned> feat2;
    for (unsigned i=0; i<inst1.getNumberOfFeatures(); i++) {
        feat1[inst1.getFeature(i)]=i;
    }
    for (unsigned i=0; i<inst2.getNumberOfFeatures(); i++) {
        feat2[inst2.getFeature(i)]=i;
    }
    float dist=0;
    map<string,unsigned>::iterator it;
    for (it=feat1.begin(); it!=feat1.end(); it++) {
        if (feat2.find(it->first)!=feat2.end()) { // if and only if it exists in inst2
            dist+=pow( (double) inst1.getScore(it->second) - inst2.getScore(feat2[it->first]) , 2.0);
            feat2.erase(it->first);
        }
        else {
            dist+=pow( (double) inst1.getScore(it->second) , 2.0);
        }
    }
    for (it=feat2.begin(); it!=feat2.end(); it++) { // for the remaining words
        dist+=pow( (double) inst2.getScore(it->second) , 2.0);
    }
    feat1.clear(); feat2.clear(); // clear the maps for the next use
    return sqrt(dist);
}
I also tried the following idea, to avoid having to delete anything, but it suddenly stops too.
float KNNClassifier::distance(const Instance& inst1, const Instance& inst2) {
    map<string,unsigned> feat1;
    map<string,unsigned> feat2;
    map<string,bool> exists;
    for (unsigned i=0; i<inst1.getNumberOfFeatures(); i++) {
        feat1[inst1.getFeature(i)]=i;
    }
    for (unsigned i=0; i<inst2.getNumberOfFeatures(); i++) {
        feat2[inst2.getFeature(i)]=i;
        exists[inst2.getFeature(i)]=false;
        if (feat1.find(inst2.getFeature(i))!=feat1.end()) {
            exists[inst2.getFeature(i)]=true;
        }
    }
    float dist=0;
    map<string,unsigned>::iterator it;
    for (it=feat1.begin(); it!=feat1.end(); it++) {
        if (feat2.find(it->first)!=feat2.end()) {
            dist+=pow( (double) inst1.getScore(it->second) - inst2.getScore(feat2[it->first]) , 2.0);
        }
        else {
            dist+=pow( (double) inst1.getScore(it->second) , 2.0);
        }
    }
    for (it=feat2.begin(); it!=feat2.end(); it++) {
        if(it->second==false){ // if it is true, it means the diff was done in the previous iteration
            dist+=pow( (double) inst2.getScore(it->second) , 2.0);
        }
    }
    feat1.clear(); feat2.clear(); exists.clear();
    return sqrt(dist);
}
If malloc fails and thus returns NULL it can indeed lead to a SIGSEGV assuming the program does not properly handle that failure. However, if memory was that low your system would more likely start killing processes using lots of memory (the actual logic is more complicated, google for "oom killer" if you are interested).
Chances are good that there's simply a bug in your program. A good way to figure this out is using a memory debugger such as valgrind to see if you access invalid memory locations.
One possible explanation is that your program accesses a dynamically-allocated object after freeing it. If the object is small enough, the memory allocator keeps the memory around for the next allocation, and the access after free is harmless. If the object is large, the memory allocator unmaps the pages used to hold the object, and the access after free causes a SIGSEGV.
It is virtually certain that regardless of the underlying mechanism by which the SIGSEGV occurs, there is a bug in the code somewhere that is a key part of the causal chain.
As mentioned above, the most probable cause is bad memory allocation or a memory leak. Check for buffer overflows, or whether you access a resource after freeing it.
1.5 GB isn't that small; you can do a lot in 1.5 GB in general. For 300 iterations to use up 1.5 GB (say 0.5 GB is used by the OS kernel, etc.), you would need to use roughly 3.5 MB per iteration. That is a fair amount of memory, so my guess is that either your code really does use a lot of memory, or your code contains a leak of some sort. More likely the latter. I have worked on machines with less than 64 KB, and my first PC had 8 MB of RAM, and that was considered A LOT at the time.
No, this code is unable to cause a segfault if the system runs out of memory. map allocation uses the new operator, which does not use the stack for allocation. It uses the heap, and will throw a bad_alloc exception if the memory is exhausted, aborting before an invalid memory access can happen:
$ cat crazyalloc.cc
int main(void)
{
    while(1) {
        new int[100000000];
    }
    return 0;
}
$ ./crazyalloc
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
The fact that an alternative implementation also crashes is a hint that the problem is not in this code.
The problem is in the Instance class instead. It's probably not a lack of memory; it's more likely a buffer overflow, which can be confirmed with a debugger.

Boost::thread mutex issue: Try to lock, access violation

I am currently learning how to multithread with C++, and for that I'm using boost::thread.
I'm using it for a simple game engine, running three threads.
Two of the threads read and write the same variables, which are stored inside things I call PrimitiveObjects: basically balls, plates, boxes, etc.
But I can't really get it to work. I think the problem is that the two threads are trying to access the same memory location at the same time. I have tried to avoid this using mutex locks, but so far with no luck; it works sometimes, but if I spam it, I end up with this exception:
First-chance exception at 0x00cbfef9 in TTTTT.exe: 0xC0000005: Access violation reading location 0xdddddded.
Unhandled exception at 0x77d315de in TTTTT.exe: 0xC0000005: Access violation reading location 0xdddddded.
These are the functions inside the object that I'm using for this, and the debugger also blames them for the exception.
int PrimitiveObj::setPos(glm::vec3 in){
    boost::try_mutex::scoped_try_lock lock(myMutex);
    if (lock)
    {
        position = in;
        return 1;
    }
    return 0;
}

glm::vec3 PrimitiveObj::getPos(){
    boost::try_mutex::scoped_try_lock lock(myMutex);
    if (lock)
    {
        glm::vec3 curPos = position;
        return curPos;
    }
    return glm::vec3(0,0,0);
}
This is the function I'm using to generate each PrimitiveObj (updated):
void generatePrimitive(){
    PrimitiveObj *obj = new PrimitiveObj();
    obj->generate();
    obj->setPos(getPlayerPos()+getEye()*4.0f);
    prims.push_back(std::shared_ptr<PrimitiveObj>(obj));
}
Any ideas?
Edit: New functions (2), and myMutex is now private to the object. Added the function I use to generate the PrimitiveObjects.
Edit:
This is the code the stack trace points at; it runs inside the physics thread:
nr = getNumberOfPrimitives();
double currentTime = glfwGetTime();
float deltaTime = float(currentTime - lastTime);
for(int r = 0; r < nr; r++) {
    prop = getPrimitive(r);
    glm::vec3 pos = prop->getPos()+glm::vec3(0,1.0f*Meter/deltaTime,0);
    prop->setPos(pos);
}
Other relevant code:
int getNumberOfPrimitives(){
    return prims.size();
}

PrimitiveObj * getPrimitive(int input) {
    return prims[input];
}
The first idea is that the PrimitiveObj you are calling into is uninitialized, something like this:
PrimitiveObj* myObject;
myObject->getPos();
The exception you get most likely comes from accessing a pointer that was never initialized, or that lives in memory the debug heap has filled with 0xdddddddd so the developer recognizes it, and then reading a member on it that is offset by 0x10 (= 16) bytes.
Access violations can also happen if you access objects such as a std::vector while reading and writing from different threads to the same object at the same time, but then the location is usually a more random-looking number that starts with zeros and is divisible by 4 (e.g. 0x004da358).
Why is that the case? Debug code often initializes memory with recognizable magic numbers (0xdddddddd, 0xbaadf00d, 0xfefefefe, etc.). They are distinctive because if variables were always initialized to the same value, e.g. 0, the developer could miss the fact that some variables are not initialized, and the code would then stop working in release. They are easy to recognize, so we can tell at a glance that a value comes from uninitialized or freed memory.
Formerly valid pointers point into the heap address space, which usually starts at a fairly low address and grows upward. If multiple objects are allocated on the heap, in normal operation each object is aligned on a memory address divisible by 4, 8, 16, etc., and the members of an object are aligned on 4-byte boundaries as well; that's why access violations caused by accessing formerly valid memory are often at addresses that start with zeros and are divisible by 4.
Keep in mind that these are rules of thumb that can and should be used to point you in the right direction, but they are not hard-and-fast rules. Also, they apply to debug environments; release environments follow very different rules for guessing what caused a given access violation.