Segmentation fault on valid memory - c++

I get segmentation fault accessing an object which looks valid and fully accessible in gdb. Isn't segmentation is always about inaccessible memory?
EDIT: more details.
The crash happend under gdb so I could examine the object's memory. It had the members set to proper values so there is no chance I was accessing read-only memory. The instruction where crashed happed is kind of Var = Obj.GetMember() where Var, GetMember and the corresponding member are short integers.
Misalignment? I suppose it would cause bus error, not segmentation. I'll try to rebuild all. The problem is that this piece of code runs thousands times a second and the segmentation happens once in several days.

Try complete rebuild (make clean && make), this had helped me a couple of times when I encountered such weird errors.
Late UPD:
If this does fix the problem, it usually means that something is wrong with your makefile, usually screwed-up dependencies between .cpp and .h files, for example: a.cpp includes b.h, but b.h is not listed in a.cpp's dependencies.

You can get faults even if accessing "valid" memory under some circumstances:
you're attempting to modify memory but the specific mapping is readonly
you're attempting to execute code in a memory area that is no-execute
you're attempting to e.g. load/store at a misaligned address and your hardware issues alignment exceptions
Without a look at the coredump, to figure out what the faulting instruction (load/store/execute) was and what exactly the mapping permissions for the accessed memory were it's impossible to distinguish.

Basically, yes. Did you use the core dump to analyze your seg fault?

Code would very much help, but have you done a make clean? If you've increased the size of a class and your dependencies aren't right then there won't be enough space allocated for an instance and that class will then overrun and corrupt whatever it precedes in memory.

Related

Segmentation fault on one Linux machine but not another with C++ code

I have been having a peculiar problem. I have developed a C++ program on a Linux cluster at work. I have tried to use it home on an Ubuntu 14.04 machine, but the program, which is composed of 6 files: main.hpp,main.cpp (dependent on) sarsa.hpp,sarsa.cpp (class Sarsa) (dependent on) wec.hpp,wec.cpp, does compile, but when I run it it either returns segmenation fault or does not enter one fundamental function of the class Sarsa.
The main code calls the constructor and setter functions without problems:
Sarsa run;
run.setVectorSize(memory,3,tilings,1000);
etc.
However, it cannot run the public function episode , since learningRate, which should contain a large integer, returns 0 for all episodes (iterations).
learningRate[episode]=run.episode(numSteps,graph);}
I tried to debug the code with gdb, which has returned:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000408f4a in main () at main.cpp:152
152 learningRate[episode]=run.episode(numSteps,graph);}
I also tried valgrind, which returned:
==10321== Uninitialised value was created by a stack allocation
==10321== at 0x408CAD: main (main.cpp:112)
But no memory leakage issues.
I was wondering if there was a setting to try to debug the external file sarsa.cpp, since I think that class is likely to be the culpript
In the file, I use C++v11 language (I would be expecting errors at compile-time,though), so I even compiled with g++ -std=c++0x, but there were no improvements.
Unluckily, because of the size of the code, I cannot post it here. I would really appreciate any help with this problem. Am I missing anything obvious? Could you help me at least with the debugging?
Thank you in advance for the help.
Correction:
main.cpp:
Definition of the global array:
`#define numEpisodes 10
int learningRate[numEpisodes];`
Towards the end of the main function:
for (int episode; episode<numEpisodes; episode++) {
if (episode==(numEpisodes-1)) { // Save the simulation data only at the
graph=true;} // last episode
learningRate[episode]=run.episode(numSteps,graph);}
As the code you just added to the question reveals, the problem arises because you did not initialize the episode variable. The behavior of any code that uses its value before you assign one is undefined, so it is entirely reasonable that the program behaves differently in one environment than in another.
A segmentation fault indicates an invalid memory access. Usually this means that somewhere, you're reading or writing past the end of an array, or through an invalid pointer, or through an object that has already been freed. You don't necessarily get the segmentation fault at the point where the bug occurs; for instance, you could write past the end of an array onto heap metadata, which causes a crash later on when you try to allocate or release an unrelated object. So it's perfectly reasonable for a program to appear to work on one system but crash on another.
In this case, I'd start by looking at learningRate[episode]. What is the value of episode? Is it within the bounds of learningRate?
I was wondering if there was a setting to try to debug the external file sarsa.cpp, since I think that class is likely to be the culpript
It's possible to set breakpoints in functions other than main.cpp.
break location
Set a breakpoint at the given location, which can specify a function name, a line number, or an address of an instruction.
At least, I think that's your question. You'll also need to know how to step into functions.
More importantly, you need to learn what your tools are trying to tell you. A segfault is the operating system's reaction to an attempt to dereference memory that doesn't belong to you. One common reason for that is trying to dereference NULL. Another would be trying to dereference a pointer that was never initialized. The Valgrind error message suggests that you may have an unitialized pointer.
Without the code, I can't tell you why the pointer isn't initialized when you run the program on your home system, but is (apparently) initialized when you run it at work. I suspect that you don't have the necessary data on your home system, but you'll need to investigate and figure that out. The fundamental question to keep asking yourself is "what is different between my home computer an dmy work computer?"

C++ - Debug Assertion Failed on program exit

When calling 'exit', or letting my program end itself, it causes a debug assertion:
.
Pressing 'retry' doesn't help me find the source of the problem. I know that it's most likely caused by memory being freed twice somewhere, the problem is that I have no clue where.
The whole program consists of several hundred thousand lines which makes it fairly difficult to take a guess on what exactly is causing the error. Is there a way to accurately tell where the source of the problem is located without having to comb line for line through the code?
The callstack doesn't really help either:
.
this kind of errors usally appear if you delete already deleted objects.
this happens if one object is given to multiple other objects that are supposed to take ownership of that first object and both try to delete it in their destructor.
As the messagebox already suggests, you're probably corrupting your heap in some way. Either you free/delete some memory block that you should not or you're trying to write to some memory block that has already been freed/deleted.
The callstack suggests that this is probably happening when stepping over the last line of your main function. If that is the case, then the problem is probably in the cleanup routines of some user defined types of which you create instances inside the main function. Try setting breakpoints inside the destructors of your own classes and investigate.
It is possible that you are corrupting the heap during the program operation, but it is not being detected until the end of the program, in which case the stack trace would only point to the memory checking routine
There may be a function you can call during operation that checks if the heap is valid, which may bring the fail closer to the point of corruption
HeapValidate is an example of such a routine, but it would depend on what platform you are using
This error can also happen when you use delete[] instead of delete. However, as mentioned, this is only one of many causes.

Compare pointer to given value in C++

I'm on amd64 architecture. The following code gets rejected by the g++ compiler at the "if" statement:
void * newmem=malloc(n);
if(newmem==0xefbeaddeefbeadde){
with the error message:
error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
I can't seem to find the magic incantation needed to get it going (and I don't want to use -fpermissive). Any help appreciated.
Background:
I'm hunting an ugly bug which crashes my program while requesting memory in some STL new operation (at least gdb told me that). Thinking it could be some memory overrun of one of the allocated memory chunks being, by bad luck, adjacent to memory used by the OS to manage memory lists of my program, I quickly overrode new(), new plus their delete counterparts with own routines that added memory fencing; and while the application still crashes (all fences intact (sigh)), gdb now told me this:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000604f5e in construct (__val=..., __p=0xefbeaddeefbeadde, this=<optimized out>)
at /usr/include/c++/4.6/ext/new_allocator.h:108
108 { ::new((void *)__p) _Tp(__val); }
I noted that the pointer __p has as value the pointer representation of the value I used for one of my memory fences (0xdeadbeef), hence my wish to catch this earlier in my new() to try and dump out some more complex values in my program.
Additional note: the function where it crashes runs flawlessly a couple of million times before it crashes (intermixed with dozens of other routines which all also run a couple of thousand to million times), using valgrind does not seem like an option atm because it takes 6hrs and 11 Gb before my program crashes.
One of them is an integer, and the other a pointer. Perhaps an ugly cast would do
if(newmem==(void*)0xefbeaddeefbeadde){
For your question: you have 2 options - cast the pointer to 64bit number or cast the number to void*
For the background: as this takes too long, it could be a memory leak. So, try valgrind - no need to wait for crash, just run your program through valgrind and use the appropriate options. valgrind could help you to catch undefined behavior, too.
Another thing - compile your program with -O0 optimization level and with max level of debug symbols - -ggdb3. Also, if your executable does not generate a core dump, when this error occurs, make it generate. Then leave it working, while you're trying different approaches to catch the error.
In case you don't succeed, you'll at least have a (hopefully) nice core dump, that you can examine with gdb your_exe the_core_dump. Then, watch the core frame by frame, watch the variables, etc.
If it's not a memory leak, it could be a memory corruption somewhere, undefined behavior or something like this. If a good core dump is generated, you'll probably catch the error. Or, at least, it could give you useful information about the case.
Another thing, that could be useful, if you generate a core dump (no matter if it's a good or a bad one) - the size of the dump. If it's too large (the definition of "large" this depends on your program), then you most probably have a memory leak (it could be some kind of cross-reference issue, if you use smart pointers and if so - valgrind will not catch it)
That probably won't work that way. Next time when your program runs, malloc may never return that value. There is something else you must do, rather than just checking on the return value of new!

Variable Name from Memory address in Visual Studio 2008

My c++ application developed in Visual Studio 2008 crashes at some memory location. It is showing memory access violation erroe like,
Unhandled exception at 0x650514aa in XXX.exe: 0xC0000005: Access violation reading location 0x00000004.
how I can get the variable name assigned to 0x650514aa this memory location. Or how to debug this issue.
Thanks,
Nilesh
0x650514aa is the address of code (instruction pointer), not of a variable. If you're lucky, it's your code. Then a map file would help. If you're unlucky, it's some third-party code (blowing because you called it passing in nonsense). But it ain't pretty to dig through map files and it won't tell you your variables' values anyway.
However, if you run it from the debugger, the debugger should intercept and let you examine the stack. And even if you run it without debugger, just-in-time debugging should pop up a dialog asking whether you want to attach a debugger.
The other answers here have useful information. But if for some reason you cannot get the debugger to assist you, here's some more detail you can get out of that error message that may help you locate the problem.
Access violation reading location 0x00000004.
That memory location is what the program is incorrectly trying to read. Typically, reading or writing to a very low memory location like that is caused by code trying to use a NULL pointer as if it pointed to a valid object.
If you happen to know roughly what part of your program is executing when this error occurs, then examine it for any possibilities for NULL pointers to slip through unexpectedly.
Furthermore, 0x00000004 would be the location of a member variable 4 bytes from the start of the object. If the object has virtual functions, then it would probably be the first member variable in the object (because those first 4 bytes are the hidden pointer to the virtual function table). Otherwise, without virtual functions involved, there must be 4 bytes worth of other member variables and/or padding bytes before it. So if you can't immediately tell which pointer is going NULL and causing the problem, then consider which pointers are being used to read such a member variable.
(Note: Technically, the exact memory layout of non-POD objects, particularly when virtual functions are involved, is not guaranteed by any standard. Byte alignment settings in your project can also affect memory layouts. However, in this case it's fairly safe to assume that what I've described is what your compiler is actually doing.)
Usually, if you debug your application inside Visual Studio 2008, at the time of the crash it will stop right at the offending line. Be sure to compile in the Debug configuration, then click Debug | Start.
For further checking, you can go to Debug | Exceptions and check the boxes "Break when an exception is thrown".
If you're running in debug, you should be able to have the system break at that point and be able to see the source code.
If you're running in release mode however, you may need to use the .map file that can be generated. (Link switch /MAP, and you'll need to specify the export files too)
There's a description of how to for v6 here: http://www.codeproject.com/KB/debug/mapfile.aspx
2008 is pretty similar, I believe, although I tend to prefer to run in debug mode if possible.
The map file will allow you to translate your crash address into an exact location in the source code (line number), which may well be helpful. However it will only tell you where the error manifested - not what actually caused it (e.g. a stack corruption wouldn't tell you when you corrupted the stack, only when the corrupted stack was discovered.)
Still, it should help point you in the right direction.

Dereferencing deleted pointers always result in an Access Violation?

I have a very simple C++ code here:
char *s = new char[100];
strcpy(s, "HELLO");
delete [] s;
int n = strlen(s);
If I run this code from Visual C++ 2008 by pressing F5 (Start Debugging,) this always result in crash (Access Violation.) However, starting this executable outside the IDE, or using the IDE's Ctrl+F5 (Start without Debugging) doesn't result in any crash. What could be the difference?
I also want to know if it's possible to stably reproduce the Access Violation crash caused from accessing deleted area? Is this kind of crash rare in real-life?
Accessing memory through a deleted pointer is undefined behavior. You can't expect any reliable/repeatable behavior.
Most likely it "works" in the one case because the string is still "sitting there" in the now available memory -= but you cannot rely on that. VS fills memory with debug values to help force crashes to help find these errors.
The difference is that a debugger, and debug libraries, and code built in "debug" mode, likes to break stuff that should break. Your code should break (because it accesses memory it no longer technically owns), so it breaks easier when compiled for debugging and run in the debugger.
In real life, you don't generally get such unsubtle notice. All that stuff that makes things break when they should in the debugger...that stuff's expensive. So it's not checked as strictly in release. You might be able 99 times out of 100 to get away with freeing some memory and accessing it right after, cause the runtime libs don't always hand the memory back to the OS right away. But that 100th time, either the memory's gone, or another thread owns it now and you're getting the length of a string that's no longer a string, but a 252462649-byte array of crap that runs headlong into unallocated (and thus non-existent, as far as you or the runtime should care) memory. And there's next to nothing to tell you what just happened.
So don't do that. Once you've deleted something, consider it dead and gone. Or you'll be wasting half your life tracking down heisenbugs.
Dereferencing a pointer after delete is undefined behavior - anything can happen, including but not limited to:
data corruption
access violation
no visible effects
exact results will depend on multiple factors most of which are out of your control. You'll be much better off not triggering undefined behavior in the first place.
Usually, there is no difference in allocated and freed memory from a process perspective. E.g the process only has one large memory map that grows on demand.
Access violation is caused by reading/writing memory that is not available, ususally not paged in to the process. Various run-time memory debugging utilities uses the paging mechanism to track invalid memory accesses without the severe run time penalty that software memory checking would have.
Anyway your example proves only that an error is sometimes detected when running the program in one environment, but not detected in another environment, but it is still an error and the behaviour of the code above is undefined.
The executable with debug symbols is able to detect some cases of access violations. The code to detect this is contained in the executable, but will not be triggered by default.
Here you'll find an explanation of how you can control behaviour outside of a debugger: http://msdn.microsoft.com/en-us/library/w500y392%28v=VS.80%29.aspx
I also want to know if it's possible
to stably reproduce the Access
Violation crash caused from accessing
deleted area?
Instead of plain delete you could consider using an inline function that also sets the value of the deleted pointer to 0/NULL. This will typically crash if you reference it. However, it won't complain if you delete it a second time.
Is this kind of crash rare in
real-life?
No, this kind of crash is probably behind the majority of the crashes you and I see in software.