How do I trace a potential undefined behavior situation? - fortran

I get a double free error on deallocate, even though the pointer is associated. I suspect some undefined behavior is going on, but I have no idea where to start looking for it, or how. How do you track down undefined behavior?
The compiler is Intel 12. I can't post code because it's huge, and I'm not even sure the source of the problem is in my code; it may be in a colleague's library. I tried to do some debugging with gdb, but I don't get very far. This is the error:
malloc: *** error for object 0x102302f20: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
The line immediately before that is a print *, associated(pointer), which prints true.
This is the backtrace:
#0 0x00007fff9327b6c0 in malloc_error_break ()
#1 0x00007fff9327b805 in free ()
#2 0x0000000100d27470 in for_dealloc_allocatable ()
#3 0x0000000100506699 in sharedarraysmodule_mp_deleterealsharedarray2_ () at SharedArrays.f90:609
#4 0x00000001003bbc4e in gammaaggregatormodule_mp_deleteprivate_ () at GammaAggregator.f90:86
#5 0x0000000102300bc0 in ?? ()
Previous frame inner to this frame (gdb could not unwind past this frame)

Long comment, might point (sorry, couldn't resist) towards an answer ...
To be pedantic, 'pointer being freed was not allocated' looks like your compiler runtime's best guess at what went wrong. In Fortran, pointers only have the association statuses undefined, associated and disassociated, so the error message most likely arises from the system functions that the compiler links into your executable.
That being said, I'd look for:
A target going out of scope while a pointer to it remains in scope. The Fortran 2003 standard states (note 16.13)
A pointer from a module program unit may be accessible in a subprogram via use association.
Such pointers have a lifetime that is greater than targets that are declared in the subprogram,
unless such targets are saved. Therefore, if such a pointer is associated with a local target, there is
the possibility that when a procedure defined by the subprogram completes execution, the target
will cease to exist, leaving the pointer “dangling”. This standard considers such pointers to have
an undefined association status. They are neither associated nor disassociated. They shall not be
used again in the program until their status has been reestablished. There is no requirement on a
processor to be able to detect when a pointer target ceases to exist.
The same document also states that the association status of a pointer passed to the associated intrinsic shall not be undefined, so it is probably allowable for the program to lie and tell you that associated(undefined_pointer) is .true.
An allocatable target being deallocated, either by going out of scope or as a result of a deallocate() call.
Pointers being nullified on declaration, e.g. real, pointer :: rptr => null() (this is generally held to be a good thing; you might check that your code conforms).
Pointers being nullified before they have been associated, though I don't think that nullifying a null pointer is an error according to the standard.
Pointers to pointers.
The error messages you report tend to suggest that you have a case of module variables going out of scope, but, as you've already observed, it's kind of difficult to be certain.
If this doesn't help, contact Intel tech support; I find them very helpful, and they are pretty good at spotting problems in one's code.

Related

NULL function pointers

What is the behavior of calling a null function pointer?
void (*pFunc)(void) = NULL;
pFunc();
Why is it advisable to initialize yet unused function pointers to NULL?
In C and C++ this is undefined behaviour, meaning that it can lead to a segmentation fault, to nothing at all, or to anything else, depending on your compiler, the operating system you're running the code on, the environment, and so on.
Initializing a function pointer, or any pointer, to NULL helps developers recognize that the pointer has not yet been given a meaningful value rather than holding random garbage, and lets them test it before use, preventing accidental dereferences.
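As a minimal sketch of that guard pattern (the greet function and the messages are illustrative, not from the question): checking the pointer before calling through it turns undefined behaviour into an ordinary, testable condition.
#include <cstdio>

void greet(void) { std::puts("hello"); }

int main() {
    void (*pFunc)(void) = nullptr;     // explicitly marked as "not set yet" (NULL in C)

    if (pFunc != nullptr) {            // guard: only call through a pointer that was assigned
        pFunc();
    } else {
        std::puts("pFunc is not set"); // a detectable condition instead of undefined behaviour
    }

    pFunc = &greet;                    // later, point it at a real function
    if (pFunc) pFunc();                // now the call is well defined
    return 0;
}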
What happens when you try to access NULL?
The following is true for data as well as code, and it is what happens when you try to read NULL (or, more generally, any address in at least the first page of the address space, roughly 0 to 4096). The root cause lies in the OS and the processor's segmentation/paging architecture.
When you try to access the NULL (0) address, in either a data or a code section, you get a segmentation fault (which is really a fatal page fault). The first page is deliberately kept invalid (not present) so that at least one address a pointer can hold is guaranteed to be invalid at execution time.
The page descriptor of that first page (the one containing virtual address 0, NULL) has its "present" bit set to 0, marking the page as invalid. If you try to access the NULL pointer (address 0), the processor raises a page fault because the page is not present, and the OS tries to handle it. When the page-fault handler sees that the access falls in the first page, which is treated as an invalid part of the virtual address space, it kills the process. That is the user-space case; if you dereference a NULL pointer in kernel-level code, it will take down the OS and crash the system.
Links: http://en.wikipedia.org/wiki/Page_fault#Invalid
http://en.wikipedia.org/wiki/Memory_protection#Paged_virtual_memory
http://pdos.csail.mit.edu/6.828/2005/readings/i386/s05_02.htm
The above should be sufficient, but I think you should also read this:
http://www.iecc.com/linker/linker04.txt
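To make the mechanism concrete, here is a minimal sketch (undefined behaviour by construction, shown only for illustration): on a typical desktop OS the read through the null pointer touches the unmapped first page, so the process is usually killed with a segmentation fault before the print is ever reached.
#include <cstdio>

int main() {
    int *p = nullptr;        // points at virtual address 0, inside the unmapped first page
    std::printf("%d\n", *p); // page fault -> SIGSEGV / access violation; process is typically killed here
    return 0;
}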
Why initialize a function pointer to NULL?
Calling through NULL will give a page/segmentation fault, but at least NULL unambiguously signals an invalid function. If the pointer instead contains a garbage address that happens to fall inside the valid virtual address space of the code section, whatever code lives at that address will be called, which can be an even bigger disaster (especially in real-time systems). For example, initialize funcp = funct_foo_name + 1 and then call through the function pointer: it points into valid code-section address space, but the function starts executing from the wrong place, which can result in the wrong code running, or code running in the wrong order.
It's advisable for the same reason as initializing "normal" (data) pointers to NULL: because it potentially makes some errors easier to track down. Opinions on whether this is useful or not of course vary :-)

C++ function used to work, now returning 0xfdfdfdfd

I have some code I wrote a few years ago. It has been working fine, but after a recent rebuild with some new, unrelated code elsewhere, it is no longer working. This is the code:
//myobject.h
...
inline CMapStringToOb* GetMap(void) {return (m_lpcMap);};
...
The above is accessed from the main app like so:
//otherclass.cpp
...
CMapStringToOb* lpcMap = static_cast<CMyObject*>(m_lpcBaseClass)->GetMap();
...
Like I said, this WAS working for a long time, but it's just decided to start failing as of our most recent build. I have debugged into this, and I am able to see that, in the code where the pointer is set, it is correctly setting the memory address to an actual value. I have even been able to step into the set function, write down the memory address, then move to this function, let it get 0xfdfdfdfd, and then manually plug the memory address back in via the debugger, which makes the code work. Now, from what I've read, 0xfdfdfdfd means guard bytes or "no man's land", but I don't really understand what the implications of that are. Supposedly it also suggests an off-by-one error, but I don't understand how that could happen if the code was working before.
I'm assuming from the Hungarian notation that you're using Visual Studio. Since you do know the address that holds the map pointer, start your program in the debugger and set a data breakpoint when that map pointer changes (the memory holding the map pointer, not the map pointed to). Then you'll find out exactly when it's getting overwritten.
0xfdfdfdfd typically implies that you have accessed memory that you weren't supposed to.
There is a good chance the memory was allocated and subsequently freed. So you're using freed memory.
static_cast can modify a pointer and you have an explicit cast to CMyObject and an implicit cast to CMapStringToOb. Check the validity of the pointer directly returned from GetMap().
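As an aside on the pointer-adjustment point, here is a minimal sketch (class names are illustrative, not the classes from the question): with multiple inheritance a cast to a non-first base changes the pointer value, so a garbage pointer going into the cast comes out as differently-garbage, which is why the value returned directly from GetMap() is worth inspecting.
#include <cstdio>

struct A { int a; };
struct B { int b; };
struct C : A, B { int c; };

int main() {
    C obj;
    C *pc = &obj;
    B *pb = static_cast<B*>(pc);   // adjusted: points at the B subobject, not at the start of obj
    std::printf("C* = %p\nB* = %p\n",
                static_cast<void*>(pc), static_cast<void*>(pb));
    return 0;
}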
Scenarios where "magic" happens almost always come back to memory corruption. I suspect that somewhere else in your code you've modified memory incorrectly, and it's resulting in this peculiar behavior. Try testing some different ways of entering this part of the code. Is the behavior consistent?
This could also be caused by an incorrectly built binary. Try cleaning and rebuilding your project.

Is there any way a valgrind message "Conditional jump or move depends on uninitialized value" can be a so-called 'false positive'?

Most questions I find here provide a piece of code and get answered by someone pointing to the actual error. My question is about conditional jumps on uninitialized values in general. I can understand that a piece of memory need not necessarily be freed at the end of a program if one is sure the allocation is done only once and will probably be needed for the lifetime of the program. As far as I remember, the GType system leaves a lot of unfreed memory when the program terminates; these unfreed blocks can be seen as 'false positives'. But can a 'conditional jump or move on uninitialized value' be a false positive? The only thing I can come up with is someone implementing a (bad) randomize function by just reading a random address (where the random address itself is the tricky part ;) ). Another example could be hardware mapped to a part of memory which is then read, but this is mostly done by drivers and not by normal user applications. Is there any other example (preferably C) which could cause such a false positive?
What valgrind is reporting is that it sees a jump based on a read from a location for which it knows that it was allocated by the program but for which it hasn't seen an initialization. This might happen if the object is initialized by some magic that valgrind doesn't know about. Architectures evolve constantly and maybe you have an instruction or register type that valgrind doesn't know enough about.
Another difficult source of such non-initializations is unions, for two reasons:
By default, only the first member is initialized, so when another member extends beyond that first member, the trailing part may be uninitialized.
If the members of the union are structs, they may have padding bytes at different places, so part of one member may be uninitialized if you assigned to a different member.
In some cases it might even be legitimate to read these things (through an unsigned char[], for example), so whether you consider such things a bug (false positive) or not is a matter of perspective.
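To illustrate the padding case concretely, a minimal sketch (struct and variable names are illustrative): every member is assigned, yet the object representation still contains indeterminate padding bytes, and branching on one of them, even via the legitimate unsigned char route, is exactly the kind of thing memcheck flags.
#include <cstdio>
#include <cstring>

struct Padded {
    char c;   // 1 byte used, typically followed by 3 padding bytes before i
    int  i;
};

int main() {
    Padded p;
    p.c = 'x';
    p.i = 1;                          // all members initialized, the padding is not

    unsigned char raw[sizeof p];
    std::memcpy(raw, &p, sizeof p);   // inspecting the object representation byte by byte

    // raw[1] usually comes from a padding byte, so valgrind reports
    // "Conditional jump or move depends on uninitialised value(s)" on this branch,
    // although reading it through unsigned char is legitimate.
    if (raw[1] == 0)
        std::puts("padding byte happened to be zero");
    return 0;
}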
Absolutely! I once had C code of the form
// compute a and, possibly, b
if (a && b) {
// do stuff
}
in which b was guaranteed to be initialized if a were true. Thus, there was no way that an uninitialized value of b could cause a problem. However, gcc, when optimizing sufficiently aggressively, decided to check the value of b first. This was acceptable since neither check had any side effects, but it still caused valgrind to complain.

Access violation exception when calling a method

I've got a strange problem here. Assume that I have a class with some virtual methods. Under certain circumstances an instance of this class should call one of those methods. Most of the time no problems occur at that stage, but sometimes it turns out that the virtual method cannot be called, because the pointer to that method is NULL (as shown in VS), so a memory access violation exception occurs. How could that happen?
Application is pretty large and complicated, so I don't really know what low-level steps lead to this situation. Posting raw code wouldn't be useful.
UPD: OK, I see that my presentation of the problem is rather vague, so schematically the code looks like this:
void MyClass::FirstMethod() const { /* Do stuff */ }
void MyClass::SecondMethod() const
{
// This is where exception occurs,
// description of this method during runtime in VS looks like 0x000000
FirstMethod();
}
No constructors or destructors involved.
Heap corruption is a likely candidate. The v-table pointer in the object is vulnerable; it is usually the first field in the object. A buffer overflow in some other object that happens to be adjacent will wipe out the v-table pointer, and the call to a virtual method, often much later, will blow up.
Another classic case is having a bad "this" pointer, usually NULL or a low value. That happens when the object reference on which you call the method is bad. The method will run as usual but blow up as soon as it tries to access a class member. Again, heap corruption or using a pointer that was deleted will cause this. Good luck debugging this; it is never easy.
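A minimal sketch of the bad-this case (names are illustrative; this is undefined behaviour, shown only to illustrate the symptom): a non-virtual call through a null pointer often appears to run, and only blows up when a member is touched. With a virtual method it would blow up at the dispatch itself, because the v-table pointer lives inside the (non-existent) object.
#include <cstdio>

struct Widget {
    int value = 42;
    void sayHello() { std::puts("hello"); }         // touches no members: often appears to work
    void show()     { std::printf("%d\n", value); } // reads a member through "this": crashes
};

int main() {
    Widget *w = nullptr;   // bad "this" pointer
    w->sayHello();         // undefined behaviour, but typically runs anyway
    w->show();             // undefined behaviour, typically the access violation happens here
    return 0;
}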
Possibly you're calling the function (directly or indirectly) from a constructor of a base class which itself doesn't have that function.
Possibly there's a broken cast somewhere (such as a reinterpret_cast of a pointer when there's multiple inheritance involved) and you're looking at the vtable for the wrong class.
Possibly (but unlikely) you have somehow trashed the vtable.
Is the pointer to the function null just for this object, or for all other objects of the same type? If the former, then the vtable pointer is broken, and you're looking in the wrong place. If the latter, then the vtable itself is broken.
One scenario in which this could happen is calling a pure virtual method from a constructor or destructor. At that point the virtual table pointer may not be fully set up yet, causing a crash.
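A minimal sketch of that constructor scenario (class names are illustrative): while Base's constructor runs, the object's v-table is still Base's, so the dispatched call lands on the pure virtual slot; typically the runtime aborts with a "pure virtual function call" diagnostic, though the behaviour is undefined and varies by compiler and optimization level.
struct Base {
    Base() { callInit(); }        // during this constructor the object is still "just a Base"
    virtual ~Base() {}
    virtual void init() = 0;      // pure virtual: there is no Base implementation to land on
    void callInit() { init(); }   // indirect virtual call, dispatched through Base's v-table
};

struct Derived : Base {
    void init() override {}       // never reached from Base's constructor
};

int main() {
    Derived d;                    // typically aborts: "pure virtual function call"
    return 0;
}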
Is it possible the "this" pointer is getting deleted during SecondMethod's processing?
Another possibility is that SecondMethod is actually being called with an invalid pointer right up front, and that it just happens to work (by undefined behavior) up to the nested function call, which then fails. If you're able to add print code, check whether "this" and/or the other pointers being used are something like 0xcdcdcdcd or 0xfdfdfdfd at various points during execution of those methods. Those values are (I believe) used by VS on memory alloc/dealloc, which may be why it works when compiled in debug mode.
What you are most likely seeing is a side-effect of the actual problem. Most likely heap or memory corruption, or referencing a previously freed object or null pointer.
If you can consistently have it crash at the same place and can figure out where the null pointer is being loaded from, then I suggest using the debugger and putting a breakpoint on 'write' at that memory location; once the breakpoint is triggered, you are most likely looking at the code that actually caused the corruption.
If the memory access violation happens only when Visual Studio fails to show the method address, it could be caused by missing debug information. You are probably debugging code compiled with release (non-debug) compiler/linker flags.
Try enabling some debug info in the C++ properties of the project, rebuild and restart the debugger. If that helps, you will see all the normal traceable things like the stack, variables, etc.
If your this pointer is NULL, corruption is unlikely, unless you're zeroing memory you shouldn't be.
You didn't say whether you're debugging a Debug (non-optimized) or Release (optimized) build. Typically, in a Release build the optimizer will remove the this pointer if it is not needed, so if you're debugging an optimized build, seeing this as 0 doesn't mean anything; you have to rely on the disassembly to tell you what's going on. Try turning off optimization in your Release build if you cannot reproduce the problem in the Debug build. When debugging an optimized build, you're debugging assembly, not C++.
If you're already debugging a non-optimized build, make sure you have a clean rebuild before spending too much time debugging corrupted images. Debug builds are typically linked incrementally, and incremental linkers are known to produce problems like this. If you're running a clean Debug build and still can't figure out what went wrong, post the stack dump and more code. I'm sure we can help you figure it out.

Thought experiment with __stdcall and corrupted stack (C++)

My mind was wandering today on the topic of function pointers, and I came up with the following scenario in my head:
int __stdcall function (int)
{
return 0;
}
int main()
{
(*(int(*)(char*,char*))function)("thought", "experiment");
return 0;
}
AFAIK this code would corrupt the stack, so what types of issues could I be looking at if I ran this code?
I'd do this investigating myself however I'm away from my dev machine for a week.
EDIT: Hold on a second, I've been thinking a bit more. As has been observed in the comments, the intent of this code was to have a parameter left on the stack when all is said and done (caller puts two params on the stack, callee -- expecting only one param -- pops only one off). However, since my cast doesn't make mention of the calling convention, am I casting away stdcall, at least from the view of the caller? int function(int) will still pop a param off the stack, but does the caller revert to thinking the function is __cdecl (the default) because of the cast? (i.e. three total params popped?)
EDIT2: The answer to that second question, as confirmed by Rob, is yes. I would have to restate __stdcall if I wanted to leave a param on the stack:
(*(int (__stdcall *)(char*,char*))function)("thought", "experiment");
You are calling the function as if it is _cdecl which means the caller pushes the arguments and cleans up the stack.
The receiving function is _stdcall which implies the callee cleans up the stack. The callee is expecting a single argument so will pop 4 bytes off the stack.
When the function returns the caller will then pop off two pointers (having previously pushed on two pointers), so your stack is being corrupted by 4 bytes.
Both calling conventions use the same return mechanism, and have the same register rules (eax, ecx and edx are not preserved). See wikipedia for more details.
Depending on the stack frame layout and alignment this mismatch could cause a number of effects. If you are lucky then you get away with it. If not you might mess up the return address of your main function, causing the program to crash when it branches to who-knows-where. If the compiler has injected some kind of stack guard to catch corruption then it will likely detect this and abort the program.
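For contrast, a minimal sketch of the matched case (MSVC x86 syntax; on x64 the annotation is accepted but ignored): when the pointer type states both the real signature and the calling convention, the callee pops its 4 bytes of arguments and the caller pushes exactly 4, so the stack stays balanced.
// A pointer type that states both the signature and the calling convention.
typedef int (__stdcall *StdcallIntFn)(int);

int __stdcall function(int) { return 0; }   // same shape as the function in the question

int main() {
    StdcallIntFn fp = &function;   // no cast needed: the types agree
    return fp(42);                 // callee cleans up its own 4 bytes; nothing is left over
}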
No, it will definitely not cause a blue screen. No user-mode process is able to do that. Even if such a bug were in kernel-mode code, the BSOD would occur only after accessing invalid memory or passing wrong arguments to a function.
You are simply corrupting private memory of your process, and the corruption may (or may not) later result in an invalid operation (e.g. dereferencing a pointer into invalid memory). When that happens, the OS terminates your process, but no sooner.
I think you would have 'undefined behavior' in this case.
From the C standard: (I would assume it's the same in C++)
768: If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
Edit: On most operating systems, this type of error would not cause problems for the operating system as a whole, but it would cause undefined problems in your program. It would be very hard for a user-mode program to cause a blue screen.