Memory error: Dereference null pointer/ SSE misalignment - c++

I'm compiling a program on remote linux server. The program compiled. However when I run it the program ends abruptly. So I debugged the program using DDT. It spits out the following error:
Process 0:
Memory error detected in ClassName::function (filename.cpp:6462).
Thread 1 attempted to dereference a null pointer or execute an SSE instruction with an
incorrectly aligned memory address (the latter may sometimes occur spuriously if guard
pages are enabled)
Tip: Use the stack list and the local variables to explore your program's current
state and identify the source of the error.
Can anyone please tell me what exactly this error means?
The line where the program stops looks like this:
SumUtility = ParaEst[0] + hhincome * ParaEst[71] + IsBlack * ParaEst[61] + IsBachAss * (ParaEst[55]);
This is within a switch case.
These are the variable types
vector<double> ParaEst;
double hhincome;
int IsBlack, Is BachAss;
Thanks for the help!

It means that:
ParaEst is NULL or a bad Pointer
ParaEst's individual array values are not aligned to 16-byte boundaries, required for SSE.
hhincome, IsBlack, or IsBachAss are not aligned to 16-byte boundaries and are SSE type values.
SumUtility is not aligned to 16-bytes and is a SSE type field.
If you could post the assembly code of the exact line that failed along with the register values of that assembler line, we could tell you exactly which of the above conditions have failed. It would also help to see the types of each variable shown to help narrow root the cause.

Ok... The problem finally got fixed.
The issue was that the expression where the code was breaking down was in a newly defined function. However for some weird reason running the make-file did not incorporate these changes and was still compiling using the previously compiled .o file. This resulted in garbage values being assigned to the variables within this new function. To top things off the program calls this function as a first step. Hence there was this systematic breakdown. The technical aspect of this was what Michael alluded to.
After this I would always recommend to use a make clean option in the make file. The issue of why running the make file is failing to compile the modified source file is an issue that definitely warrants further discussion.
Thanks for the responses!!

Related

What does a dangerous relocation error mean?

I am getting a linking error:
dangerous relocation: l32r: Literal placed after use:
I am still trying to debug; however, I want to better understand this error. I understand what relocation is; however, I am not sure how it can be dangerous and was looking for some clarification. Also, a small code snippet that could generate this type of error would be helpful.
In short, what is "a dangerous relocation"?
This is a two-part answer, as there are really two questions here, one general ("what's a dangerous relocation?") and one specific to the Xtensa ("why can't you have a literal placed after where it's used in the code?").
What's all this dangerous relocation stuff about, anyway?
To understand what a 'dangerous relocation' is, we must first understand what a relocation is. As a compiler is generating an object file from some piece of code, it will need to reference symbols that are defined somewhere else: perhaps in another object file in the link, or perhaps in a shared library. However, the compiler does not know the addresses of external symbols when compiling a given object file. It must emit a relocation to serve as a named placeholder, telling the linker "OK, shove the address of foobar into this spot, and oh, you have to do X, Y, and Z to it to make it fit into the instructions there."
Most of the time, this works without a hitch, you get a binary out of your linker, and Bob's your uncle. When this process breaks down, and the linker cannot make the address of the symbol the compiler gave it fit into the instructions at the site of the relocation, it gives up and tosses out a 'dangerous relocation' message (among others -- the all-too-common 'relocation truncated to fit' pops out of this process as well) to inform the programmer that something has gone terribly wrong.
What's wrong with a literal placed after where it's used?
Now that we know what a generic 'dangerous relocation' is, we can move on to the second half of the error message, namely "l32r: Literal placed after use". The Xtensa uses an instruction known as L32R to load constant values from memory that don't fit into the Xtensa's MOVI immediate load instruction, which has a 12-bit signed immediate field. The L32R instruction is described in the Xtensa ISA reference as follows:
L32R is a PC-relative 32-bit load from memory. It is typically used to load constant
values into a register when the constant cannot be encoded in a MOVI instruction.
L32R forms a virtual address by adding the 16-bit one-extended constant value encoded
in the instruction word shifted left by two to the address of the L32R
plus three with the two least significant bits cleared. Therefore, the offset can always
specify 32-bit aligned addresses from -262141 to -4 bytes from the address of the L32R
instruction. 32 bits (four bytes) are read from the physical address. This data is then
written to address register at.
Given the restrictions on L32R quoted above, the error message breaks down quite nicely: the compiler generated a L32R to load a constant (which could be a value or an address) somewhere in your code, but either the constant's value was not available to the compiler (think extern const), or the address needed to be filled in by the linker (this is the likely case). So, it emitted this L32R relocation to tell the linker to 'fill in the blank' in the L32R instruction with the address of a constant value or constant address somewhere in your program. However, the linker couldn't find anywhere in the previous 256KB of code -- or literal pool, depending on how your compiler and Xtensa core are configured -- to shove a constant, so it gave up and spat out the error message you asked about.
How does one fix this?
Unfortunately, a 'dangerous relocation' of this sort depends on code size, so unless you have a bona fide compiler or linker bug on your hands, reproducing it with a small snippet of code will be impossible. There are two possible causes you can try to address, though.
There's no room for my literal pool!
If you are compiling with -mno-text-section-literals (which is the default), the linker gets fed the literal pools as separate sections which it then has to interleave with the code sections. If you have a particularly large object file in your link, it may have over 256KB of code in its .text section, leaving nowhere in the range of a L32R instruction for the linker to place the associated literal pool section at. Compiling with -mtext-section-literals should eliminate the error; if it does not work, you have that flag on already, or if you are using -ffunction-sections (which places each function into its own section; it is sometimes used in embedded work to allow the linker to throw out unused code), read on.
The linker (or assembler) still can't find a place to put my literals!
When the compiler and assembler are told to emit literals into the text section, they restrict placement of the literal pools to before the functions that use them (i.e. before the ENTRY instruction of the function) in order to minimize the risk that the literal pools will be executed as code, with obviously bad results. If you have an extremely long function in your code -- I shudder to think what sort of function could generate more than 256KB of code -- the 'default' literal pool placed before the ENTRY instruction can wind up out of range of L32R instructions near the end of the function. Normally, the compiler will emit an assembler directive known as .literal_position, as well as a jump around the mid-function literal pool, to provide the assembler and linker with an extra place to shove literals into. You can tell the compiler to output an assembler listing using -save-temps and then search it for .literal_position directives; if one isn't present in a function that has L32R instructions past the 256KB mark, congratulations! You just found a compiler bug!
What else could happen to produce this?
The only other circumstance I see that can provoke such a problem is if there is nowhere before the ENTRY instruction that the compiler or linker can put a literal pool, and the compiler can't figure this out on its own -- this can occur with interrupt handlers, or functions that are explicitly placed at the beginning of a physical memory boundary by the linker script. In this case, you will need to insert the .literal_position directive and its associated jump & label by hand in an asm statement at the top of the culprit function in order to provide the assembler with a place to put the culprit function's literals. As the GAS manual puts it:
The assembler will automatically place text section literal pools before ENTRY
instructions, so the .literal_position directive is only needed to specify some other
location for a literal pool. You may need to add an explicit jump instruction to skip
over an inline literal pool.
For example, an interrupt vector does not begin with an ENTRY instruction so the
assembler will be unable to automatically find a good place to put a literal pool.
Moreover, the code for the interrupt vector must be at a specific starting address, so
the literal pool cannot come before the start of the code. The literal pool for the
vector must be explicitly positioned in the middle of the vector (before any uses of the
literals, due to the negative offsets used by PC-relative L32R instructions).
Wait, I'm using the absolute literal option!
If you have the LITBASE option enabled in your Xtensa core and are getting this error, this is a sign that your literal pool has overflowed. The compiler should generate the 'glue' needed to switch literal pools in this case, though: if it doesn't, congratulations! You have just found a compiler bug!
Here's http://www.mail-archive.com/mspgcc-users#lists.sourceforge.net/msg11488.html
This might be helpful for you.
Good luck :)

Access voilation reading location

I am trying to debugg the project on MSVS 2010.
Implementation - c++; when i am degubbing the source code, i get the following failure reported by MSVS.
Failure reported:
"First chance exception at 0x00000013fb5b9ee in unit.exe: 0xc00000005 access voilation reading location 0x00000000000000c."
the problem lies in obtaining address.
int base = (*(abc::g_runc1.m_paulsenderpin.m_lastchunk_p)).xcpp::cxcppoutput::m_baseaddress;
my project is very big to include the source code,
In short it can be described as:
- paul is a module with sender pin connected to c1.
- xcpp is the interface
this source code and the project is correct and works without failure on ARM compiler, but on MSVS it gives access violation error.
On msdn there are some posts about permission set by assembly, and which avoids to read the addressed location. if so, how to change it... ?
or is there any better option to find the problem...?
Any help is appreciated.
Your code is trying to access location that actually isn't owned by it's process. No data of user applications can be located at addresses so close to zero. As your expressions is too long to simply find where is the member containing zero reference, my tip is m_last chunk_p, and the m_baseaddress seems to be member at offset 12.
There is one simple explanation why does your code work fine when it's compiled by something that works with ARM: ARM uses aligned memory access, so class and structure members are aligned to full blocks, although they don't always use whole space allocated for them. Therefore you use bigger pointer or wrong memset parameters somewhere in your code and your pointer gets overwritten.
Problem may also disappear when you compile it with another version of (possibly another) compiler (or non machine with different processor architecture 32/64), as the size of fundamental types isn't always the same.
You should try to check what pointed is actually zero (or possibly 12) in your expression and try to set a watch on it. Be sure you use sizeof properly everywhere.
The Problem lies with the memory addressing, in ARM debugger 32 bits and MSVS10 48 bits of addressing, because of it the MSB byte is lost and so cannot find the correct memory address...!!!

Debugging runtime stack error occurring once end of main function is reached

I've converted some legacy Fortran code to C using the f2c converter (f2c), and I've created a Visual Studio 10 solution on Windows 7 (64-bit). I've also had to link my C++ program (test.cpp, containing my main function) with the f2c library (built on my system using nmake).
The program runs, but once the end of the main function is reached, I receive the following Debug error:
Stack around the variable 'qq' was corrupted
Stack around the variable 'pf' was corrupted
Stack around the variable 'ampls' was corrupted
I am wondering if this might be due to a "correction" made by the f2c converter in the converted C (from Fortran) file:
/* Parameter adjustments */
--x1;
--xabs;
--ximag;
--xreal;
--work4;
--work3;
--work2;
--work1;
--ampls;
--pf;
--qq;
--tri;
This seems a bit odd, since all of these variables are C arrays, and I think that the f2c program is simply doing some pointer arithmetic so that index 0 in the array becomes index 1, in a similar fashion to Fortran.
I don't know if this could also be due to something going wrong with the converted code accessing an element of the array that has not been allocated.
What is the best way to debug this error and fix it?
Potential reasons:
This error is usually related to writing outside the bounds of an array (dynamic or static array). This error can occur by writing\getting a value in a -ve index or index >= size_of_array.
This error also accurs if your pointer is not set to its correct location. (e.g. ptr = 0, ptr = 55, points to deleted (released; or has been free) memory, or any invalid address)
Best way to debug your error in my mind is to debug your prorgam step-by-step and watch those pointer values. There must be some wrong with them.
What you say could be true. I would suggest to create a very small program that uses an array and decrements the pointer exactly as f2c does. Something like
int aa[10];
int *pa = aa;
--pa;
pa[1] = ...
That is, test the suspected code in the small scale. You might isolate the cause to the problem this way. (Finding a workaround is a different story)
Are you compiling with the debug versions of the crt? That might give you some more information.
Also, is it possible that your library is built as C and your application is written as C++?
Those errors you mention are sometimes because of different calling conventions. You do state it's a 64bit application, so it shouldn't be an issue (all 64bit apps use the same calling convention), but it's worth looking into.
Is it possible to add all the fortran converted code to visual studio and not do a seperate make?

Debug stack corruption

Now I am debugging a large project, which has a stack corruption: the application fails.
I would like to know how to find (debug) such stack corruption code with Visual Studio 2010?
Here's an example of some code which causes stack problems, how would I find less obvious cases of this type of corruption?
void foo()
{
int i = 10;
int *p = &i;
p[-2] = 100;
}
Update
Please note that this is just an example. I need to find such bad code in the current project.
There's one technique that can be very effective with these kinds of bugs, but it'll only work on a subset of them that has a few characteristics:
the corrupting value must be stable (ie., as in your example, when the corruption occurs, it's always 100), or at least something that can be readily identified in a simple expression
the corruption has to occur at a particular address on the stack
the corrupting value is unusual enough that you won't be hit with a slew of false positives
Note that the second condition may seem unlikely at first glance because the stack can be used in so many different ways depending on the runtime actions. However, stack usage is generally pretty deterministic. The problem is that a particular stack location can be used for so many different things that the problem is really item #3.
Anyway, if your bug has these characteristics, you should identify the stack address (or one of them) that gets corrupted, then set a memory breakpoint for a write to that address with a condition that causes it to break only if the value written is the corrupting value. In visual Studio, you can do this by creating a "New Data Breakpoint..." in the Breakpoints window then right clicking the breakpoint to set the condition.
If you end up getting too many false positives, it might help to narrow the scope of the breakpoint by leaving it disabled until some point in the execution path that's closer to the bug (if you can identify such a time), or set the hit count high enough to remove most of the false positives.
An additional complication is the address of the stack may change from run to run - in this case, you'll have to take care to set the breakpoint on each run (the lower bits of the address should be the same).
I believe your Questions quotes an example of stack corruption and the question you are asking is not why it crashes.
If it is so, It crashes because it creates an Undefined Behavior because the index -2 points to an unknown memory location.
To answer the question on profiling your application:
You can use Rational Purify Plus for Visual studio to check for memory overrites and access errors.
This is UB: p[-2] = 100;
You can access p with the operator[] in this(p[i]) way, but in this case i is an invalid value. So p[-2] points to an invalid memory location and causes Undefined Behaviour.
To find it you should debug your app and find where it crashes, and hopefully, it'll be at a place where something is actually wrong.

What does "BUS_ADRALN - Invalid address alignment" error means?

We are on HPUX and my code is in C++.
We are getting
BUS_ADRALN - Invalid address alignment
in our executable on a function call. What does this error means?
Same function is working many times then suddenly its giving core dump.
in GDB when I try to print the object values it says not in context.
Any clue where to check?
You are having a data alignment problem. This is likely caused by trying to read or write through a bad pointer of some kind.
A data alignment problem is when the address a pointer is pointing at isn't 'aligned' properly. For example, some architectures (the old Cray 2 for example) require that any attempt to read anything other than a single character from memory only occur through a pointer in which the last 3 bits of the pointer's value are 0. If any of the last 3 bits are 1, the hardware will generate an alignment fault which will result in the kind of problem you're seeing.
Most architectures are not nearly so strict, and frequently the required alignment depends on the exact type being accessed. For example, a 32 bit integer might require only the last 2 bits of the pointer to be 0, but a 64 bit float might require the last 3 bits to be 0.
Alignment problems are usually caused by the same kinds of problems that would cause a SEGFAULT or segmentation fault. Usually a pointer that isn't initialized. But it could be caused by a bad memory allocator that isn't returning pointers with the proper alignment, or by the result of pointer arithmetic on the pointer when it isn't of the correct type.
The system implementation of malloc and/or operator new are almost certainly correct or your program would be crashing way before it currently does. So I think the bad memory allocator is the least likely tree to go barking up. I would check first for an uninitialized pointer and then bad pointer arithmetic.
As a side note, the x86 and x86_64 architectures don't have any alignment requirements. But, because of how cache lines work, and for various other reasons, it's often a good idea for performance to align your data on a boundary that's as big as the datatype being stored (i.e. a 4 byte boundary for a 32 bit int).
Most processors (not x86 and friends.. the blacksheep of the family lol) require accesses to certain elements to be aligned on multiples of bytes. I.e. if you read an integer from address 0x04 that is okay, but if you try to do the same from 0x03 you will cause an interrupt to be thrown.
This is because it's easier to implement the load/store hardware if it's always on a multiple of the data size with which you're working.
Since HP-UX runs only on RISC processors, which typically have such constraints, you should see here -> http://en.wikipedia.org/wiki/Data_structure_alignment#RISC.
Actually HP-UX has its own great forum on ITRC and some HP staff members are very helpful. I just took a look at the same topic you are asking and here are some results. For example the similar problem was caused actually by a bad input parameter. I strongly advise you first to read answers to similar question and if necessary to post your question there.
By the way it is likely that you will be asked to post results of these gdb commands:
(gdb) bt
(gdb) info reg
(gdb) disas $pc-16*8 $pc+16*4
Most of these issues are caused by multiple upstream dependencies linking to different versions of the same library.
For example, both the gnustl and stlport provide distinct implementations of the C++ standard library. If you compile and link against gnustl, while one of your dependencies was compiled and linked against stlport, then you will each have a different implementation of standard functions and classes. When your program is launched, the dynamic linker will attempt to resolve all of the exported symbols, and will discover known symbols at incorrect offsets, resulting in the BUS_ADRALN signal.