C++ - How to debug SIGILL ILL_ILLOPN

Recently I ran into a crash while the following statement was being executed:
static const float kDefaultTolerance = DoubleToFloat(0.25);
where DoubleToFloat is defined as:
static inline float DoubleToFloat(double x) {
    return static_cast<float>(x);
}
The log shows the following:
09-04 01:08:50.727 882 882 F DEBUG : signal 4 (SIGILL), code 2 (ILL_ILLOPN), fault addr 0x7f9e3ca96818
When I read about SIGILL, I understand that it happens when a process tries to execute an invalid instruction. So I suspect the compiler (clang in my case) is generating some junk code while translating the above snippet. How can I check what the compiler is generating and see what is going wrong in this particular case? Also, please suggest any tools to debug these kinds of issues.
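One way to see exactly what the compiler emits for that line is to dump the assembly and compare it against the instructions your CPU actually supports; the file and binary names below are placeholders, and on Android the NDK's llvm-objdump can stand in for objdump:
$ clang++ -S -O2 -o tolerance.s tolerance.cpp   # emit human-readable assembly for the translation unit
$ objdump -d -C ./my_binary                     # disassemble the linked binary, demangling C++ names
Looking at the instruction at (or near) the faulting address in the disassembly usually tells you whether the compiler emitted something your target CPU cannot execute.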

I had a similar problem today. Eventually I found that the cause was that AVX instructions were used in a floating-point operation, but the CPU does not support the AVX instruction set. You can try using the SSE instruction set instead.
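If you want to confirm that this is what is happening, GCC and clang on x86 offer __builtin_cpu_supports for a quick runtime check; the sketch below is only an illustration (the messages are made up), and the usual fix is to rebuild without AVX code generation (e.g. dropping -mavx or -march=native in favour of a baseline such as -msse2):

#include <cstdio>

// Quick runtime probe (GCC/clang on x86 only): does this CPU advertise AVX?
// If it doesn't, any AVX instruction the compiler emitted will raise SIGILL.
int main() {
    if (__builtin_cpu_supports("avx"))
        std::puts("CPU supports AVX");
    else
        std::puts("CPU does NOT support AVX; code built with -mavx will crash with SIGILL");
    return 0;
}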

Related

How can I debug Eigen alignment errors when they seem unrelated to the exact code which triggers them

I am writing code which uses the Eigen matrix library for coordinate transforms, and also PCL for point cloud processing (which also uses Eigen a lot). I keep getting assertion errors from Eigen, about unaligned accesses, despite the fact that I have observed everything in the documentation about alignment of Eigen types (https://eigen.tuxfamily.org/dox/group__DenseMatrixManipulation__Alignement.html).
I can only trigger this assertion when some Eigen code has run before, but was unsuccessful in pinpointing what the exact conditions are. For instance, this is the code that crashes:
Affine3f Transform::getAffine() const {
    // ... Vector3f translation(...)
    // ... Quaternionf rotation(...)
    Affine3f affine = Affine3f::Identity(); /// <---
    affine.translate(translation);
    affine.rotate(rotation);
    return affine;
}
but only if some Eigen code has been executed before. Maybe that is because the problem only arises after some allocations made by the Eigen::aligned_allocator.
However, the help pages tell me I should use a debugger to check exactly which object is unaligned:
For example, if you're using GCC, you can use the GDB debugger as follows:
$ gdb ./my_program # Start GDB on your program
> run # Start running your program
... # Now reproduce the crash!
> bt # Obtain the backtrace
Now that you know precisely where in your own code the problem is happening, read on to
understand what you need to change.
I am of course doing that, but the code crashing here seems to satisfy all the requirements.
Question
How can I effectively debug what code causes the misalignment when the error is only triggered during later allocations?
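A low-tech aid that sometimes narrows this down (a sketch, assuming the 16-byte requirement of SSE-vectorized fixed-size Eigen types such as Affine3f is what is being violated; the helper name check_alignment is made up for this sketch): print or assert the alignment of the suspect objects yourself at the earliest point possible, so the first unaligned object is caught before Eigen's own assertion fires later.

#include <cstdint>
#include <cstdio>
#include <Eigen/Geometry>

// Made-up helper: print how far an address is from a 16-byte boundary.
// Fixed-size vectorizable Eigen types (Affine3f wraps a Matrix4f) must be
// 16-byte aligned when SSE vectorization is enabled.
template <typename T>
void check_alignment(const char* name, const T* p) {
    const auto mis = static_cast<std::size_t>(
        reinterpret_cast<std::uintptr_t>(p) % 16);
    std::printf("%s @ %p  misalignment: %zu\n", name,
                static_cast<const void*>(p), mis);
}

int main() {
    Eigen::Affine3f onStack = Eigen::Affine3f::Identity();
    check_alignment("onStack", &onStack);

    // Heap objects rely on Eigen's aligned operator new; a class that
    // stores an Affine3f as a member needs EIGEN_MAKE_ALIGNED_OPERATOR_NEW.
    auto* onHeap = new Eigen::Affine3f(Eigen::Affine3f::Identity());
    check_alignment("onHeap", onHeap);
    delete onHeap;
    return 0;
}

Calling such a helper on the this pointer of the object that owns the transforms, right before getAffine() is invoked, is often enough to tell whether the containing object or the allocation path is the culprit.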

can the return value from finish in gdb be different from the actual one in execution

I am a gdb novice, and I was trying to debug some GSSAPI code, using fin to see the return value from the frame. As seen in the snip pasted below, the call from gssint_mechglue_initialize_library() seems to return 0, but the actual check seems to fail. Can someone please point out if I am missing something obvious here?
Thanks in advance!
One possible explanation for the observed behavior is that you are debugging optimized code, and that line 1001 isn't really executed.
You can confirm this with a few nexts, or by executing fin again and observing whether GSS_S_COMPLETE or something else is returned from gssint_select_mech_type.
When optimization is on, code motion performed by the optimizer often prevents correct assignment of actual code sequences to line numbers (as instructions "belonging" to different lines are mixed and re-ordered). This often causes the code to appear to "jump around" when, e.g., using the nexti command.
For ease of debugging, recompile with -O0, or make sure to remove any -O2 and the like from your compile lines.
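A related trick when the source-level view is untrustworthy: read the return value straight out of the return register after finish, rather than relying on the line the debugger claims you are on. This assumes an x86-64 target, where the System V ABI places integer and enum return values in rax:
> finish              # return from gssint_mechglue_initialize_library
> print $rax          # raw return value as left in the return register
> print/x $eax        # same value, low 32 bits, shown in hex
If that register really is 0 (GSS_S_COMPLETE) while the subsequent check still appears to fail, the mismatch is almost certainly the optimizer's line-number shuffling described above.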

How to recover the C++ try/throw/catch block length and address from machine code?

I'm doing a project that reorders basic blocks inside a function at runtime in C++ under 64-bit Linux. Of course, the reordering process includes updating instructions like "jmp", etc. One problem is that if (as I guess) the compiler (clang++ or g++) describes the try{...} block using a range, i.e., from address1 to address2, the reordered code would have problems (some basic blocks are moved out of the range and some new basic blocks are swapped in).
My question is: does the compiler/program determine the try{...} block using a range? Either way, how can I find and modify the corresponding data, so that I can recover the try/throw/catch blocks and let the program execute normally after reordering, once the program has already been loaded into memory?
FYI, here is the relevant document for LLVM's implementation of try-catch. g++ does something very similar.
When you say "by range", I assume you mean the compiler would treat the instructions from 0x0010 to 0x0020 as the try block's code and the instructions from 0x0020 to 0x0024 as the catch block. Judging from the LLVM specification, it doesn't rely on such an assumption.
Edit:
Here is some more reading on how g++ and clang implement try-catch.
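If you want to see the range encoding for yourself, a small experiment helps: compile a trivial try/catch like the sketch below (the function and message are arbitrary) and dump the .gcc_except_table section, e.g. with readelf -x .gcc_except_table ./a.out or objdump -s -j .gcc_except_table ./a.out. On Linux with the Itanium C++ ABI, that section holds the language-specific data area (LSDA) with the call-site entries the unwinder walks, which is the data a reordering pass would need to rewrite.

#include <cstdio>
#include <stdexcept>

// Arbitrary callee: because it may throw, the call inside the try block
// below gets a call-site entry in the LSDA pointing at the landing pad.
void may_throw(int x) {
    if (x < 0)
        throw std::runtime_error("negative");
}

int main(int argc, char**) {
    try {
        may_throw(argc - 2);      // covered by a call-site range
    } catch (const std::runtime_error&) {
        std::puts("caught");      // landing pad / catch handler
    }
    return 0;
}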

How can I find out what's changing the return address of a function in c++

I have a program that behaves weirdly and probably has undefined behaviour. Sometimes, the return address of a function seems to be changed, and I don't know what's causing it.
The return address is always changed to the same address, an assertion inside a function that control shouldn't be able to reach. I've been able to stop the program with a debugger and see that when it's supposed to execute a return statement, it jumps straight to the line with the assertion instead.
This code approximates how my function works.
int foo(Vector t) {
    double sum = 0;
    for (unsigned int i = 0; i < t.size(); ++i) {
        sum += t[i];
    }
    double limit = bar(); // bar returns a value between 0 and 1
    double a = 0;
    for (double i = 0; i < 10; i++) {
        a += f(i) / sum; // f(1)/sum + ... + f(10)/sum = 1.0f
        if (a > limit) return a;
    }
    // shouldn't get here
    assert(false); // ... then this line is executed
}
This is what I've tried so far:
Replacing all std::vector [] operators with .at() to prevent accidentally writing outside the vector's memory
Made sure all return-by-value values are const.
Switched on -Wall and -Werror and -pedantic-errors in gcc
Ran the program with valgrind
I get a couple of invalid reads of size 8, but they seem to originate from Qt, so I'm not sure what to make of them. Could this be the problem?
The error happens only occasionally, after the program has run for a while and been given certain input values, and more often in a release build than in a debug build.
EDIT:
So I managed to reproduce the problem in a console application (no Qt loaded), and I then managed to simulate the events that caused the problem.
Like some of you suggested, it turns out I had misjudged what was actually causing it to reach the assertion, probably due to my lack of experience with Qt's debugger. The actual problem was a floating-point error in the double i used as a loop condition.
I was implementing softmax, but exp(x) got rounded to zero with particular inputs.
Now that I have solved the problem, let me rephrase it: is there a method for checking for problems like rounding errors automatically, i.e., breaking on 0/0 for instance?
The short answer is:
The most portable way of determining if a floating-point exceptional condition has occurred is to use the floating-point exception facilities provided by C in fenv.h.
although, unfortunately, this is far from being perfect.
I suggest you read both
https://www.securecoding.cert.org/confluence/display/seccode/FLP04-C.+Check+floating-point+inputs+for+exceptional+values
and
https://www.securecoding.cert.org/confluence/display/seccode/FLP03-C.+Detect+and+handle+floating-point+errors
which concisely address the exact question you are posing:
Is there a method for checking problems like rounding errors automatically.
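To make that concrete, here is a minimal sketch of the <cfenv> facilities those links describe; the computation is a stand-in for the real one, and note that actually trapping (so a debugger breaks on 0/0) additionally needs a platform-specific call such as glibc's feenableexcept, which is outside the standard:

#include <cfenv>
#include <cstdio>

// Sketch: clear the floating-point exception flags, run the suspect
// computation, then test which exceptional conditions were raised.
// (volatile keeps the compiler from folding the division at compile time.)
int main() {
    std::feclearexcept(FE_ALL_EXCEPT);

    volatile double zero = 0.0;
    volatile double r = 0.0 / zero;   // 0/0 -> NaN, raises FE_INVALID

    if (std::fetestexcept(FE_INVALID))
        std::puts("FE_INVALID raised (e.g. 0/0)");
    if (std::fetestexcept(FE_DIVBYZERO))
        std::puts("FE_DIVBYZERO raised (finite / 0)");
    if (std::fetestexcept(FE_UNDERFLOW))
        std::puts("FE_UNDERFLOW raised (e.g. exp(x) rounded to zero)");

    return (r != r) ? 1 : 0;          // NaN compares unequal to itself
}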

Memory error: Dereference null pointer/ SSE misalignment

I'm compiling a program on a remote Linux server. The program compiles, but when I run it, it ends abruptly. So I debugged the program using DDT, which reports the following error:
Process 0:
Memory error detected in ClassName::function (filename.cpp:6462).
Thread 1 attempted to dereference a null pointer or execute an SSE instruction with an
incorrectly aligned memory address (the latter may sometimes occur spuriously if guard
pages are enabled)
Tip: Use the stack list and the local variables to explore your program's current
state and identify the source of the error.
Can anyone please tell me what exactly this error means?
The line where the program stops looks like this:
SumUtility = ParaEst[0] + hhincome * ParaEst[71] + IsBlack * ParaEst[61] + IsBachAss * (ParaEst[55]);
This is within a switch case.
These are the variable types:
vector<double> ParaEst;
double hhincome;
int IsBlack, IsBachAss;
Thanks for the help!
It means one of the following:
ParaEst is NULL or a bad pointer.
ParaEst's individual array values are not aligned to 16-byte boundaries, as required for SSE.
hhincome, IsBlack, or IsBachAss is not aligned to a 16-byte boundary and is an SSE-type value.
SumUtility is not aligned to 16 bytes and is an SSE-type field.
If you could post the assembly code of the exact line that failed, along with the register values at that instruction, we could tell you exactly which of the above conditions has failed. It would also help to see the types of each variable shown, to help narrow down the root cause.
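Along those lines, if the disassembly is hard to get at, one quick check (a sketch; the variables here are stand-ins for the real ParaEst, hhincome and SumUtility, on which you would call the helper right before line 6462) is to print the addresses involved and see whether any of them is not a multiple of 16:

#include <cstdint>
#include <cstdio>
#include <vector>

// Print an address together with its offset from a 16-byte boundary;
// a non-zero offset on a value accessed with aligned SSE loads/stores
// would explain DDT's misalignment report.
static void report(const char* name, const void* p) {
    std::printf("%-12s %p  mod 16 = %zu\n", name, p,
                static_cast<std::size_t>(
                    reinterpret_cast<std::uintptr_t>(p) % 16));
}

int main() {
    std::vector<double> ParaEst(80, 0.0);  // stand-ins for the real data
    double hhincome = 0.0;
    double SumUtility = 0.0;

    report("ParaEst[0]", &ParaEst[0]);
    report("hhincome", &hhincome);
    report("SumUtility", &SumUtility);
    return 0;
}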
OK... the problem finally got fixed.
The issue was that the expression where the code was breaking down was in a newly defined function. However, for some weird reason, running the makefile did not incorporate these changes and the build was still using the previously compiled .o file. This resulted in garbage values being assigned to the variables within this new function. To top it off, the program calls this function as a first step, hence the systematic breakdown. The technical aspect of this is what Michael alluded to.
After this, I would always recommend having (and using) a make clean target in the makefile. Why running the makefile failed to recompile the modified source file is an issue that definitely warrants further discussion.
Thanks for the responses!!