I am writing code which uses the Eigen matrix library for coordinate transforms, and also PCL for point cloud processing (which also uses Eigen a lot). I keep getting assertion errors from Eigen, about unaligned accesses, despite the fact that I have observed everything in the documentation about alignment of Eigen types (https://eigen.tuxfamily.org/dox/group__DenseMatrixManipulation__Alignement.html).
I can only trigger this assertion when some Eigen code has run before, but was unsuccessful in pinpointing what the exact conditions are. For instance, this is the code that crashes:
Affine3f Transform::getAffine() const {
// ... Vector3f translation(...)
// ... Quaternionf rotation(...)
Affine3f affine = Affine3f::Identity(); /// <---
affine.translate(translation);
affine.rotate(rotation);
return affine;
}
but only if some eigen code has been executed before. Maybe that is because the problem only arises after some allocations made by the Eigen::aligned_allocator.
However, the help pages tell me I should use a debugger to check exactly which object is unaligned:
For example, if you're using GCC, you can use the GDB debugger as follows:
$ gdb ./my_program # Start GDB on your program
> run # Start running your program
... # Now reproduce the crash!
> bt # Obtain the backtrace
Now that you know precisely where in your own code the problem is happening, read on to
understand what you need to change.
I am of course doing that, but the code crashing here seems to satisfy all the requirements.
Question
How can I effectively debug what code causes the misalignment when the error is only triggered during later allocations?
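One way to narrow this down, sketched here under assumptions, is to check object addresses directly at the places where Eigen objects are created or stored, so that a misaligned object is reported where it is made rather than at a later use. The helper `is_aligned` is hypothetical and not part of Eigen; the 16-byte boundary mentioned below is an assumption that depends on the Eigen version and the enabled instruction set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Eigen): returns true if `p` lies on an
// `alignment`-byte boundary. `alignment` must be non-zero.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A call such as `assert(is_aligned(this, 16));` in the constructor of a class that holds Eigen members, or right after an allocation, would then fail at the point where the misaligned object comes into existence.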
I am writing a C++ application which has started to occasionally have what I believe to be bad allocations even though no bad alloc error is given. In one method, I have:
float * out = new float[len]; //len works out to about 570000
memset(out, 0, sizeof(float) * len);
when this code runs, I get EXC_BAD_ACCESS at address out[4] (always the 4th element). In LLDB I tested the range of the error; turns out, I can never access/write to out[n] for any 4 <= n < 1028.
I assume that there is some sort of allocation problem preventing me from getting clean access to this memory block, but I can't figure out how to find the responsible code. Any ideas where I can start?
I can provide more details if necessary. Thanks!
It works when, in the loop, I set every element to 0 or to entry_count-1.
It works when I set it up so that entry_count is small, and I write it by hand instead of by loop (sorted_order[0] = 0; sorted_order[1] = 1; ... etc).
Please do not tell me what to do to fix my code. I will not be using smart pointers or vectors for very specific reasons. Instead focus on the question:
What sort of conditions can cause this segfault?
Thank you.
---- OLD -----
I am trying to debug code that isn't working on a unix machine. The gist of the code is:
int *sorted_array = (int*)memory;
// I know that this block is large enough
// It is allocated by malloc earlier
for (int i = 0; i < entry_count; ++i){
sorted_array[i] = i;
}
There appears to be a segfault somewhere in the loop. Switching to debug mode, unfortunately, makes the segfault stop. Using cout debugging I found that it must be in the loop.
Next I wanted to know how far into the loop the segfault happened, so I added:
std::cout << i << '\n';
It showed the entire range it was supposed to be looping over and there was no segfault.
With a little more experimentation I eventually created a string stream before the loop and wrote an empty string into it on each iteration of the loop, and there was no segfault.
I tried some other assorted operations trying to figure out what is going on. I tried setting a variable j = i; and stuff like that, but I haven't found anything that works.
Running valgrind the only information I got on the segfault was that it was a "General Protection Fault" and something about default response to 11. It also mentions that there's a Conditional jump or move depends on uninitialized value(s), but looking at the code I can't figure out how that's possible.
What can this be? I am out of ideas to explore.
This is clearly a symptom of invalid memory use within your program. It would be a bit difficult to find by looking at your code snippet, as it is most likely the side effect of something else bad that has already happened.
However, as you mentioned in your question, the problem is reproducible and you are able to run your program under Valgrind. So you may want to attach a debugger to your program (a.out):
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way Valgrind attaches a debugger (GDB) to your program when the first memory error is detected, so that you can do live debugging. This should be the best possible way to understand and resolve your problem.
Once you have figured out your first error, fix it, rerun, and see what other errors you get. Repeat these steps until Valgrind reports no more errors.
However, you should avoid raw pointers in modern C++ programs and start using std::vector and std::unique_ptr, as others have suggested as well.
Valgrind and GDB are very useful.
The one I used most recently was GDB. I like it because it showed me the exact line number that the segmentation fault was on.
Here are some resources that can guide you on using GDB:
GDB Tutorial 1
GDB Tutorial 2
If you still cannot figure out how to use GDB with these tutorials, there are tons on Google! Just search debugging Segmentation Faults with GDB!
Good luck :)
That is hard. I have used the valgrind tools to debug segfaults, and they usually pointed to the violations.
Likely your problem is that you are writing to freed memory, i.e. sorted_array goes out of scope or gets freed.
Adding more code hides this problem as data allocation shifts around.
After a few days of experimentation, I figured out what was really going on.
For some reason the machine segfaults on unaligned access. That is, the integers I was writing were not being written to memory boundaries that were multiples of four bytes. Before the loop I computed the offset and shifted the array up that much:
int offset = (4 - (uintptr_t)(memory) % 4) % 4;
memory += offset;
After doing this everything behaved as expected again.
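The same adjustment can be written with `std::align` from `<memory>`, which also checks that the aligned block still fits in the buffer. This is only a sketch: the helper name `first_aligned_int` and its parameters are made up for illustration, and it assumes the buffer's size in bytes is known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: returns the first address inside [buffer, buffer + bytes)
// that is suitably aligned for `count` ints, or nullptr if they do not fit.
inline int* first_aligned_int(void* buffer, std::size_t bytes, std::size_t count) {
    assert(buffer != nullptr);
    void* p = buffer;
    std::size_t space = bytes;
    // std::align moves p up to the next alignof(int) boundary and shrinks space;
    // it returns nullptr if fewer than count * sizeof(int) bytes remain after that.
    if (std::align(alignof(int), count * sizeof(int), p, space) == nullptr) {
        return nullptr;
    }
    return static_cast<int*>(p);
}
```

Compared with the manual offset computation, this avoids hard-coding the value 4 and reports the case where shifting the start would run past the end of the block.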
I have a program that behaves weirdly and probably has undefined behaviour. Sometimes, the return address of a function seems to be changed, and I don't know what's causing it.
The return address is always changed to the same address, an assertion inside a function the control shouldn't be able to reach. I've been able to stop the program with a debugger to see that when it's supposed to execute a return statement, it jumps straight to the line with the assertion instead.
This code approximates how my function works.
int foo(Vector t) {
double sum = 0;
for(unsigned int i=0; i<t.size();++i){
sum += t[i];
}
double limit = bar(); // bar returns a value between 0 and 1
double a=0;
for(double i=0; i<10; i++){
a += f(i)/sum; // f(1)/sum + ... + f(10)/sum = 1.0f
if(a>3)return a;
}
//shouldn't get here
assert(false); // ... then this line is executed
}
This is what I've tried so far:
Switching all std::vector [] operators with .at to prevent accidentally writing into memory
Made sure all return-by-value values are const.
Switched on -Wall and -Werror and -pedantic-errors in gcc
Ran the program with valgrind
I get a couple of invalid read of size 8, but they seem to originate from qt, so I'm not sure what to make of it. Could this be the problem?
The error happens only occasionally when I have run the program for a while and give it certain input values, and more often in a release build than in a debug build.
EDIT:
So I managed to reproduce the problem in a console application (no qt loaded). I then managed to simulate the events that caused the problem.
Like some of you suggested, it turns out I misjudged what was actually causing it to reach the assertion, probably due to my lack of experience with qt's debugger. The actual problem was a floating point error in the double i used as a loop condition.
I was implementing softmax, but exp(x) got rounded to zero with particular inputs.
Now that I have solved the problem, I might rephrase it: is there a method for checking problems like rounding errors automatically, i.e. breaking on 0/0 for instance?
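For reference, one common way to keep softmax away from the 0/0 case is to subtract the largest input before exponentiating, so that at least one term is exp(0) = 1. The sketch below is not necessarily what was done in the original program; the name `stable_softmax` is hypothetical and the inputs are assumed to be finite.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of a max-shifted softmax. Assumes `x` is non-empty and all values are finite.
inline std::vector<double> stable_softmax(const std::vector<double>& x) {
    assert(!x.empty());
    const double max_x = *std::max_element(x.begin(), x.end());
    std::vector<double> out(x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i] - max_x);  // the largest input maps to exp(0) == 1
        sum += out[i];
    }
    for (double& v : out) {
        v /= sum;  // sum >= 1 here, so this is never 0/0
    }
    return out;
}
```

With inputs such as -1000, -1001 and -1002, a direct `exp(x)` underflows to zero for every element, while the shifted version still produces finite probabilities.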
The short answer is:
The most portable way of determining if a floating-point exceptional condition has occurred is to use the floating-point exception facilities provided by C in fenv.h.
although, unfortunately, this is far from being perfect.
I suggest reading both
https://www.securecoding.cert.org/confluence/display/seccode/FLP04-C.+Check+floating-point+inputs+for+exceptional+values
and
https://www.securecoding.cert.org/confluence/display/seccode/FLP03-C.+Detect+and+handle+floating-point+errors
which concisely address the exact question you are posing:
Is there a method for checking problems like rounding errors automatically.
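A minimal sketch of the fenv.h approach in C++ (via `<cfenv>`) could look like the following. The helper names `raises_fp_error` and `assert_no_fp_error` are made up for illustration, and the sketch assumes a platform that defines `FE_INVALID` and `FE_DIVBYZERO`. Compilers may reorder or constant-fold floating-point code around the flag checks, so treat this as a debugging aid rather than a guarantee.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: clears the floating-point status flags, runs `op`, and
// reports whether it raised FE_INVALID (e.g. 0/0) or FE_DIVBYZERO (e.g. 1/0).
template <typename F>
bool raises_fp_error(F op) {
    std::feclearexcept(FE_ALL_EXCEPT);
    op();
    return std::fetestexcept(FE_INVALID | FE_DIVBYZERO) != 0;
}

// Hypothetical debug-build check: aborts if either flag has been set since the
// flags were last cleared. Compiles to nothing when NDEBUG is defined.
inline void assert_no_fp_error() {
    assert(std::fetestexcept(FE_INVALID | FE_DIVBYZERO) == 0);
}
```

Placing `assert_no_fp_error()` after a suspicious computation makes a debug build stop close to the first 0/0. On glibc there is also the non-portable extension `feenableexcept`, which turns these conditions into a SIGFPE that a debugger can catch.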
This issue is important especially for embedded development. Exception handling adds some footprint to generated binary output. On the other hand, without exceptions the errors need to be handled some other way, which requires additional code, which eventually also increases binary size.
I'm interested in your experiences, especially:
What is average footprint added by your compiler for the exception handling (if you have such measurements)?
Is the exception handling really more expensive (many say that), in terms of binary output size, than other error handling strategies?
What error handling strategy would you suggest for embedded development?
Please take my questions only as guidance. Any input is welcome.
Addendum: Does any one have a concrete method/script/tool that, for a specific C++ object/executable, will show the percentage of the loaded memory footprint that is occupied by compiler-generated code and data structures dedicated to exception handling?
When an exception occurs there will be time overhead which depends on how you implement your exception handling. But, being anecdotal, the severity of an event that should cause an exception will take just as much time to handle using any other method. Why not use the highly supported language based method of dealing with such problems?
The GNU C++ compiler uses the zero–cost model by default i.e. there is no time overhead when exceptions don't occur.
Since information about exception-handling code and the offsets of local objects can be computed once at compile time, such information can be kept in a single place associated with each function, but not in each ARI. You essentially remove exception overhead from each ARI and thus avoid the extra time to push them onto the stack. This approach is called the zero-cost model of exception handling, and the optimized storage mentioned earlier is known as the shadow stack. - Bruce Eckel, Thinking in C++ Volume 2
The size overhead isn't easily quantifiable, but Eckel states an average of between 5 and 15 percent. This will depend on the size of your exception handling code relative to the size of your application code. If your program is small, then exceptions will be a large part of the binary. If you are using a zero-cost model, then exceptions will take more space in order to remove the time overhead, so if you care about space and not time, then don't use zero-cost compilation.
My opinion is that most embedded systems have plenty of memory, to the extent that if your system has a C++ compiler you have enough space to include exceptions. The PC/104 computer that my project uses has several GB of secondary memory and 512 MB of main memory, hence no space problem for exceptions, though our microcontrollers are programmed in C. My heuristic is "if there is a mainstream C++ compiler for it, use exceptions, otherwise use C".
Measuring things, part 2. I have now got two programs. The first is in C and is compiled with gcc -O2:
#include <stdio.h>
#include <time.h>
#define BIG 1000000
int f( int n ) {
int r = 0, i = 0;
for ( i = 0; i < 1000; i++ ) {
r += i;
if ( n == BIG - 1 ) {
return -1;
}
}
return r;
}
int main() {
clock_t start = clock();
int i = 0, z = 0;
for ( i = 0; i < BIG; i++ ) {
if ( (z = f(i)) == -1 ) {
break;
}
}
double t = (double)(clock() - start) / CLOCKS_PER_SEC;
printf( "%f\n", t );
printf( "%d\n", z );
}
The second is C++, with exception handling, compiled with g++ -O2:
#include <stdio.h>
#include <time.h>
#define BIG 1000000
int f( int n ) {
int r = 0, i = 0;
for ( i = 0; i < 1000; i++ ) {
r += i;
if ( n == BIG - 1 ) {
throw -1;
}
}
return r;
}
int main() {
clock_t start = clock();
int i = 0, z = 0;
for ( i = 0; i < BIG; i++ ) {
try {
z += f(i);
}
catch( ... ) {
break;
}
}
double t = (double)(clock() - start) / CLOCKS_PER_SEC;
printf( "%f\n", t );
printf( "%d\n", z );
}
I think these answer all the criticisms made of my last post.
Result: Execution times give the C version a 0.5% edge over the C++ version with exceptions, not the 10% that others have talked about (but not demonstrated).
I'd be very grateful if others could try compiling and running the code (should only take a few minutes) in order to check that I have not made a horrible and obvious mistake anywhere. This is known as "the scientific method"!
I work in a low latency environment. (sub 300 microseconds for my application in the "chain" of production) Exception handling, in my experience, adds 5-25% execution time depending on the amount you do!
We don't generally care about binary bloat, but if you get too much bloat then you thrash like crazy, so you need to be careful.
Just keep the binary reasonable (depends on your setup).
I do pretty extensive profiling of my systems.
Other nasty areas:
Logging
Persisting (we just don't do this one, or if we do it's in parallel)
I guess it'd depend on the hardware and toolchain port for that specific platform.
I don't have the figures. However, for most embedded development, I have seen people chucking out two things (for VxWorks/GCC toolchain):
Templates
RTTI
Exception handling does make use of both in most cases, so there is a tendency to throw it out as well.
In those cases where we really want to get close to the metal, setjmp/longjmp are used. Note, that this isn't the best solution possible (or very powerful) probably, but then that's what _we_ use.
You can run simple tests on your desktop with two versions of a benchmarking suite with/without exception handling and get the data that you can rely on most.
Another thing about embedded development: templates are avoided like the plague -- they cause too much bloat. Exceptions tag along templates and RTTI as explained by Johann Gerell in the comments (I assumed this was well understood).
Again, this is just what we do. What is it with all the downvoting?
One thing to consider: If you're working in an embedded environment, you want to get the application as small as possible. The Microsoft C Runtime adds quite a bit of overhead to programs. By removing the C runtime as a requirement, I was able to get a simple program to be a 2KB exe file instead of a 70-something kilobyte file, and that's with all the optimizations for size turned on.
C++ exception handling requires compiler support, which is provided by the C runtime. The specifics are shrouded in mystery and are not documented at all. By avoiding C++ exceptions I could cut out the entire C runtime library.
You might argue to just dynamically link, but in my case that wasn't practical.
Another concern is that C++ exceptions need limited RTTI (runtime type information) at least on MSVC, which means that the type names of your exceptions are stored in the executable. Space-wise, it's not an issue, but it just 'feels' cleaner to me to not have this information in the file.
It's easy to see the impact on binary size, just turn off RTTI and exceptions in your compiler. You'll get complaints about dynamic_cast<>, if you're using it... but we generally avoid using code that depends on dynamic_cast<> in our environments.
We've always found it to be a win to turn off exception handling and RTTI in terms of binary size. I've seen many different error handling methods in the absence of exception handling. The most popular seems to be passing failure codes up the callstack. In our current project we use setjmp/longjmp but I'd advise against this in a C++ project as they won't run destructors when exiting a scope in many implementations. If I'm honest I think this was a poor choice made by the original architects of the code, especially considering that our project is C++.
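As an illustration of passing failure codes up the callstack, here is a minimal sketch. The `Status` enum and both functions are hypothetical names invented for this example, not taken from any particular project.

```cpp
#include <cstddef>

// Hypothetical error codes for illustration.
enum class Status { Ok, InvalidArgument, DivideByZero };

// Leaf function: reports failure through the return value and delivers the
// result through an out-parameter, which is only written on success.
inline Status safe_divide(int numerator, int denominator, int& result) {
    if (denominator == 0) {
        return Status::DivideByZero;
    }
    result = numerator / denominator;
    return Status::Ok;
}

// Caller: validates its own inputs, then forwards whatever code the leaf
// function returns so the failure travels up to its own caller.
inline Status average(const int* values, std::size_t count, int& result) {
    if (values == nullptr) {
        return Status::InvalidArgument;
    }
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += values[i];
    }
    return safe_divide(sum, static_cast<int>(count), result);
}
```

Unlike setjmp/longjmp, every scope is left through a normal return, so destructors run as usual. The cost is that every call site has to check and forward the code.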
In my opinion exception handling is not something that's generally acceptable for embedded development.
Neither GCC nor Microsoft have "zero-overhead" exception handling. Both compilers insert prologue and epilogue statements into each function that track the scope of execution. This leads to a measurable cost in performance and memory footprint.
The performance difference is something like 10% in my experience, which for my area of work (realtime graphics) is a huge amount. The memory overhead was far less but still significant - I can't remember the figure off-hand but with GCC/MSVC it's easy to compile your program both ways and measure the difference.
I've seen some people talk about exception handling as an "only if you use it" cost. Based on what I've observed this just isn't true. When you enable exception handling it affects all code, whether a code path can throw exceptions or not (which makes total sense when you consider how a compiler works).
I would also stay away from RTTI for embedded development, although we do use it in debug builds to sanity check downcasting results.
Define 'embedded'. On an 8-bit processor I would certainly not work with exceptions (I would certainly not work with C++ on an 8-bit processor). If you're working with a PC104 type board that is powerful enough to have been someone's desktop a few years back then you might get away with it. But I have to ask: why are there exceptions? Usually in embedded applications anything like an exception occurring is unthinkable. Why didn't that problem get sorted out in testing?
For instance, is this in a medical device? Sloppy software in medical devices has killed people. It is unacceptable for anything unplanned to occur, period. All failure modes must be accounted for and, as Joel Spolsky said, exceptions are like GOTO statements except you don't know where they're called from. So when you handle your exception, what failed and what state is your device in? Due to your exception, is your radiation therapy machine stuck at FULL and cooking someone alive (this has happened IRL)? At just what point did the exception happen in your 10,000+ lines of code? Sure, you may be able to cut that down to perhaps 100 lines of code, but do you know the significance of each of those lines causing an exception?
Without more information I would say do NOT plan for exceptions in your embedded system. If you add them then be prepared to plan the failure modes of EVERY LINE OF CODE that could cause an exception. If you're making a medical device then people die if you don't. If you're making a portable DVD player, well, you've made a bad portable DVD player. Which is it?