C++ code cross compiling to ARM, runtime crash

C++ code cross compiling to ARM, runtime crash - c++

I'm fixing a bug about android multimedia framework lower c++ lib. When the code running to following position system goto crash.
if (((*pChar) >= _T('a')) && ((*pChar) <= _T('z'))) {
nFrameTime++;
}
nFrameTime is int type;
pChar is wchar_t* type;
But when i modify the code to:
if (((*pChar) >= _T('a')) || ((*pChar) <= _T('z'))) {
nFrameTime++;
}
Everything is OK. I do not care about using "&&" or "||", I only want to know why that go to crash. Anybody can give me some suggestions?

Most likely pChar isn't pointing at valid data. That's the only thing in there that could really cause a crash (compiler bugs excepted).
The real mystery is why the changed version isn't crashing.
As to the answer to my question, it could be that when you change the code it modifies things just enough that the garbage in pChar happens to point to a valid memory location. Another possibility, as Ben Voigt pointed out in the comments, is that the check is being optimized away in the second version, because any value at all of *pChar will cause it to be true.

Related

Why don't define some undefined behaviours?

What are the reasons for C++ to not define some behavior (something like better error checking)? Why don't throw some error and stop?
Some pseudocodes for example:
if (p == NULL && op == deref){
return "Invalid operation"
}
For Integer Overflows:
if(size > capacity){
return "Overflow"
}
I know these are very simple examples. But I'm pretty sure most UBs can be caught by the compiler. So why not implement them? Because it is really time expensive and not doing error checking is faster? Some UBs can be caught with a single if statement. So maybe speed is not the only concern?

Because the compiler would have to add these instructions every time you use a pointer. A C++ program uses a lot of pointers. So there would be a lot of these instructions for the computer to run. The C++ philosophy is that you should not pay for features you don't need. If you want a null pointer check, you can write a null pointer check. It's quite easy to make a custom pointer class that checks if it is null.
On most operating systems, dereferencing a null pointer is guaranteed to crash the program, anyway. And overflowing an integer is guaranteed to wrap around. C++ doesn't only run on these operating systems - it also runs on other operating systems where it doesn't. So the behaviour is not defined in C++ but it is defined in the operating system.
Nonetheless, some compiler writers realized that by un-defining it again, they can make programs faster. A surprising amount of optimizations are possible because of this. Sometimes there's a command-line option which tells the compiler not to un-define the behaviour, like -fwrapv.
A common optimization is to simply assume the program never gets to the part with the UB. Just yesterday I saw a question where someone had written an obviously not infinite loop:
int array[N];
// ...
for(int i = 0; i < N+1; i++) {
fprintf(file, "%d ", array[i]);
}
but it was infinite. Why? The person asking the question had turned on optimization. The compiler can see that there is UB on the last iteration, since array[N] is out of bounds. So it assumed the program magically stopped before the last iteration, so it didn't need to check for the last iteration and it could delete the part i < N+1.
Most of the time this makes sense because of macros or inline functions. Someone writes code like:
int getFoo(struct X *px) {return (px == NULL ? -1 : px->foo);}
int blah(struct X *px) {
bar(px->f1);
printf("%s", px->name);
frobnicate(&px->theFrob);
count += getFoo(px);
}
and the compiler can make the quite reasonable assumption that px isn't null, so it deletes the px == NULL check and treats getFoo(px) the same as px->foo. That's often why compiler writers choose to keep it undefined even in cases where it could be easily defined.

Compilers already have switches to enable those checks (those are called sanitizers, e.g. -fsanitize=address, -fsanitize=undefined in GCC and Clang).
It doesn't make sense for the standard to always require those checks, because they harm performance, so you might not want them in release builds.

What do the memory operations malloc and free exactly do?

Recently I met a memory release problem. First, the blow is the C codes:
#include <stdio.h>
#include <stdlib.h>
int main ()
{
int *p =(int*) malloc(5*sizeof (int));
int i ;
for(i =0;i<5; i++)
p[i ]=i;
p[i ]=i;
for(i =0;i<6; i++)
printf("[%p]:%d\n" ,p+ i,p [i]);
free(p );
printf("The memory has been released.\n" );
}
Apparently, there is the memory out of range problem. And when I use the VS2008 compiler, it give the following output and some errors about memory release:
[00453E80]:0
[00453E84]:1
[00453E88]:2
[00453E8C]:3
[00453E90]:4
[00453E94]:5
However when I use the gcc 4.7.3 compiler of cygwin, I get the following output:
[0x80028258]:0
[0x8002825c]:1
[0x80028260]:2
[0x80028264]:3
[0x80028268]:4
[0x8002826c]:51
The memory has been released.
Apparently, the codes run normally, but 5 is not written to the memory.
So there are maybe some differences between VS2008 and gcc on handling these problems.
Could you guys give me some professional explanation on this? Thanks In Advance.

This is normal as you have never allocated any data into the mem space of p[5]. The program will just print what ever data was stored in that space.

There's no deterministic "explanation on this". Writing data into the uncharted territory past the allocated memory limit causes undefined behavior. The behavior is unpredictable. That's all there is to it.
It is still strange though to see that 51 printed there. Typically GCC will also print 5 but fail with memory corruption message at free. How you managed to make this code print 51 is not exactly clear. I strongly suspect that the code you posted is not he code you ran.

It seems that you have multiple questions, so, let me try to answer them separately:
As pointed out by others above, you write past the end of the array so, once you have done that, you are in "undefined behavior" territory and this means that anything could happen, including printing 5, 6 or 0xdeadbeaf, or blow up your PC.
In the first case (VS2008), free appears to report an error message on standard output. It is not obvious to me what this error message is so it is hard to explain what is going on but you ask later in a comment how VS2008 could know the size of the memory you release. Typically, if you allocate memory and store it in pointer p, a lot of memory allocators (the malloc/free implementation) store at p[-1] the size of the memory allocated. In practice, it is common to also store at address p[p[-1]] a special value (say, 0xdeadbeaf). This "canary" is checked upon free to see if you have written past the end of the array. To summarize, your 5*sizeof(int) array is probably at least 5*sizeof(int) + 2*sizeof(char*) bytes long and the memory allocator used by code compiled with VS2008 has quite a few checks builtin.
In the case of gcc, I find it surprising that you get 51 printed. If you wanted to investigate wwhy that is exactly, I would recommend getting an asm dump of the generated code as well as running this under a debugger to check if 5 is actually really written past the end of the array (gcc could well have decided not to generate that code because it is "undefined") and if it is, to put a watchpoint on that memory location to see who overrides it, when, and why.

Performance tuning

I have the following code segment . For a vector of size 176000 the loop takes upto 8 minutes to execute . I am not sure what is taking so much time
XEPComBSTR bstrSetWithIdsAsString; //Wrapper class for BSTR
std::vector<__int64>::const_iterator it;
for(it = vecIds.begin();
it != vecIds.end();
it++)
{
__int64 i64Id = (*it);
__int64 i64OID = XPtFunctions::GetOID(i64Id);
// set ',' between two set members
if (it != vecIds.begin())
bstrSetWithIdsAsString.Append(XEPComBSTR(L","));
wchar_t buf[20];
_i64tow_s(i64OID, buf, 20, 10);
bstrSetWithIdsAsString.Append(buf);
}
__int64 GetOID( const __int64 &i64Id)
{
__int64 numId = i64Id;
numId <<= 16;
numId >>= 16;
return numId;
}

I think your bottleneck is the Append function. You see, the string has some allocated memory inside, and when you try to append something that won't fit, it reallocates more memory, which takes a lot of time. Try allocating necessary memory once in the beginning. HTH

I am not sure what this is doing:
bstrSetWithIdsAsString.Append(buf);
but I guess this is where the slowness is, particularly if it has to work out where the end of the buffer is every time by looking for the first zero byte, and possibly needs to do a lot of reallocating.
Why not use wostringstream?

The only way to find out what takes up all this time is to profile the application. Some editions of Visual Studio come with a fully functional profiler.
Alternatively, simply run the program in the debugger, and break it at random intervals, and note where in the code you are.
But I can see a few potential trouble spots:
you perform a lot of string appends. Do they allocate new memory each time? Does your string type allow you to reserve memory in advance, like std::string can do? In general, is the string class efficient?
you loop over iterators, and given the horrible Hungarian Notation you appear to use, I am assuming you are working on Windows, probably using MSVC. Some versions of MSVC enable a lot of runtime checking of STL iterators even in release builds, unless you explicitly disable it. VS2005 and 2008 specifically are guilty of this. 2010 only enables this checking in debug mode.
and, of course, you are building with optimizations enabled, right?
But I'm just pointing out what seems like it could potentially slow down your code. I have no clue what's actually happening. To be sure, I'd have to profile your code. You can do that. I can't. So do it.

GCC: program doesn't work with compilation option -O3

I'm writing a C++ program that doesn't work (I get a segmentation fault) when I compile it with optimizations (options -O1, -O2, -O3, etc.), but it works just fine when I compile it without optimizations.
Is there any chance that the error is in my code? or should I assume that this is a bug in GCC?
My GCC version is 3.4.6.
Is there any known workaround for this kind of problem?
There is a big difference in speed between the optimized and unoptimized version of my program, so I really need to use optimizations.
This is my original functor. The one that works fine with no levels of optimizations and throws a segmentation fault with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
return distance(point,p1) < distance(point,p2) ;
}
} ;
And this one works flawlessly with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
float d1=distance(point,p1) ;
float d2=distance(point,p2) ;
std::cout << "" ; //without this line, I get a segmentation fault anyways
return d1 < d2 ;
}
} ;
Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.

Now that you posted the code fragment and a working workaround was found (#Windows programmer's answer), I can say that perhaps what you are looking for is -ffloat-store.
-ffloat-store
Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.
This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.
Source: http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Optimize-Options.html

I would assume your code is wrong first.
Though it is hard to tell.
Does your code compile with 0 warnings?
g++ -Wall -Wextra -pedantic -ansi

Here's some code that seems to work, until you hit -O3...
#include <stdio.h>
int main()
{
int i = 0, j = 1, k = 2;
printf("%d %d %d\n", *(&j-1), *(&j), *(&j+1));
return 0;
}
Without optimisations, I get "2 1 0"; with optimisations I get "40 1 2293680". Why? Because i and k got optimised out!
But I was taking the address of j and going out of the memory region allocated to j. That's not allowed by the standard. It's most likely that your problem is caused by a similar deviation from the standard.
I find valgrind is often helpful at times like these.
EDIT: Some commenters are under the impression that the standard allows arbitrary pointer arithmetic. It does not. Remember that some architectures have funny addressing schemes, alignment may be important, and you may get problems if you overflow certain registers!
The words of the [draft] standard, on adding/subtracting an integer to/from a pointer (emphasis added):
"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
Seeing as &j doesn't even point to an array object, &j-1 and &j+1 can hardly point to part of the same array object. So simply evaluating &j+1 (let alone dereferencing it) is undefined behaviour.
On x86 we can be pretty confident that adding one to a pointer is fairly safe and just takes us to the next memory location. In the code above, the problem occurs when we make assumptions about what that memory contains, which of course the standard doesn't go near.

As an experiment, try to see if this will force the compiler to round everything consistently.
volatile float d1=distance(point,p1) ;
volatile float d2=distance(point,p2) ;
return d1 < d2 ;

The error is in your code. It's likely you're doing something that invokes undefined behavior according to the C standard which just happens to work with no optimizations, but when GCC makes certain assumptions for performing its optimizations, the code breaks when those assumptions aren't true. Make sure to compile with the -Wall option, and the -Wextra might also be a good idea, and see if you get any warnings. You could also try -ansi or -pedantic, but those are likely to result in false positives.

You may be running into an aliasing problem (or it could be a million other things). Look up the -fstrict-aliasing option.
This kind of question is impossible to answer properly without more information.

It is very seldom the compiler fault, but compiler do have bugs in them, and them often manifest themselves at different optimization levels (if there is a bug in an optimization pass, for example).
In general when reporting programming problems: provide a minimal code sample to demonstrate the issue, such that people can just save the code to a file, compile and run it. Make it as easy as possible to reproduce your problem.
Also, try different versions of GCC (compiling your own GCC is very easy, especially on Linux). If possible, try with another compiler. Intel C has a compiler which is more or less GCC compatible (and free for non-commercial use, I think). This will help pinpointing the problem.

It's almost (almost) never the compiler.
First, make sure you're compiling warning-free, with -Wall.
If that didn't give you a "eureka" moment, attach a debugger to the least optimized version of your executable that crashes and see what it's doing and where it goes.
5 will get you 10 that you've fixed the problem by this point.

Ran into the same problem a few days ago, in my case it was aliasing. And GCC does it differently, but not wrongly, when compared to other compilers. GCC has become what some might call a rules-lawyer of the C++ standard, and their implementation is correct, but you also have to be really correct in you C++, or it'll over optimize somethings, which is a pain. But you get speed, so can't complain.

I expect to get some downvotes here after reading some of the comments, but in the console game programming world, it's rather common knowledge that the higher optimization levels can sometimes generate incorrect code in weird edge cases. It might very well be that edge cases can be fixed with subtle changes to the code, though.

Alright...
This is one of the weirdest problems I've ever had.
I dont think I have enough proof to state it's a GCC bug, but honestly... It really looks like one.
This is my original functor. The one that works fine with no levels of optimizations and throws a segmentation fault with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
return distance(point,p1) < distance(point,p2) ;
}
} ;
And this one works flawlessly with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
float d1=distance(point,p1) ;
float d2=distance(point,p2) ;
std::cout << "" ; //without this line, I get a segmentation fault anyways
return d1 < d2 ;
}
} ;
Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.

Wow, I didn't expect answers so quicly, and so many...
The error occurs upon sorting a std::vector of pointers using std::sort()
I provide the strict-weak-ordering functor.
But I know the functor I provide is correct because I've used it a lot and it works fine.
Plus, the error cannot be some invalid pointer in the vector becasue the error occurs just when I sort the vector. If I iterate through the vector without applying std::sort first, the program works fine.
I just used GDB to try to find out what's going on. The error occurs when std::sort invoke my functor. Aparently std::sort is passing an invalid pointer to my functor. (of course this happens with the optimized version only, any level of optimization -O, -O2, -O3)

as other have pointed out, probably strict aliasing.
turn it of in o3 and try again. My guess is that you are doing some pointer tricks in your functor (fast float as int compare? object type in lower 2 bits?) that fail across inlining template functions.
warnings do not help to catch this case. "if the compiler could detect all strict aliasing problems it could just as well avoid them" just changing an unrelated line of code may make the problem appear or go away as it changes register allocation.

As the updated question will show ;) , the problem exists with a std::vector<T*>. One common error with vectors is reserve()ing what should have been resize()d. As a result, you'd be writing outside array bounds. An optimizer may discard those writes.

post the code in distance! it probably does some pointer magic, see my previous post. doing an intermediate assignment just hides the bug in your code by changing register allocation. even more telling of this is the output changing things!

The true answer is hidden somewhere inside all the comments in this thread. First of all: it is not a bug in the compiler.
The problem has to do with floating point precision. distanceToPointSort should be a function that should never return true for both the arguments (a,b) and (b,a), but that is exactly what can happen when the compiler decides to use higher precision for some data paths. The problem is especially likely on, but by no means limited to, x86 without -mfpmath=sse. If the comparator behaves that way, the sort function can become confused, and the segmentation fault is not surprising.
I consider -ffloat-store the best solution here (already suggested by CesarB).

What is the performance implication of converting to bool in C++?

[This question is related to but not the same as this one.]
My compiler warns about implicitly converting or casting certain types to bool whereas explicit conversions do not produce a warning:
long t = 0;
bool b = false;
b = t; // performance warning: forcing long to bool
b = (bool)t; // performance warning
b = bool(t); // performance warning
b = static_cast<bool>(t); // performance warning
b = t ? true : false; // ok, no warning
b = t != 0; // ok
b = !!t; // ok
This is with Visual C++ 2008 but I suspect other compilers may have similar warnings.
So my question is: what is the performance implication of casting/converting to bool? Does explicit conversion have better performance in some circumstance (e.g., for certain target architectures or processors)? Does implicit conversion somehow confuse the optimizer?
Microsoft's explanation of their warning is not particularly helpful. They imply that there is a good reason but they don't explain it.

I was puzzled by this behaviour, until I found this link:
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99633
Apparently, coming from the Microsoft Developer who "owns" this warning:
This warning is surprisingly
helpful, and found a bug in my code
just yesterday. I think Martin is
taking "performance warning" out of
context.
It's not about the generated code,
it's about whether or not the
programmer has signalled an intent to
change a value from int to bool.
There is a penalty for that, and the
user has the choice to use "int"
instead of "bool" consistently (or
more likely vice versa) to avoid the
"boolifying" codegen. [...]
It is an old warning, and may have
outlived its purpose, but it's
behaving as designed here.
So it seems to me the warning is more about style and avoiding some mistakes than anything else.
Hope this will answer your question...
:-p

The performance is identical across the board. It involves a couple of instructions on x86, maybe 3 on some other architectures.
On x86 / VC++, they all do
cmp DWORD PTR [whatever], 0
setne al
GCC generates the same thing, but without the warnings (at any warning-level).

The performance warning does actually make a little bit of sense. I've had it as well and my curiousity led me to investigate with the disassembler. It is trying to tell you that the compiler has to generate some code to coerce the value to either 0 or 1. Because you are insisting on a bool, the old school C idea of 0 or anything else doesn't apply.
You can avoid that tiny performance hit if you really want to. The best way is to avoid the cast altogether and use a bool from the start. If you must have an int, you could just use if( int ) instead of if( bool ). The code generated will simply check whether the int is 0 or not. No extra code to make sure the value is 1 if it's not 0 will be generated.

Sounds like premature optimization to me. Are you expecting that the performance of the cast to seriously effect the performance of your app? Maybe if you are writing kernel code or device drivers but in most cases, they all should be ok.

As far as I know, there is no warning on any other compiler for this. The only way I can think that this would cause a performance loss is that the compiler has to compare the entire integer to 0 and then assign the bool appropriately (unlike a conversion such as a char to bool, where the result can be copied over because a bool is one byte and so they are effectively the same), or an integral conversion which involves copying some or all of the source to the destination, possibly after a zero of the destination if it's bigger than the source (in terms of memory).
It's yet another one of Microsoft's useless and unhelpful ideas as to what constitutes good code, and leads us to have to put up with stupid definitions like this:
template <typename T>
inline bool to_bool (const T& t)
{ return t ? true : false; }

long t;
bool b;
int i;
signed char c;
...
You get a warning when you do anything that would be "free" if bool wasn't required to be 0 or 1. b = !!t is effectively assigning the result of the (language built-in, non-overrideable) bool operator!(long)
You shouldn't expect the ! or != operators to cost zero asm instructions even with an optimizing compiler. It is usually true that int i = t is usually optimized away completely. Or even signed char c = t; (on x86/amd64, if t is in the %eax register, after c = t, using c just means using %al. amd64 has byte addressing for every register, BTW. IIRC, in x86 some registers don't have byte addressing.)
Anyway, b = t; i = b; isn't the same as c = t; i = c; it's i = !!t; instead of i = t & 0xff;
Err, I guess everyone already knows all that from the previous replies. My point was, the warning made sense to me, since it caught cases where the compiler had to do things you didn't really tell it to, like !!BOOL on return because you declared the function bool, but are returning an integral value that could be true and != 1. e.g. a lot of windows stuff returns BOOL (int).
This is one of MSVC's few warnings that G++ doesn't have. I'm a lot more used to g++, and it definitely warns about stuff MSVC doesn't, but that I'm glad it told me about. I wrote a portab.h header file with stubs for the MFC/Win32 classes/macros/functions I used. This got the MFC app I'm working on to compile on my GNU/Linux machine at home (and with cygwin). I mainly wanted to be able to compile-test what I was working on at home, but I ended up finding g++'s warnings very useful. It's also a lot stricter about e.g. templates...
On bool in general, I'm not sure it makes for better code when used as a return values and parameter passing. Even for locals, g++ 4.3 doesn't seem to figure out that it doesn't have to coerce the value to 0 or 1 before branching on it. If it's a local variable and you never take its address, the compiler should keep it in whatever size is fastest. If it has to spill it from registers to the stack, it could just as well keep it in 4 bytes, since that may be slightly faster. (It uses a lot of movsx (sign-extension) instructions when loading/storing (non-local) bools, but I don't really remember what it did for automatic (local stack) variables. I do remember seeing it reserve an odd amount of stack space (not a multiple of 4) in functions that had some bools locals.)
Using bool flags was slower than int with the Digital Mars D compiler as of last year:
http://www.digitalmars.com/d/archives/digitalmars/D/opEquals_needs_to_return_bool_71813.html
(D is a lot like C++, but abandons full C backwards compat to define some nice new semantics, and good support for template metaprogramming. e.g. "static if" or "static assert" instead of template hacks or cpp macros. I'd really like to give D a try sometime. :)
For data structures, it can make sense, e.g. if you want to pack a couple flags before an int and then some doubles in a struct you're going to have quite a lot of.

Based on your link to MS' explanation, it appears that if the value is merely 1 or 0, there is not performance hit, but if it's any other non-0 value that a comparison must be built at compile time?

In C++ a bool ISA int with only two values 0 = false, 1 = true. The compiler only has to check one bit. To be perfectly clear, true != 0, so any int can override bool, it just cost processing cycles to do so.
By using a long as in the code sample, you are forcing a lot more bit checks, which will cause a performance hit.
No this is not premature optimization, it is quite crazy to use code that takes more processing time across the board. This is simply good coding practice.

Unless you're writing code for a really critical inner loop (simulator core, ray-tracer, etc.) there is no point in worrying about any performance hits in this case. There are other more important things to worry about in your code (and other more significant performance traps lurking, I'm sure).

Microsoft's explanation seems to be that what they're trying to say is:
Hey, if you're using an int, but are
only storing true or false information in
it, make it a bool!
I'm skeptical about how much would be gained performance-wise, but MS may have found that there was some gain (for their use, anyway). Microsoft's code does tend to run an awful lot, so maybe they've found the micro-optimization to be worthwhile. I believe that a fair bit of what goes into the MS compiler is to support stuff they find useful themselves (only makes sense, right?).
And you get rid of some dirty, little casts to boot.

I don't think performance is the issue here. The reason you get a warning is that information is lost during conversion from int to bool.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

C++ code cross compiling to ARM, runtime crash - c++

Related

Why don't define some undefined behaviours?

What do the memory operations malloc and free exactly do?

Performance tuning

GCC: program doesn't work with compilation option -O3

What is the performance implication of converting to bool in C++?

Categories

Resources