I will state my problem in a very simplified form. If I write the following in C:
void main(){
    int a=3+2;
    double b=7/2;
}
When will a and b be assigned their values of 5 and 3.5? Is it when I compile my code, or when I run it?
In other words, what happens when I press compile, and how is that different from what happens when I press run, in terms of assigning the values and doing the computations? And how is that different from writing my code as:
void main(){
    int a=5;
    double b=3.5;
}
I am asking this because I have heard about compiler optimization but it is not really my area.
Any comments or reviews will be highly appreciated.
Thank you.
Since you are asking about "code optimization" - a good optimizing compiler will optimize this code down to void main(){}. a and b will be completely eliminated.
Also, 7/2 == 3, not 3.5
Compiling translates the high-level language into a lower-level language, such as assembly. A good compiler may optimize, and this is customizable (for example with the -O2 option).
Regarding your code, double b=7/2; will yield 3.0 instead of 3.5, because it is an integer-by-integer operation. If you want 3.5, write it as double b=7.0/2.0;. This is quite a common mistake.
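For example, a tiny standalone program (not the asker's code, just an illustration) shows the difference:
#include <stdio.h>

int main(void)
{
    double b1 = 7 / 2;      /* integer division happens first: 7/2 == 3, then converted to 3.0 */
    double b2 = 7.0 / 2.0;  /* floating-point division: 3.5 */
    printf("%f %f\n", b1, b2);   /* prints 3.000000 3.500000 */
    return 0;
}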
What will happen when I press compile?
Nobody knows. The compiler may optimize it to a constant, or it may not. It probably will, but it isn't required to.
You generally shouldn't worry or really even think about compiler optimization, unless you're in a position that absolutely needs it, which very few developers are. The compiler can usually do a better job than you can.
It's compiler-dependent; a good one will do constant folding (CF) and/or dead-code elimination (DCE).
I don't know anything about optimization either, but I decided to give this a shot. Using gcc -c -S test.c, I got the assembly for the function. Here's what the line int a = 3 + 2; comes out as:
movl $5, -4(%rbp)
So for me, it's converting the value (3+2) to 5 at compile time, but it depends on the compiler and platform and whatever flags you pass it.
(Also, I made the function return a just so that it wouldn't optimize the code out entirely.)
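For reference, the file was presumably something along these lines (the original wasn't posted, so this is only a sketch):
int test(void)
{
    int a = 3 + 2;   /* emitted as movl $5, -4(%rbp): folded at compile time */
    return a;        /* returning a keeps the body from being optimized away entirely */
}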
I am getting:
warning: assuming signed overflow does not occur when assuming that (X + c) < X is always false [-Wstrict-overflow]
on this line:
if ( this->m_PositionIndex[in] < this->m_EndIndex[in] )
m_PositionIndex and m_EndIndex are of type itk::Index (http://www.itk.org/Doxygen/html/classitk_1_1Index.html), and their operator[] returns a signed long.
(it is line 37 here: https://github.com/Kitware/ITK/blob/master/Modules/Core/Common/include/itkImageRegionConstIteratorWithIndex.hxx for context)
Can anyone explain what would cause that warning here? I don't see the pattern (x+c) < x anywhere - as this is simply a signed long comparison.
I tried to reproduce it in a self-contained example:
#include <iostream>

namespace itk
{
    struct Index
    {
        signed long data[2];

        Index()
        {
            data[0] = 0;
            data[1] = 0;
        }

        signed long& operator[](unsigned int i)
        {
            return data[i];
        }
    };
}

int main (int argc, char *argv[])
{
    itk::Index positionIndex;
    itk::Index endIndex;

    for(unsigned int i = 0; i < 2; i++)
    {
        positionIndex[i]++;
        if ( positionIndex[i] < endIndex[i] )
        {
            std::cout << "something" << std::endl;
        }
    }

    return 0;
}
but I do not get the warning there. Any thoughts as to what is different between my demo and the real code, or what could be causing the warning in the real code? I get the warning with both gcc 4.7.0 and 4.7.2 with the -Wall flag.
To simply disable this warning, use -Wno-strict-overflow. To instead disable the specific optimization that triggers this warning, use -fno-strict-overflow or -fwrapv.
The gcc manpage describes that this warning can be controlled with levels: -Wstrict-overflow=n.
If this is stopping your build due to -Werror, you can work-around without hiding the warnings by using -Wno-error=strict-overflow (or just -Wno-error to override -Werror).
Analysis and commentary...
I got the same warning and spent a couple of hours trying to reproduce it in a smaller example, but never succeeded. My real code involved calling an inline function in a templated class, but the algorithm simplifies to the following...
int X = some_unpredictable_value_well_within_the_range_of_int();
for ( int c=0; c<4; c++ ) assert( X+c >= X );  // true unless (X+c) overflows
In my case the warning was somehow correlated with the optimizer unrolling the for loop, so I was able to work around it by declaring volatile int c=0. Another thing that fixed it was to declare unsigned int c=0, presumably because unsigned arithmetic is defined to wrap, so there is no undefined overflow for the optimizer to assume away. Another thing that fixed it was making the loop count large enough that the loop wouldn't be unrolled, but that's not a useful solution.
So what is this warning really saying? Either it is saying that the optimizer has modified the semantics of your algorithm (by assuming no overflow), or it is simply informing you that the optimizer is assuming that your code doesn't have the undefined behavior of overflowing a signed integer. Unless overflowing signed integers is part of the intended behavior of your program, this message probably does not indicate a problem in your code -- so you will likely want to disable it for normal builds. If you get this warning but aren't sure about the code in question, it may be safest to just disable the optimization with -fwrapv.
By the way, I ran into this issue on GCC 4.7, but the same code compiled without warning using 4.8 -- perhaps indicating that the GCC developers recognized the need to be a bit less "strict" in the normal case (or maybe it was just due to differences in the optimizer).
Philosophy...
In [C++11: N3242 §5.0.4], it states...
If during the evaluation of an expression, the result is not
mathematically defined or not in the range of representable values for
its type, the behavior is undefined.
This means that a conforming compiler can simply assume that signed overflow never occurs. (Unsigned types are not affected: unsigned arithmetic is defined to wrap, so its results are always in range.)
In C++, due to features such as templates and copy constructors, elision of pointless operations is an important optimizer capability. Sometimes though, if you are using C++ as a low-level "system language", you probably just want the compiler to do what you tell it to do -- and rely on the behavior of the underlying hardware. Given the language of the standard, I'm not sure how to achieve this in a compiler-independent fashion.
The compiler is telling you that it has enough static knowledge of that snippet to know that test will always succeed if it can optimize the test assuming that no signed operation will overflow.
In other words, the only way x + 1 < x will ever return true when x is signed is if x is already the maximum signed value. -Wstrict-overflow lets the compiler warn when it assumes that no signed addition will overflow; it's basically telling you it's going to optimize away the test because signed overflow is undefined behavior.
If you want to suppress the warning, get rid of the test.
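If the test was an intentional overflow check, here is a sketch of one way to express it without relying on undefined behavior (the helper name is made up):
#include <limits.h>

int increment_would_overflow(int x)
{
    return x == INT_MAX;   /* x + 1 overflows exactly when x is already INT_MAX */
}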
Despite the age of your question, since you didn't change that part of your code yet, I assume the problem still exists and that you still didn't get a useful explanation on what's actually going on here, so let my try it:
In C (and C++), if adding two SIGNED integers causes an overflow, the behavior of the entire program run is UNDEFINED. So the environment that executes your program can do whatever it wants (format your hard disk or start a nuclear war, assuming the necessary hardware is present).
gcc usually does neither, but it does do other nasty things (that could still lead to either one in unlucky circumstances). To demonstrate this, let me give you a simple example from Felix von Leitner (http://ptrace.fefe.de/int.c):
#include <assert.h>
#include <stdio.h>

int foo(int a) {
    assert(a+100 > a);
    printf("%d %d\n",a+100,a);
    return a;
}

int main() {
    foo(100);
    foo(0x7fffffff);
}
Note: I added stdio.h, to get rid of the warning about printf not being declared.
Now, if we run this, we would expect the code to assert out on the second call of foo, because it creates an integer overflow and checks for it. So, let's do it:
$ gcc -O3 int.c -o int && ./int
200 100
-2147483549 2147483647
WTF? (WTF, German for "Was täte Fefe", "what would Fefe do", Fefe being the nickname of Felix von Leitner, from whom I borrowed the code example.) Oh, and, by the way, was täte Fefe? Right: write a bug report about this issue, back in 2007! https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475
Back to your question. If you dig down further, you can generate the assembly output (-S) and investigate it, only to figure out that the assert was completely removed:
$ gcc -S -O3 int.c -o int.s && cat int.s
[...]
foo:
    pushq %rbx                 // save "callee-save" register %rbx
    leal 100(%rdi), %edx       // calc a+100 -> %rdx for printf
    leaq .LC0(%rip), %rsi      // "%d %d\n" for printf
    movl %edi, %ebx            // save `a` to %rbx
    movl %edi, %ecx            // move `a` to %rcx for printf
    xorl %eax, %eax            // more prep for printf
    movl $1, %edi              // and even more prep
    call __printf_chk@PLT      // printf call
    movl %ebx, %eax            // restore `a` to %rax as return value
    popq %rbx                  // recover "callee-save" %rbx
    ret                        // and return
No assert here at any place.
Now, let's turn on warnings during compilation.
$ gcc -Wall -O3 int.c -o int
int.c: In function 'foo':
int.c:5:2: warning: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Wstrict-overflow]
assert(a+100 > a);
^~~~~~
So, what this message actually says is: a + 100 could potentially overflow causing undefined behavior. Because you are a highly skilled professional software developer, who never does anything wrong, I (gcc) know for sure, that a + 100 will not overflow. Because I know that, I also know, that a + 100 > a is always true. Because I know that, I know that the assert never fires. And because I know that, I can eliminate the entire assert in 'dead-code-elimination' optimization.
And that is exactly what gcc does here (and warns you about).
Now, in your small example, the data flow analysis can determine that this integer in fact does not overflow. So gcc does not need to assume it never overflows; it can prove it never overflows. In this case, it's absolutely OK to remove the code (the compiler could still warn about dead code elimination here, but DCE happens so often that probably nobody wants those warnings). But in your "real world code", the data flow analysis fails, because not all the necessary information is present for it. So gcc doesn't know whether the ++ overflows, so it warns you that it assumes that never happens (and then removes the entire if statement).
One way to solve (or hide!) the issue here would be to assert that a < INT_MAX prior to doing the a++. Anyway, I fear you might have some real bug there, but I would have to investigate much more to figure it out. However, you can figure it out yourself by creating your MCVE the proper way: take the source code with the warning, add anything necessary from the include files to make it stand-alone (gcc -E is a little bit extreme but would do the job), then keep removing anything that doesn't make the warning disappear until you have code where nothing more can be removed.
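As a rough illustration of that assert idea (names and types are made up here; the question's indices are signed long, so the limit becomes LONG_MAX):
#include <assert.h>
#include <limits.h>

void advance(long *positionIndex, long endIndex)
{
    assert(*positionIndex < LONG_MAX);   /* the increment below can no longer overflow */
    (*positionIndex)++;
    if (*positionIndex < endIndex)
    {
        /* ... */
    }
}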
In my case, this was a good warning from the compiler. I had a stupid copy/paste bug where I had a loop inside a loop, and each was the same set of conditions. Something like this:
for (index = 0; index < someLoopLimit; index++)
{
    // blah, blah, blah.....
    for (index = 0; index < someLoopLimit; index++)
    {
        // More blah, blah, blah....
    }
}
This warning saved me time debugging. Thanks, gcc!!!
This is almost certainly an error, even though I cannot tell which of three possible errors it is.
I believe that gcc is somehow figuring out what the values in 'this->m_PositionIndex[in]' etc. are computed from, namely that both values are derived from the same value, one with an offset it can also prove to be positive. Consequently it figures that one branch of the if/else construct is unreachable code.
This is bad. Definitely. Why? Well, there are exactly three possibilities:
1. gcc is right in that the code is unreachable, and you overlooked that this is the case. In this case you simply have bloated code. Best of the three possibilities, but still bad in terms of code quality.
2. gcc is right in that the code is unreachable, but you wanted it to be reachable. In this case, you need to take a second look at why it is not reachable and fix the bug in your logic.
3. gcc is wrong, and you have stumbled on a bug in it. Quite unlikely, but possible, and unfortunately very hard to prove.
The really bad thing is that you absolutely need to either find out exactly why the code is unreachable, or disprove that gcc is right (which is easier said than done). According to Murphy's law, assuming case 1 or 3 without proving it will make sure that it was actually case 2, and you will have to hunt for the bug a second time...
I think that the compiler changed
positionIndex[i]++;
if ( positionIndex[i] < endIndex[i] )
into something more optimized like
if ( positionIndex[i]++ < endIndex[i] ) <-- see my comment below, this statement is wrong.
so, effectively,
if ( positionIndex[i] + 1 < endIndex[i] )
which causes undefined behavior in case of overflow (thanks Seth).
Possible Duplicate:
How to get the length of a function in bytes?
I'm making a Hooking program that will be used to insert a method into the specified section of memory.
I need to get the length of a local C++ function, I've used a cast to get the location of a function, but how would I get the length?
would
int GetFuncLen()
{
    int i = 0;
    while((DWORD*)Function+i<max)
    {
        if((DWORD*)Function+i==0xC3)
        {
            return i;
        }
        i++;
    }
}
work?
Your code seems to be operating system, compiler, and machine architecture specific.
(I know nothing about Windows)
It could be wrong if max is not defined.
It is operating system specific (probably Windows only) because DWORD is not a standard C++ type. You could use intptr_t (from <cstdint> header).
Your code is compiler specific, because you assume that every compiled function has a well-defined unique end and doesn't share any code with other functions. (Some compilers are able to do such optimizations, e.g. making two functions share a common epilogue or code chunk, using jump instructions.)
Your code is machine specific, because you assume that the last instruction would be a RET coded 0xC3 and this is specific to x86 & x86-64 (won't work on Alpha or ARM, on which Windows is rumored to have been or to be ported). Also, that byte could appear inside other instructions or inlined constants (as Mat commented).
I am not sure that the notion of where a binary function ends has a well-defined meaning. But if it does, I would expect the linker to know about it. On some systems, for example on Linux with ELF executables, the compiler and the linker produce the size of each function.
Perhaps you better need to find the symbol near to a given address. I don't know if Windows has such a functionality (on Linux, the dladdr GNU function from <dlfcn.h> could be useful). Perhaps your operating system provides an equivalent?
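For the Linux side note, here is a rough sketch of the dladdr approach (a GNU extension: link with -ldl, and you may need -rdynamic so the symbol is visible to it; it finds the nearest symbol, not the function's length, and the function name here is made up):
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int some_function(void) { return 42; }

int main(void)
{
    Dl_info info;
    /* dladdr fills in the file and nearest symbol containing the address */
    if (dladdr((void *)some_function, &info) && info.dli_sname)
        printf("%p resolves to symbol %s in %s\n",
               (void *)some_function, info.dli_sname, info.dli_fname);
    return 0;
}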
No. For a few reasons.
1) 0xC3 is only a 'ret' instruction if it is at the point where a new instruction is expected. There could easily be other instructions that include a 0xc3 byte within their operands, and you'd only get part of the code.
2) There can be multiple 'ret' instructions in any given function, depending on the compiler and its settings. Again, you'd only get part of the function.
3) Functions often use constructs like "switch" statements, that use "jump tables" that are located AFTER the ret instruction. Again, you'd only get part of the function.
And what you're trying to do is not likely to work anyway.
The biggest problem is that various assembly instructions will often reference specific areas of memory by using offsets rather than absolute addresses. So while extremely minimal functions might work, any functions that call out into other functions will likely fail.
Assuming you're trying to load these functions into an external process, and you're trying to do this on Windows, a better solution is to use DLL injection to load a DLL into your target process.
If you really need to inject the memory, then you'll need an assembly language parser for your particular platform to update all of the memory addresses for the relevant instructions, which is a very complex task. Or you could write your functions in assembly language and make sure that you're not using relative offsets for anything other than referencing parts of your own code, which is a bit easier, but more limiting in what you can do.
You could force your function to be put in a section all by itself (see e.g. http://msdn.microsoft.com/en-us/library/s20kdbse(v=VS.71).aspx).
I think that if you define a section, declare a variable in it, define your function in it, and then define another variable in it, the addresses of the two variables will cover your function.
Better is to put the two variables and the function in separate sections and then use section merging to control the order they appear in the resulting code (see How to refer to the start-of a user-defined segment in a Visual Studio-project?)
As others have pointed out you probably can't do anything useful with this, and it's not at all portable.
The only reliable way to do this is to compile your code with a dummy number for the length of the function (but not run it), disassemble it, and calculate the length by hand, then take that number and substitute it for the dummy number, and recompile your program.
When I needed to do this, I just made a guess as to how big the function should be. As long as your guess is not too small (and not way, way too big) you should have no problems.
You can use objdump to get the size of objects with external linkage. Otherwise, you could take the assembly output of the compiler (gcc -S, e.g.) and look at it yourself; you'll see what names the length fields get:
.file "test.cpp"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.6.0-3~ppa1) 4.6.1 20110409 (prerelease)"
.section .note.GNU-stack,"",@progbits
See the .size main, .-main evaluation: it calculates the function size.
Here's a simplified version of some code I'm working with right now:
int Do_A_Thing(void)
{
    return(42);
}

void Some_Function(void)
{
    int (*fn_do_thing)(void) = Do_A_Thing;
    fn_do_thing();
}
When I compile this in Xcode 4.1, the assembly it generates sets the fn_do_thing variable like so:
0x2006: calll 0x200b ;
0x200b: popl %eax ; get EIP
0x200c: movl 1333(%eax), %eax
I.e. it generates a relative address for the place to find the Do_A_Thing function - "current instruction plus 1333", which according to the map file is a "non-lazy pointer" to the function.
When I compile similar code on Windows with Visual Studio, it generates a fixed address instead of doing it relatively like this. If Do_A_Thing lives at, for example, 0x40050914, it just sets fn_do_thing to 0x40050914. By contrast, Xcode sets it to "where I am + some offset".
Is there any way to make xcode generate an absolute address to set the function pointer to, like visual studio does? Or is there some reason that wouldn't work? I have noticed that every time I run the program, the Do_A_Thing function (and all other functions) seem to load at a different address.
You're looking at position independent code (more specifically Position Independent Executable) in action. This, as you noticed, allows the OS to load the binary anywhere in memory, which provides numerous security improvements for potentially insecure code.
You can disable it by removing a linker option in Xcode (-Wl,-pie).
Note that on x86_64 (amd64), instructions can operate relative to the instruction pointer, which improves the efficiency of this technique (and makes it basically "free" in performance cost).
Let's say I have the following in C or C++:
#include <math.h>
#define ROWS 15
#define COLS 16
#define COEFF 0.15
#define NODES (ROWS*COLS)
#define A_CONSTANT (COEFF*(sqrt(NODES)))
Then, I go and use NODES and A_CONSTANT somewhere deep within many nested loops (i.e. used many times). Clearly, both have numeric values that can be ascertained at compile-time, but do compilers actually do it? At run-time, will the CPU have to evaluate 15*16 every time it sees NODES, or will the compiler statically put 240 there? Similarly, will the CPU have to evaluate a square root every time it sees A_CONSTANT?
My guess is that the ROWS*COLS multiplication is optimized out but nothing else is. Integer multiplication is built into the language but sqrt is a library function. If this is indeed the case, is there any way to get a magic number equivalent to A_CONSTANT such that the square root is evaluated only once at run-time?
Macro definitions are expanded by simple textual substitution into the source code before it's handed to the compiler proper, which may do optimization. A compiler will generate exactly the same code for the expressions NODES, ROWS*COLS and 15*16 (and I can't think of a single one that will do the multiplication every time round the loop with optimization enabled).
As for A_CONSTANT, the fact that it is a macro again doesn't matter; what matters is whether the compiler is smart enough to figure out that sqrt of a constant is a constant (assuming that's sqrt from <math.h>). I know GCC is smart enough and I expect other production-quality compilers to be smart enough as well.
Anything in a #define is inserted into the source as a pre-compile step, which means that once the code is compiled the macros have basically disappeared and the code is compiled as usual. Whether or not it is optimized depends on your code, your compiler, and the compiler settings.
It depends on your compiler.
#include <math.h>

#define FOO sqrt(5)

double
foo()
{
    return FOO;
}
My compiler (gcc 4.1.2) generates the following assembly for this code:
.LC0:
.long 2610427048
.long 1073865591
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
.LFB2:
movsd .LC0(%rip), %xmm0
ret
.LFE2:
So it does know that sqrt(5) is a compile-time constant.
If your compiler is not so smart, I do not know of any portable way to compute a square root at compile time. (Of course, you can compute the result once and store it in a global or whatever, but that is not the same thing as a compile-time constant.)
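As a sketch of that fallback, computing the value once at run time and storing it in a global (made-up names, using the question's constants; link with -lm where needed):
#include <math.h>

static double a_constant;   /* filled in once at startup instead of being a compile-time constant */

int main(void)
{
    a_constant = 0.15 * sqrt(240.0);   /* 240 == ROWS*COLS from the question */
    /* ... the nested loops can now reuse a_constant without recomputing sqrt ... */
    return 0;
}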
There are really two questions here:
1. Does the compiler optimize expressions found inside macros?
2. Does the compiler optimize sqrt()?
(1) is easy: yes, it does. The preprocessor is separate from the C compiler, and does its thing before the C compiler even starts. So if you have
#define ROWS 15
#define COLS 16
#define NODES (ROWS*COLS)
void foo( )
{
    int data[ROWS][COLS];
    printf( "I have %d pieces of data\n", NODES );
    for ( int *i = &data[0][0]; i < &data[0][0] + NODES; ++i )
    {
        printf("%d ", *i);
    }
}
The compiler will actually see:
void foo( )
{
    int data[15][16];
    printf( "I have %d pieces of data\n", (15*16) );
    for ( int *i = &data[0][0]; i < &data[0][0] + (15*16); ++i )
    {
        printf("%d ", *i);
    }
}
And that is subject to all the usual compile-time constant optimization.
sqrt() is trickier because it varies from compiler to compiler. In most modern compilers, sqrt() is actually a compiler intrinsic rather than a library function — it looks like a function call, but it is actually a special case inside the compiler that has additional heuristics based on mathematical laws, hardware ops, etc. In smart compilers where sqrt() is such a special case, sqrt() of a constant value will be translated internally to a constant number. In stupid compilers, it will result in a function call each time. The only way to know which you're getting is to compile the code and look at the emitted assembly.
From what I've seen, MSVC, modern GCC, Intel, IBM, and SN all handle sqrt as an intrinsic. Old GCC and some crappy vendor-supplied compilers for embedded chips do not.
#defines are handled before compilation, with simple text replacement. The resulting text file is then passed to the actual compilation step.
If you are using gcc, try compiling a source file with the -E switch, which will do the preprocessing and then stop. Look at the generated file to see the actual input to the compilation step.
The macro will be substituted, and then the code compiled like the rest of the code. If you've turned on optimization (and the compiler you're using does decent optimization) you can probably expect things like this to be computed at compile time.
To put that in perspective, there are relatively few C++ compilers old enough that you'd expect them to lack an optimization like that. Compilers old enough to lack that simple an optimization will generally be C only (and even then, don't count on it; things like MS C 5.0/5.1/6.0, Datalight/Zortech C, Borland, etc., did this as well). From what I recall, the C compilers that ran on CP/M mostly didn't, though.