Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I have two variable a and b.I have to write a if condition on variable a and b:
This is First Approach:
if(a > 0 || b >0){
//do some things
}
This is second Approach:
if((a+b) > 0){
//do some thing
}
Update: consider a and b are unsigned.then which will take lesser execution time between logical or(||) and arithmetic (+ )operator
this condition will Iterate around one million times.
Any help on this will be appreciated.
Your second condition is wrong. If a=1, b=-1000, it will evaluate to false, whereas your first condition will be evaluated to true. In general you shouldn't worry about speed at these kind of tests, the compiler optimizes the condition a lot, so a logical OR is super fast. In general, people are making bigger mistakes than optimizing such conditions... So don't try to optimize unless you really know what is going on, the compiler is in general doing a much better job than any of us.
In principle, in the first expression you have 2 CMP and one OR, whereas in the second, you have only one CMP and one ADD, so the second should be faster (even though the complier does some short-circuit in the first case, but this cannot happen 100% of the time), however in your case the expressions are not equivalent (well, they are for positive numbers...).
I decided to check this for C language, but identical arguments apply to C++, and similar arguments apply to Java (except Java allows signed overflow). Following code was tested (for C++, replace _Bool with bool).
_Bool approach1(int a, int b) {
return a > 0 || b > 0;
}
_Bool approach2(int a, int b) {
return (a + b) > 0;
}
And this was resulting disasembly.
.file "faster.c"
.text
.p2align 4,,15
.globl approach1
.type approach1, #function
approach1:
.LFB0:
.cfi_startproc
testl %edi, %edi
setg %al
testl %esi, %esi
setg %dl
orl %edx, %eax
ret
.cfi_endproc
.LFE0:
.size approach1, .-approach1
.p2align 4,,15
.globl approach2
.type approach2, #function
approach2:
.LFB1:
.cfi_startproc
addl %esi, %edi
testl %edi, %edi
setg %al
ret
.cfi_endproc
.LFE1:
.size approach2, .-approach2
.ident "GCC: (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]"
.section .note.GNU-stack,"",#progbits
Those codes are quite different, even considering how clever the compilers are these days. Why is that so? Well, the reason is quite simple - they aren't identical. If a is -42 and b is 2, the first approach will return true, and second will return false.
Surely, you may think that a and b should be unsigned.
.file "faster.c"
.text
.p2align 4,,15
.globl approach1
.type approach1, #function
approach1:
.LFB0:
.cfi_startproc
orl %esi, %edi
setne %al
ret
.cfi_endproc
.LFE0:
.size approach1, .-approach1
.p2align 4,,15
.globl approach2
.type approach2, #function
approach2:
.LFB1:
.cfi_startproc
addl %esi, %edi
testl %edi, %edi
setne %al
ret
.cfi_endproc
.LFE1:
.size approach2, .-approach2
.ident "GCC: (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]"
.section .note.GNU-stack,"",#progbits
It's quite easy to notice that approach1 is better here, because it doesn't do pointless addition, which is in fact, quite wrong. In fact, it even makes an optimization to (a | b) != 0, which is correct optimization.
In C, unsigned overflows are defined, so the compiler has to handle the case when integers are very high (try INT_MAX and 1 for approach2). Even assuming you know the numbers won't overflow, it's easy to notice approach1 is faster, because it simply tests if both variables are 0.
Trust your compiler, it will optimize better than you, and that without small bugs that you could accidentally write. Write code instead of asking yourself whether i++ or ++i is faster, or if x >> 1 or x / 2 is faster (by the way, x >> 1 doesn't do the same thing as x / 2 for signed numbers, because of rounding behavior).
If you want to optimize something, optimize algorithms you use. Instead of using worst case O(N4) sorting algorithm, use worst case O(N log N) algorithm. This will actually make program faster, especially if you sort reasonably big arrays
The real answer for this is always to do both and actually test which one runs faster. That's the only way to know for sure.
I would guess the second one would run faster, because an add is a quick operation but a missed branch causes pipeline clears and all sort of nasty things. It would be data dependent though. But it isn't exactly the same, if a or b is allowed to be negative or big enough for overflow then it isn't the same test.
Well, I wrote some quick code and disassembled:
public boolean method1(final int a, final int b) {
if (a > 0 || b > 0) {
return true;
}
return false;
}
public boolean method2(final int a, final int b) {
if ((a + b) > 0) {
return true;
}
return false;
}
These produce:
public boolean method1(int, int);
Code:
0: iload_1
1: ifgt 8
4: iload_2
5: ifle 10
8: iconst_1
9: ireturn
10: iconst_0
11: ireturn
public boolean method2(int, int);
Code:
0: iload_1
1: iload_2
2: iadd
3: ifle 8
6: iconst_1
7: ireturn
8: iconst_0
9: ireturn
So as you can see, they're pretty similar; the only difference is performing a > 0 test vs a + b; looks like the || got optimized away. What the JIT compiler optimizes these to, I have no clue.
If you wanted to get really picky:
Option 1: Always 1 load and 1 comparison, possible 2 loads and 2 comparisons
Option 2: Always 2 loads, 1 addition, 1 comparison
So really, which one performs better depends on what your data looks like and whether there is a pattern the branch predictor can use. If so, I could imagine the first method running faster because the processor basically "skips" the checks, and in the best case only has to perform half the operations the second option will. To be honest, though, this really seems like premature optimization, and I'm willing to bet that you're much more likely to get more improvement elsewhere in your code. I don't find basic operations to be bottlenecks most of the time.
Two things:
(a|b) > 0 is strictly better than (a+b) > 0, so replace it.
The above two only work correctly if the numbers are both unsigned.
If a and b have the potential to be negative numbers, the two choices are not equivalent, as has been pointed out by the answer by #vsoftco.
If both a and b are guaranteed to be non-negative integers, I would use
if ( (a|b) > 0 )
instead of
if ( (a+b) > 0 )
I think bitwise | is faster than integer addition.
Update
Use bitwise | instead of &.
Related
EDIT: Indeed, I had a weird error in my timing code leading to these results. When I fixed my error, the smart version ended up faster as expected. My timing code looked like this:
bool x = false;
before = now();
for (int i=0; i<N; ++i) {
x ^= smart_xor(A[i],B[i]);
}
after = now();
I had done the ^= to discourage my compiler from optimizing the for-loop away. But I think that the ^= somehow interacts strangely with the two xor functions. I changed my timing code to simply fill out an array of the xor results, and then do computation with that array outside of the timed code. And that fixed things.
Should I delete this question?
END EDIT
I defined two C++ functions as follows:
bool smart_xor(bool a, bool b) {
return a^b;
}
bool dumb_xor(bool a, bool b) {
return a?!b:b;
}
My timing tests indicate that dumb_xor() is slightly faster (1.31ns vs 1.90ns when inlined, 1.92ns vs 2.21ns when not inlined). This puzzles me, as the ^ operator should be a single machine operation. I'm wondering if anyone has an explanation.
The assembly looks like this (when not inlined):
.file "xor.cpp"
.text
.p2align 4,,15
.globl _Z9smart_xorbb
.type _Z9smart_xorbb, #function
_Z9smart_xorbb:
.LFB0:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
movl %esi, %eax
xorl %edi, %eax
ret
.cfi_endproc
.LFE0:
.size _Z9smart_xorbb, .-_Z9smart_xorbb
.p2align 4,,15
.globl _Z8dumb_xorbb
.type _Z8dumb_xorbb, #function
_Z8dumb_xorbb:
.LFB1:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
movl %esi, %edx
movl %esi, %eax
xorl $1, %edx
testb %dil, %dil
cmovne %edx, %eax
ret
.cfi_endproc
.LFE1:
.size _Z8dumb_xorbb, .-_Z8dumb_xorbb
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",#progbits
I'm using g++ 4.4.3-4ubuntu5 on an Intel Xeon X5570. I compiled with -O3.
I don't think you benchmarked your code correctly.
We can see in the generated assembly that your smart_xor function is:
movl %esi, %eax
xorl %edi, %eax
while your dumb_xor function is:
movl %esi, %edx
movl %esi, %eax
xorl $1, %edx
testb %dil, %dil
cmovne %edx, %eax
So obviously, the first one will be faster.
If not, then you have benchmarking issues.
So you may want to tune your benchmarking code... And remember you'll need to run a lot of calls to have a good and meaningful average.
Given that your "dumb XOR" code is significantly longer (and most instructions are dependent on a previous one, so it won't run in parallel), I suspect that you have some sort of measurement error in your results.
The compiler will need to produce two instructions for the out-of-line version of "smart XOR" because the registers that the data comes in as is not the register to give the return result in, so the data has to move from EDI and ESI to EAX. In an inline version, the code should be able to use whatever register the data is in before the call, and if the code allows it, result stays in the register it came in as.
Calling a function is out-of-line is probably at least as long in execution time as the actual code in the function.
It would help if you showes your test-harness that you use for benchmarking too...
Consider the following:
inline unsigned int f1(const unsigned int i, const bool b) {return b ? i : 0;}
inline unsigned int f2(const unsigned int i, const bool b) {return b*i;}
The syntax of f2 is more compact, but do the standard guarantees that f1 and f2 are strictly equivalent ?
Furthermore, if I want the compiler to optimize this expression if b and i are known at compile-time, which version should I prefer ?
Well, yes, both are equivalent. bool is an integral type and true is guaranteed to convert to 1 in integer context, while false is guaranteed to convert to 0.
(The reverse is also true, i.e. non-zero integer values are guaranteed to convert to true in boolean context, while zero integer values are guaranteed to convert to false in boolean context.)
Since you are working with unsigned types, one can easily come up with other, possibly bit-hack-based yet perfectly portable implementations of the same thing, like
i & -(unsigned) b
although a decent compiler should be able to choose the best implementation by itself for any of your versions.
P.S. Although to my great surprise, GCC 4.1.2 compiled all three variants virtually literally, i.e. it used machine multiplication instruction in multiplication-based variant. It was smart enough to use cmovne instruction on the ?: variant to make it branchless, which quite possibly made it the most efficient implementation.
Yes. It's safe to assume true is 1 and false is 0 when used in expressions as you do and is guaranteed:
C++11, Integral Promotions, 4.5:
An rvalue of type bool can be converted to an rvalue of type int, with
false becoming zero and true becoming one.
The compiler will use implicit conversion to make an unsigned int from b, so, yes, this should work. You're skipping the condition checking by simple multiplication. Which one is more effective/faster? Don't know. A good compiler would most likely optimize both versions I'd assume.
FWIW, the following code
inline unsigned int f1(const unsigned int i, const bool b) {return b ? i : 0;}
inline unsigned int f2(const unsigned int i, const bool b) {return b*i;}
int main()
{
volatile unsigned int i = f1(42, true);
volatile unsigned int j = f2(42, true);
}
compiled with gcc -O2 produces this assembly:
.file "test.cpp"
.def ___main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 2,,3
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB2:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $42, 8(%esp) // i
movl $42, 12(%esp) // j
xorl %eax, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE2:
There's not much left of either f1 or f2, as you can see.
As far as C++ standard is concerned, the compiler is allowed to do anything with regards to optimization, as long as it doesn't change the observable behaviour (the as if rule).
When I initialize float variables in my program, I commonly have vectors like:
Vector forward(0.f,0.f,-1.f),right(1.f,0.f,0.f),up(0.f,1.f,0.f)
(Vectors are just 3 floats like struct Vector{ float x,y,z; };)
This looks much easier to read as:
Vector forward(0,0,-1),right(1,0,0),up(0,1,0)
Must I initialize my float variables using floats? Am I losing anything or incurring some kind of penalty when I use integers (or doubles) to initialize a float?
There's no semantic difference between the two. Depending on some compilers, it is possible for extra code to be generated, though. See also this and this SO questions of the same topic.
I can confirm that gcc generates the same code for all variants of
int main()
{
float a = 0.0f; /* or 0 or 0.0 */
return 0;
}
and that this code is
.file "1.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0x00000000, %eax
movl %eax, -4(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",#progbits
The relevant line is
movl $0x00000000, %eax
Changing a to 0.1 (or 0.1f) changes the line to
movl $0x3dcccccd, %eax
It seems that gcc is able to deduce the correct constant and doesn't generate extra code.
For a single literal constant, it shouldn't matter. In the context of an initializer, a constant of any numeric type will be implicitly converted to the type of the object being initialized. This is guaranteed by the language standard. So all of these:
float x = 0;
float x = 0.0;
float x = 0.0f;
float x = 0.0L; // converted from long double to float
are equally valid and result in the same value being stored in x.
A literal constant in a more complex expression can have surprising results, though.
In most cases, each expression is evaluated by itself, regardless of the context in which it appears. Any implicit conversion is applied after the subexpression has been evaluated.
So if you write:
float x = 1 / 2;
the expression 1 / 2 will be evaluated as an int, yielding 0, which is then converted to float. It will setxto0.0f, not to0.5f`.
I think you should be safe using unsuffixed floating-point constants (which are of type double).
Incidentally, you might consider using double rather than float in your program. double, as I mentioned, is the type of an unsuffixed floating-point constant, and can be thought of in some sense as the "default" floating-point type. It usually has more range and precision than float, and there's typically not much difference in performance.
It could be a good programming practise to always write 0.f, 1.f etc., even if often gcc can figure out what the programmer means by 1.0 et al.
The problematic cases are not so much trivial float variable initializations, but numeric constants in somewhat more complex formulae, where a combination of operators, float variables and said constants can easily lead to occurrence of unintended double valued calculations and costly float-double-float conversions.
Spotting these conversions without specifically checking the compiled code for them becomes very hard if the intended type for numeric values is mostly omitted in the code and instead only included when it's absolutely required. Hence I for one would choose the approach of just typing in the f's and getting used to having them around.
Which value is better to use? Boolean true or Integer 1?
The above topic made me do some experiments with bool and int in if condition. So just out of curiosity I wrote this program:
int f(int i)
{
if ( i ) return 99; //if(int)
else return -99;
}
int g(bool b)
{
if ( b ) return 99; //if(bool)
else return -99;
}
int main(){}
g++ intbool.cpp -S generates asm code for each functions as follows:
asm code for f(int)
__Z1fi:
LFB0:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
cmpl $0, 8(%ebp)
je L2
movl $99, %eax
jmp L3
L2:
movl $-99, %eax
L3:
leave
LCFI2:
ret
asm code for g(bool)
__Z1gb:
LFB1:
pushl %ebp
LCFI3:
movl %esp, %ebp
LCFI4:
subl $4, %esp
LCFI5:
movl 8(%ebp), %eax
movb %al, -4(%ebp)
cmpb $0, -4(%ebp)
je L5
movl $99, %eax
jmp L6
L5:
movl $-99, %eax
L6:
leave
LCFI6:
ret
Surprisingly, g(bool) generates more asm instructions! Does it mean that if(bool) is little slower than if(int)? I used to think bool is especially designed to be used in conditional statement such as if, so I was expecting g(bool) to generate less asm instructions, thereby making g(bool) more efficient and fast.
EDIT:
I'm not using any optimization flag as of now. But even absence of it, why does it generate more asm for g(bool) is a question for which I'm looking for a reasonable answer. I should also tell you that -O2 optimization flag generates exactly same asm. But that isn't the question. The question is what I've asked.
Makes sense to me. Your compiler apparently defines a bool as an 8-bit value, and your system ABI requires it to "promote" small (< 32-bit) integer arguments to 32-bit when pushing them onto the call stack. So to compare a bool, the compiler generates code to isolate the least significant byte of the 32-bit argument that g receives, and compares it with cmpb. In the first example, the int argument uses the full 32 bits that were pushed onto the stack, so it simply compares against the whole thing with cmpl.
Compiling with -03 gives the following for me:
f:
pushl %ebp
movl %esp, %ebp
cmpl $1, 8(%ebp)
popl %ebp
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
g:
pushl %ebp
movl %esp, %ebp
cmpb $1, 8(%ebp)
popl %ebp
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
.. so it compiles to essentially the same code, except for cmpl vs cmpb.
This means that the difference, if there is any, doesn't matter. Judging by unoptimized code is not fair.
Edit to clarify my point. Unoptimized code is for simple debugging, not for speed. Comparing the speed of unoptimized code is senseless.
When I compile this with a sane set of options (specifically -O3), here's what I get:
For f():
.type _Z1fi, #function
_Z1fi:
.LFB0:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
cmpl $1, %edi
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
.cfi_endproc
For g():
.type _Z1gb, #function
_Z1gb:
.LFB1:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
cmpb $1, %dil
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
.cfi_endproc
They still use different instructions for the comparison (cmpb for boolean vs. cmpl for int), but otherwise the bodies are identical. A quick look at the Intel manuals tells me: ... not much of anything. There's no such thing as cmpb or cmpl in the Intel manuals. They're all cmp and I can't find the timing tables at the moment. I'm guessing, however, that there's no clock difference between comparing a byte immediate vs. comparing a long immediate, so for all practical purposes the code is identical.
edited to add the following based on your addition
The reason the code is different in the unoptimized case is that it is unoptimized. (Yes, it's circular, I know.) When the compiler walks the AST and generates code directly, it doesn't "know" anything except what's at the immediate point of the AST it's in. At that point it lacks all contextual information needed to know that at this specific point it can treat the declared type bool as an int. A boolean is obviously by default treated as a byte and when manipulating bytes in the Intel world you have to do things like sign-extend to bring it to certain widths to put it on the stack, etc. (You can't push a byte.)
When the optimizer views the AST and does its magic, however, it looks at surrounding context and "knows" when it can replace code with something more efficient without changing semantics. So it "knows" it can use an integer in the parameter and thereby lose the unnecessary conversions and widening.
With GCC 4.5 on Linux and Windows at least, sizeof(bool) == 1. On x86 and x86_64, you can't pass in less than an general purpose register's worth to a function (whether via the stack or a register depending on the calling convention etc...).
So the code for bool, when un-optimized, actually goes to some length to extract that bool value from the argument stack (using another stack slot to save that byte). It's more complicated than just pulling a native register-sized variable.
Yeah, the discussion's fun. But just test it:
Test code:
#include <stdio.h>
#include <string.h>
int testi(int);
int testb(bool);
int main (int argc, char* argv[]){
bool valb;
int vali;
int loops;
if( argc < 2 ){
return 2;
}
valb = (0 != (strcmp(argv[1], "0")));
vali = strcmp(argv[1], "0");
printf("Arg1: %s\n", argv[1]);
printf("BArg1: %i\n", valb ? 1 : 0);
printf("IArg1: %i\n", vali);
for(loops=30000000; loops>0; loops--){
//printf("%i: %i\n", loops, testb(valb=!valb));
printf("%i: %i\n", loops, testi(vali=!vali));
}
return valb;
}
int testi(int val){
if( val ){
return 1;
}
return 0;
}
int testb(bool val){
if( val ){
return 1;
}
return 0;
}
Compiled on a 64-bit Ubuntu 10.10 laptop with:
g++ -O3 -o /tmp/test_i /tmp/test_i.cpp
Integer-based comparison:
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.203s
user 0m8.170s
sys 0m0.010s
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.056s
user 0m8.020s
sys 0m0.000s
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.116s
user 0m8.100s
sys 0m0.000s
Boolean test / print uncommented (and integer commented):
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.254s
user 0m8.240s
sys 0m0.000s
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.028s
user 0m8.000s
sys 0m0.010s
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m7.981s
user 0m7.900s
sys 0m0.050s
They're the same with 1 assignment and 2 comparisons each loop over 30 million loops. Find something else to optimize. For example, don't use strcmp unnecessarily. ;)
At the machine level there is no such thing as bool
Very few instruction set architectures define any sort of boolean operand type, although there are often instructions that trigger an action on non-zero values. To the CPU, usually, everything is one of the scalar types or a string of them.
A given compiler and a given ABI will need to choose specific sizes for int and bool and when, like in your case, these are different sizes they may generate slightly different code, and at some levels of optimization one may be slightly faster.
Why is bool one byte on many systems?
It's safer to choose a char type for bool because someone might make a really large array of them.
Update: by "safer", I mean: for the compiler and library implementors. I'm not saying people need to reimplement the system type.
It will mostly depend on the compiler and the optimization. There's an interesting discussion (language agnostic) here:
Does "if ([bool] == true)" require one more step than "if ([bool])"?
Also, take a look at this post: http://www.linuxquestions.org/questions/programming-9/c-compiler-handling-of-boolean-variables-290996/
Approaching your question in two different ways:
If you are specifically talking about C++ or any programming language that will produce assembly code for that matter, we are bound to what code the compiler will generate in ASM. We are also bound to the representation of true and false in c++. An integer will have to be stored in 32 bits, and I could simply use a byte to store the boolean expression. Asm snippets for conditional statements:
For the integer:
mov eax,dword ptr[esp] ;Store integer
cmp eax,0 ;Compare to 0
je false ;If int is 0, its false
;Do what has to be done when true
false:
;Do what has to be done when false
For the bool:
mov al,1 ;Anything that is not 0 is true
test al,1 ;See if first bit is fliped
jz false ;Not fliped, so it's false
;Do what has to be done when true
false:
;Do what has to be done when false
So, that's why the speed comparison is so compile dependent. In the case above, the bool would be slightly fast since cmp would imply a subtraction for setting the flags. It also contradicts with what your compiler generated.
Another approach, a much simpler one, is to look at the logic of the expression on it's own and try not to worry about how the compiler will translate your code, and I think this is a much healthier way of thinking. I still believe, ultimately, that the code being generated by the compiler is actually trying to give a truthful resolution. What I mean is that, maybe if you increase the test cases in the if statement and stick with boolean in one side and integer in another, the compiler will make it so the code generated will execute faster with boolean expressions in the machine level.
I'm considering this is a conceptual question, so I'll give a conceptual answer. This discussion reminds me of discussions I commonly have about whether or not code efficiency translates to less lines of code in assembly. It seems that this concept is generally accepted as being true. Considering that keeping track of how fast the ALU will handle each statement is not viable, the second option would be to focus on jumps and compares in assembly. When that is the case, the distinction between boolean statements or integers in the code you presented becomes rather representative. The result of an expression in C++ will return a value that will then be given a representation. In assembly, on the other hand, the jumps and comparisons will be based in numeric values regardless of what type of expression was being evaluated back at you C++ if statement. It is important on these questions to remember that purely logicical statements such as these end up with a huge computational overhead, even though a single bit would be capable of the same thing.
I tried the code provided by this question,but it doesn't work.
How to contrive an overflow to wrap my head around?
Update:
.file "hw.cpp"
.section .rdata,"dr"
LC0:
.ascii "Oh shit really bad~!\15\12\0"
.text
.align 2
.globl __Z3badv
.def __Z3badv; .scl 2; .type 32; .endef
__Z3badv:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl $LC0, (%esp)
call _printf
leave
ret
.section .rdata,"dr"
LC1:
.ascii "WOW\0"
.text
.align 2
.globl __Z3foov
.def __Z3foov; .scl 2; .type 32; .endef
__Z3foov:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
movl LC1, %eax
movl %eax, -4(%ebp)
movl $__Z3badv, 4(%ebp)
leave
ret
.def ___main; .scl 2; .type 32; .endef
.align 2
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
call __Z3foov
movl $0, %eax
leave
ret
.def _printf; .scl 2; .type 32; .endef
It would help to compile the example in the other question to assembly so you can get a feel for how the stack is laid out for your given compiler and processor. The +8 in the example may not be the correct number for your environment. What you need to determine is where the return address is stored on the stack relative to the array stored on the stack.
By the way, the example worked for me. I compiled on Win XP with Cygwin, gcc version 4.3.4. When I say it "worked", I mean that it ran code in the bad() function, even though that function was never called by the code.
$ gcc -Wall -Wextra buffer-overflow.c && ./a.exe
Oh shit really bad~!
Segmentation fault (core dumped)
The code really isn't an example of a buffer overflow, it's an example of what bad things can happen when a buffer overflow is exploited.
I'm not great with x86 assembly, but here's my interpretation of how this exploit works.
$ gcc -S buffer-overflow.c && cat buffer-overflow.s
_foo:
pushl %ebp ;2
movl %esp, %ebp ;3
subl $16, %esp ;4
movl LC1, %eax ;5
movl %eax, -4(%ebp) ;6
leal -4(%ebp), %eax ;7
leal 8(%eax), %edx ;8
movl $_bad, %eax ;9
movl %eax, (%edx) ;10
leave
ret
_main:
...
call _foo ;1
...
When main calls foo (1), the call instruction pushes onto the stack the address within main to return to once the call to foo completes. Pushing onto the stack involves decrementing ESP and storing a value there.
Once in foo, the old base pointer value is also pushed onto the stack (2). This will be restored when foo returns. The stack pointer is saved as the base pointer for this stack frame (3). The stack pointer is decremented by 16 (4), which creates space on this stack frame for local variables.
The address of literal "WOW\0" is copied into local variable overme on the stack (5,6) -- this seems strange to me, shouldn't it be copying the 4 characters into space allocated on the stack? Anyway, the place where WOW (or a pointer to it) is copied is 4 bytes below the current base pointer. So the stack contains this value, then the old base pointer, then the return address.
The address of overme is put into EAX (7) and an integer pointer is created 8 bytes beyond that address (8). The address of the bad function is put into EAX (9) and then that address is stored in memory pointed to by the integer pointer (10).
The stack looks like this:
// 4 bytes on each row
ESP: (unused)
: (unused)
: (unused)
: &"WOW\0"
: old EBP from main
: return PC, overwritten with &bad
When you compile with optimization, all the interesting stuff gets optimized away as "useless code" (which it is).
$ gcc -S -O2 buffer-overflow.c && cat buffer-overflow.s
_foo:
pushl %ebp
movl %esp, %ebp
popl %ebp
ret
You can use the C example you posted. It works the same in C as C++.
The smallest readable answer I can think of:
int main() {
return ""[1]; // Undefined behaviour (reading past '\0' in string)
}
Something like this?
int main()
{
char arr[1];
arr[1000000] = 'a';
}
A simple buffer overflow would be something like this:
#include <stdio.h>
#include <string.h>
int main() {
char a[4] = {0};
char b[32] = {0};
printf("before: b == \"%s\"\n", b);
strcpy(a, "Putting too many characters in array a");
printf("after: b == \"%s\"\n", b);
}
A possible output:
before: b == ""
after: b == " characters in array a"
The actual behavior of the program is undefined, so the buffer overflow might also cause different output, crashes or no observable effect at all.
#define _WIN32_WINNT 0x0400
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
void process_msg(const char *pSrc)
{
char cBuff[5];
strcpy(cBuff, pSrc);
}
void main()
{
char szInput[] = "hello world!";
process_msg(szInput);
}
Running this program at Visual Studio 2008 in Debug mode gives this message:
Run-Time Check Failure #2 - Stack around the variable 'cBuff' was corrupted.
The 'cBuff' char array is allocated on the stack in this example and it's of 5 bytes in size. Copying the given pointer's data (pSrc) to that char arrat (cBuff) overwrites the stack frame's data which will result in a possible exploit.
This technique is used by hackers - they send a specially crafted array of chars which will overwrite the pointer to the "return" address, on the stack, and change it to their desired location at the memory.
So, for example, they could point that "return" address to any system/program code that will open a port or establish a connection, and then they get to you PC, with the application's privileges (many times this means root/administrator).
Read more at http://en.wikipedia.org/wiki/Buffer_overflow .
In addition to the excellent article pointed out by Eric, you might also check out the following reading materials:
Writing buffer overflow exploits - a tutorial for beginners
A step-by-step on the buffer overflow vulnerablity
The following article focuses more on heap overflows:
w00w00 on Heap Overflows
This was copied from my answer here.
The term buffer overflow precisely means accessing past the end of the buffer (sloppily needing to include the idea of buffer underflow, too). But there's a whole class of "memory safety problems" having to do with buffer overflows, pointers, arrays, allocation and deallocation, all of which can produce crashes and/or exploit opportunities in code. See this C example of another memory safety problem (and our way to detect it).