Inside a large loop, I currently have a statement similar to
if (ptr == NULL || ptr->calculate() > 5)
{do something}
where ptr is an object pointer set before the loop and never changed.
I would like to avoid comparing ptr to NULL in every iteration of the loop. (The compiled program currently does that, right?) A simple solution would be to write the loop twice, once for ptr == NULL and once for ptr != NULL. But this increases the amount of code and makes it harder to maintain, and it looks silly to have the same large loop appear twice with only one or two lines changed.
What can I do? Use dynamically-valued constants maybe and hope the compiler is smart? How?
Many thanks!
EDIT by Luther Blissett. The OP wants to know if there is a better way to remove the pointer check here:
loop {
A;
if (ptr==0 || ptr->calculate()>5) B;
C;
}
than duplicating the loop as shown here:
if (ptr==0)
loop {
A;
B;
C;
}
else loop {
A;
if (ptr->calculate()>5) B;
C;
}
I just wanted to point out that GCC can apparently do the requested hoisting in the optimizer. Here's a model loop (in C):
struct C
{
int (*calculate)();
};
void sideeffect1();
void sideeffect2();
void sideeffect3();
void foo(struct C *ptr)
{
int i;
for (i=0;i<1000;i++)
{
sideeffect1();
if (ptr == 0 || ptr->calculate()>5) sideeffect2();
sideeffect3();
}
}
Compiling this with gcc 4.5 and -O3 gives:
.globl foo
.type foo, #function
foo:
.LFB0:
pushq %rbp
.LCFI0:
movq %rdi, %rbp
pushq %rbx
.LCFI1:
subq $8, %rsp
.LCFI2:
testq %rdi, %rdi # ptr==0? -> .L2, see below
je .L2
movl $1000, %ebx
.p2align 4,,10
.p2align 3
.L4:
xorl %eax, %eax
call sideeffect1 # sideeffect1
xorl %eax, %eax
call *0(%rbp) # call p->calculate, no check for ptr==0
cmpl $5, %eax
jle .L3
xorl %eax, %eax
call sideeffect2 # ok, call sideeffect2
.L3:
xorl %eax, %eax
call sideeffect3
subl $1, %ebx
jne .L4
addq $8, %rsp
.LCFI3:
xorl %eax, %eax
popq %rbx
.LCFI4:
popq %rbp
.LCFI5:
ret
.L2: # here's the loop with ptr==0
.LCFI6:
movl $1000, %ebx
.p2align 4,,10
.p2align 3
.L6:
xorl %eax, %eax
call sideeffect1 # does not try to call ptr->calculate() anymore
xorl %eax, %eax
call sideeffect2
xorl %eax, %eax
call sideeffect3
subl $1, %ebx
jne .L6
addq $8, %rsp
.LCFI7:
xorl %eax, %eax
popq %rbx
.LCFI8:
popq %rbp
.LCFI9:
ret
And so does clang 2.7 (-O3):
foo:
.Leh_func_begin1:
pushq %rbp
.Llabel1:
movq %rsp, %rbp
.Llabel2:
pushq %r14
pushq %rbx
.Llabel3:
testq %rdi, %rdi # ptr==NULL -> .LBB1_5
je .LBB1_5
movq %rdi, %rbx
movl $1000, %r14d
.align 16, 0x90
.LBB1_2:
xorb %al, %al # here's the loop with the ptr->calculate check()
callq sideeffect1
xorb %al, %al
callq *(%rbx)
cmpl $6, %eax
jl .LBB1_4
xorb %al, %al
callq sideeffect2
.LBB1_4:
xorb %al, %al
callq sideeffect3
decl %r14d
jne .LBB1_2
jmp .LBB1_7
.LBB1_5:
movl $1000, %r14d
.align 16, 0x90
.LBB1_6:
xorb %al, %al # and here's the loop for the ptr==NULL case
callq sideeffect1
xorb %al, %al
callq sideeffect2
xorb %al, %al
callq sideeffect3
decl %r14d
jne .LBB1_6
.LBB1_7:
popq %rbx
popq %r14
popq %rbp
ret
In C++, although it is complete overkill, you can put the loop in a function template. This generates the function body twice, but the extra check is optimized out of each instantiation. While I certainly don't recommend it, here is the code:
template<bool ptr_is_null>
void loop() {
for(int i = x; i != y; ++i) {
/**/
if(ptr_is_null || ptr->calculate() > 5) {
/**/
}
/**/
}
}
You call it with:
if (ptr==NULL) loop<true>(); else loop<false>();
You are better off without this "optimization"; the compiler will probably do the RightThing(TM) for you.
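For completeness, here is a self-contained sketch of the template approach described above. The Calculator struct, the run_loop and run names, and the hit counter are all hypothetical stand-ins for the asker's class and loop body; only the shape of the technique is taken from the answer.

```cpp
// Hypothetical stand-in for the asker's class; only calculate() matters here.
struct Calculator {
    int value;
    int calculate() const { return value; }
};

// The loop body is written once. PtrIsNull is a compile-time constant, so in
// the <true> instantiation the condition is always true and ptr is never
// dereferenced; in the <false> instantiation only the calculate() test remains.
template <bool PtrIsNull>
int run_loop(const Calculator* ptr, int iterations) {
    int hits = 0;  // counts how often the "do something" branch ran
    for (int i = 0; i < iterations; ++i) {
        if (PtrIsNull || ptr->calculate() > 5) {
            ++hits;  // placeholder for "do something"
        }
    }
    return hits;
}

// The single runtime null check, done once before the loop.
int run(const Calculator* ptr, int iterations) {
    return ptr == nullptr ? run_loop<true>(ptr, iterations)
                          : run_loop<false>(ptr, iterations);
}
```

Calling run(nullptr, 10) takes the always-true path, while run(&obj, 10) only evaluates obj.calculate() > 5 in each iteration.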
Why do you want to avoid comparing to NULL?
Creating a variant for each of the NULL and non-NULL cases just gives you almost twice as much code to write, test and more importantly maintain.
A 'large loop' smells like an opportunity to refactor the loop into separate functions, in order to make the code easier to maintain. Then you can easily have two variants of the loop, one for ptr == null and one for ptr != null, calling different functions, with just a rough similarity in the overall structure of the loop.
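As a rough sketch of that refactoring, assume the loop body can be split into three steps. The names Widget, step_a, step_b, step_c and the trace string are hypothetical, introduced only to make the two loop variants observable.

```cpp
#include <string>

// Hypothetical stand-in for the asker's class.
struct Widget {
    int value;
    int calculate() const { return value; }
};

// Hypothetical loop steps; each one just records that it ran.
void step_a(std::string& trace) { trace += 'A'; }
void step_b(std::string& trace) { trace += 'B'; }
void step_c(std::string& trace) { trace += 'C'; }

// Variant for ptr == NULL: step B always runs.
void loop_without_object(int n, std::string& trace) {
    for (int i = 0; i < n; ++i) {
        step_a(trace);
        step_b(trace);
        step_c(trace);
    }
}

// Variant for ptr != NULL: step B runs only when calculate() > 5.
void loop_with_object(const Widget& w, int n, std::string& trace) {
    for (int i = 0; i < n; ++i) {
        step_a(trace);
        if (w.calculate() > 5) step_b(trace);
        step_c(trace);
    }
}

// The null check happens once, outside both loops.
void run(const Widget* ptr, int n, std::string& trace) {
    if (ptr == nullptr) loop_without_object(n, trace);
    else loop_with_object(*ptr, n, trace);
}
```

Each loop is now only a few lines long, so having two of them costs little in maintenance; the real work lives in the shared step functions.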
Since
ptr is an object pointer set before the loop and never changed
can't you just check whether it is null before the loop and not check again, since you don't change it?
If it is not valid for your pointer to be NULL, you could use a reference instead.
If it is valid for your pointer to be NULL, but if so then you skip all processing, then you could either wrap your code with one check at the beginning, or return early from your function:
if (ptr != NULL)
{
// your function
}
or
if (ptr == NULL) { return; }
If it is valid for your pointer to be NULL, but only some processing is skipped, then keep it like it is.
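For the first case above, where NULL is never a valid value, a reference-based signature could look like the following sketch. Gauge and count_hits are hypothetical names; the point is only that the function body has nothing left to check.

```cpp
// Hypothetical stand-in for the asker's class.
struct Gauge {
    int level;
    int calculate() const { return level; }
};

// Taking a reference documents that a valid object is required,
// so the loop contains no null test at all.
int count_hits(const Gauge& gauge, int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i) {
        if (gauge.calculate() > 5) {
            ++hits;  // placeholder for "do something"
        }
    }
    return hits;
}
```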
if (ptr == NULL || ptr->calculate() > 5)
{do something}
I would simply think in terms of what is done if the condition is true.
If "do something" is really the exact same stuff for (ptr == NULL) or (ptr->calculate() > 5), then I hardly see a reason to split up anything.
If "do something" contains particular cases for either condition, then I would consider refactoring into separate loops to get rid of the extra special-case checking. It depends on the special cases involved.
Eliminating code duplication is good up to a point. You should not care too much about optimizing until your program does what it should do and until performance becomes a problem.
[...] Premature optimization is the root of all evil
http://en.wikipedia.org/wiki/Program_optimization
I'm working on the classic "Reverse a String" problem.
Is it a good idea to use the position of the null terminator as swap space? The idea is to save the declaration of one variable.
Specifically, starting with Kernighan and Ritchie's algorithm:
void reverse(char s[])
{
int length = strlen(s);
int c, i, j;
for (i = 0, j = length - 1; i < j; i++, j--)
{
c = s[i];
s[i] = s[j];
s[j] = c;
}
}
...can we instead do the following?
void reverseUsingNullPosition(char s[]) {
int length = strlen(s);
int i, j;
for (i = 0, j = length - 1; i < j; i++, j--) {
s[length] = s[i]; // Use last position instead of a new var
s[i] = s[j];
s[j] = s[length];
}
s[length] = 0; // Replace null character
}
Notice how the "c" variable is no longer needed. We simply use the last position in the array--where the null termination resides--as our swap space. When we're done, we simply replace the 0.
Here's the main routine (Xcode):
#include <stdio.h>
#include <string.h>
int main(int argc, const char * argv[]) {
char cheese[] = { 'c' , 'h' , 'e' , 'd' , 'd' , 'a' , 'r' , 0 };
printf("Cheese is: %s\n", cheese); //-> Cheese is: cheddar
reverse(cheese);
printf("Cheese is: %s\n", cheese); //-> Cheese is: raddehc
reverseUsingNullPosition(cheese);
printf("Cheese is: %s\n", cheese); //-> Cheese is: cheddar
}
Yes, this can be done. No, this is not a good idea, because it makes your program much harder to optimize.
When you declare char c in the local scope, the optimizer can figure out that the value is not used beyond the s[j] = c; assignment, and could place the temporary in a register. In addition to effectively eliminating the variable for you, the optimizer could even figure out that you are performing a swap, and emit a hardware-specific instruction. All this would save you a memory access per character.
When you use s[length] for your temporary, the optimizer does not have as much freedom. It is forced to emit the write into memory. This could be just as fast due to caching, but on embedded platforms this could have a significant effect.
First of all such microoptimizations are totally irrelevant until proven relevant. We're talking about C++, you have std::string, std::reverse, you shouldn't even think about such facts.
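A minimal sketch of the standard-library route mentioned above; the reversed helper is a hypothetical wrapper, while std::reverse and std::string are the real library facilities.

```cpp
#include <algorithm>
#include <string>

// Takes the string by value, reverses the copy in place, and returns it.
std::string reversed(std::string s) {
    std::reverse(s.begin(), s.end());
    return s;
}
```

With this, reversed("cheddar") yields "raddehc", and no manual index juggling or swap variable is needed.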
In any case, if you compile both versions with -Os in Xcode, you obtain for reverse:
.cfi_startproc
Lfunc_begin0:
pushq %rbp
Ltmp3:
.cfi_def_cfa_offset 16
Ltmp4:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp5:
.cfi_def_cfa_register %rbp
pushq %r14
pushq %rbx
Ltmp6:
.cfi_offset %rbx, -32
Ltmp7:
.cfi_offset %r14, -24
movq %rdi, %r14
Ltmp8:
callq _strlen
Ltmp9:
leal -1(%rax), %ecx
testl %ecx, %ecx
jle LBB0_3
Ltmp10:
movslq %ecx, %rcx
addl $-2, %eax
Ltmp11:
xorl %edx, %edx
LBB0_2:
Ltmp12:
movb (%r14,%rdx), %sil
movb (%r14,%rcx), %bl
movb %bl, (%r14,%rdx)
movb %sil, (%r14,%rcx)
Ltmp13:
incq %rdx
decq %rcx
cmpl %eax, %edx
leal -1(%rax), %eax
jl LBB0_2
Ltmp14:
LBB0_3:
popq %rbx
popq %r14
popq %rbp
ret
Ltmp15:
Lfunc_end0:
.cfi_endproc
and for reverseUsingNullPosition:
.cfi_startproc
Lfunc_begin1:
pushq %rbp
Ltmp19:
.cfi_def_cfa_offset 16
Ltmp20:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp21:
.cfi_def_cfa_register %rbp
pushq %rbx
pushq %rax
Ltmp22:
.cfi_offset %rbx, -24
movq %rdi, %rbx
Ltmp23:
callq _strlen
Ltmp24:
leal -1(%rax), %edx
testl %edx, %edx
Ltmp25:
movslq %eax, %rdi
jle LBB1_3
Ltmp26:
movslq %edx, %rdx
addl $-2, %eax
Ltmp27:
xorl %esi, %esi
LBB1_2:
Ltmp28:
movb (%rbx,%rsi), %cl
movb %cl, (%rbx,%rdi)
movb (%rbx,%rdx), %cl
movb %cl, (%rbx,%rsi)
movb (%rbx,%rdi), %cl
movb %cl, (%rbx,%rdx)
Ltmp29:
incq %rsi
decq %rdx
cmpl %eax, %esi
leal -1(%rax), %eax
jl LBB1_2
Ltmp30:
LBB1_3: ## %._crit_edge
movb $0, (%rbx,%rdi)
addq $8, %rsp
popq %rbx
Ltmp31:
popq %rbp
ret
Ltmp32:
Lfunc_end1:
.cfi_endproc
If you check the inner loop you have
movb (%r14,%rdx), %sil
movb (%r14,%rcx), %bl
movb %bl, (%r14,%rdx)
movb %sil, (%r14,%rcx)
vs
movb (%rbx,%rsi), %cl
movb %cl, (%rbx,%rdi)
movb (%rbx,%rdx), %cl
movb %cl, (%rbx,%rsi)
movb (%rbx,%rdi), %cl
movb %cl, (%rbx,%rdx)
So I wouldn't say you are saving as much overhead as you think (since you are accessing the array more times); maybe yes, maybe no. This teaches you another thing: guessing that some code is more performant than other code is irrelevant; the only thing that matters is a well-done benchmark and profile of the code.
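A very rough timing harness for the two variants might look like the sketch below. The function names and the time_ns helper are hypothetical, and a serious benchmark would also need warm-up runs, repeated measurements, and a check that the optimizer has not removed the work being timed.

```cpp
#include <chrono>
#include <cstring>
#include <vector>

// Swap through a local temporary (the K&R-style version).
void reverse_with_temp(char s[]) {
    int length = static_cast<int>(std::strlen(s));
    for (int i = 0, j = length - 1; i < j; i++, j--) {
        char c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}

// Swap through the terminator's slot, then restore the terminator.
void reverse_using_null_slot(char s[]) {
    int length = static_cast<int>(std::strlen(s));
    for (int i = 0, j = length - 1; i < j; i++, j--) {
        s[length] = s[i];
        s[i] = s[j];
        s[j] = s[length];
    }
    s[length] = 0;
}

// Calls fn on the NUL-terminated buffer `rounds` times and returns the
// elapsed wall-clock time in nanoseconds.
template <typename Fn>
long long time_ns(Fn fn, std::vector<char>& buffer, int rounds) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        fn(buffer.data());
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

One would fill a std::vector<char> with test data ending in '\0', call time_ns for each variant with the same number of rounds, and compare the results over several runs.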
Legal: Yes
Good idea: No
The cost of an "extra" variable is zero so there is absolutely no reason to avoid it. The stack pointer needs to be changed anyway so it doesn't matter if it needs to cope with an extra int.
Further:
With compiler optimization turned on, the variable c in the original code will most likely not even exist. It will just be a register in the CPU.
With your code: Optimization will be more difficult so it is not easy to say how well the compiler will do. Maybe you'll get the same - maybe you'll get something worse. But you won't get anything better.
So just forget the idea.
We can use printf and the STL and also manually unroll things and use pointers.
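#include <stdio.h>
#include <cstring>
#include <utility> // std::swap
void reverse(char s[])
{
size_t n = ::strlen(s);
if (n < 2) return; // nothing to swap; also keeps both pointers inside the array
char * b=s; // first character
char * e=s+n-1; // last character
while (e - b >= 7) // at least eight characters left: swap four pairs at once
{
std::swap(b[0], e[0]);
std::swap(b[1], e[-1]);
std::swap(b[2], e[-2]);
std::swap(b[3], e[-3]);
b+=4;
e-=4;
}
while (b < e) // remaining pairs, one at a time
{
std::swap(*(b++), *(e--));
}
}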
int main(int argc, const char * argv[]) {
char cheese[] = { 'c' , 'h' , 'e' , 'd' , 'd' , 'a' , 'r' , 0 };
printf("Cheese is: %s\n", cheese); //-> Cheese is: cheddar
reverse(cheese);
printf("Cheese is: %s\n", cheese); //-> Cheese is: raddehc
}
It's hard to tell whether it's faster with just the "cheddar" test case.
Which of the code snippets below will be compiled to more efficient code by gcc (C/C++): the first function or the second function?
// First Function
if ( A && B && C ) {
UpdateData();
} else if ( A && B ){
ResetData();
}
//Second Function
if ( A && B) {
if (C) {
UpdateData();
} else {
ResetData();
}
}
Do we get any performance improvement in Second Function ?
If First Function is used, can the compiler optimize it to Second Method on its own ?
A large portion of this question will depend on what A, B and C really are (and the compiler will optimise it, as shown below). Simple types, definitely not worth worrying about. If they are some kind of "big number math" objects, or some complicated data type that needs 1000 instructions for each "is this true or not", then there will be a big difference if the compiler decides to make different code.
As always when it comes to performance: measure in your own code, use profiling to detect where the code spends MOST of its time, and then measure again with changes to that code. Repeat until it runs fast enough [whatever that is] and/or your manager tells you to stop fiddling with the code. Typically, however, unless it's REALLY a high-traffic area of the code, re-arranging the conditions in an if-statement will make little difference; it is the overall algorithm that has the most impact in the general case.
If we assume A, B and C are simple types, such as int, we can write some code to investigate:
extern int A, B, C;
extern void UpdateData();
extern void ResetData();
void func1()
{
if ( A && B && C ) {
UpdateData();
} else if ( A && B ){
ResetData();
}
}
void func2()
{
if ( A && B) {
if (C) {
UpdateData();
} else {
ResetData();
}
}
}
gcc 4.8.2 given this, with -O1 produces this code:
_Z5func1v:
cmpl $0, A(%rip)
je .L6
cmpl $0, B(%rip)
je .L6
subq $8, %rsp
cmpl $0, C(%rip)
je .L3
call _Z10UpdateDatav
jmp .L1
.L3:
call _Z9ResetDatav
.L1:
addq $8, %rsp
.L6:
rep ret
_Z5func2v:
.LFB1:
cmpl $0, A(%rip)
je .L12
cmpl $0, B(%rip)
je .L12
subq $8, %rsp
cmpl $0, C(%rip)
je .L9
call _Z10UpdateDatav
jmp .L7
.L9:
call _Z9ResetDatav
.L7:
addq $8, %rsp
.L12:
rep ret
In other words: No difference at all
Using clang++ 3.7 (as of about 3 weeks ago) with -O1 gives this:
_Z5func1v: # #_Z5func1v
cmpl $0, A(%rip)
setne %cl
cmpl $0, B(%rip)
setne %al
andb %cl, %al
movzbl %al, %ecx
cmpl $1, %ecx
jne .LBB0_2
movl C(%rip), %ecx
testl %ecx, %ecx
je .LBB0_2
jmp _Z10UpdateDatav # TAILCALL
.LBB0_2: # %if.else
testb %al, %al
je .LBB0_3
jmp _Z9ResetDatav # TAILCALL
.LBB0_3: # %if.end8
retq
_Z5func2v: # #_Z5func2v
cmpl $0, A(%rip)
je .LBB1_4
movl B(%rip), %eax
testl %eax, %eax
je .LBB1_4
cmpl $0, C(%rip)
je .LBB1_3
jmp _Z10UpdateDatav # TAILCALL
.LBB1_4: # %if.end4
retq
.LBB1_3: # %if.else
jmp _Z9ResetDatav # TAILCALL
.Ltmp1:
The chaining of and in the func1 of clang MAY be of benefit, but it's probably such a small difference that you should concentrate on what makes more sense from a logical perspective of the code.
In summary: Not worth it
Higher optimisation in g++ makes it do the same tailcall optimisation that clang does, otherwise no difference.
However, if we make A, B and C into external functions, which the compiler can't "understand", then we get a difference:
_Z5func1v: # #_Z5func1v
pushq %rax
.Ltmp0:
.cfi_def_cfa_offset 16
callq _Z1Av
testl %eax, %eax
je .LBB0_3
callq _Z1Bv
testl %eax, %eax
je .LBB0_3
callq _Z1Cv
testl %eax, %eax
je .LBB0_3
popq %rax
jmp _Z10UpdateDatav # TAILCALL
.LBB0_3: # %if.else
callq _Z1Av
testl %eax, %eax
je .LBB0_5
callq _Z1Bv
testl %eax, %eax
je .LBB0_5
popq %rax
jmp _Z9ResetDatav # TAILCALL
.LBB0_5: # %if.end12
popq %rax
retq
_Z5func2v: # #_Z5func2v
pushq %rax
.Ltmp2:
.cfi_def_cfa_offset 16
callq _Z1Av
testl %eax, %eax
je .LBB1_4
callq _Z1Bv
testl %eax, %eax
je .LBB1_4
callq _Z1Cv
testl %eax, %eax
je .LBB1_3
popq %rax
jmp _Z10UpdateDatav # TAILCALL
.LBB1_4: # %if.end6
popq %rax
retq
.LBB1_3: # %if.else
popq %rax
jmp _Z9ResetDatav # TAILCALL
Here we DO see the difference between func1 and func2: func1 calls A and B twice, since the compiler can't assume that calling those functions ONCE will do the same thing as calling them twice. [Consider that the functions A and B may be reading data from a file, calling rand, or whatever; NOT calling a function the second time may make the program behave differently.]
(In this case I only posted clang code, but g++ produces code that has the same outcome, but slightly different ordering of the different lumps of code)
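The double evaluation can also be observed at run time. In the sketch below, A, B and C are hypothetical functions with a visible side effect (a call counter), standing in for the opaque external functions used above.

```cpp
// Hypothetical instrumented versions of A, B, C, UpdateData and ResetData.
int calls_a = 0, calls_b = 0, calls_c = 0;
int updates = 0, resets = 0;
bool c_result = false;  // what C() reports; false forces the else path

int A() { ++calls_a; return 1; }
int B() { ++calls_b; return 1; }
int C() { ++calls_c; return c_result ? 1 : 0; }
void UpdateData() { ++updates; }
void ResetData() { ++resets; }

void reset_counters() {
    calls_a = calls_b = calls_c = 0;
    updates = resets = 0;
}

// First form: A() and B() are evaluated again in the else-if.
void func1() {
    if (A() && B() && C()) {
        UpdateData();
    } else if (A() && B()) {
        ResetData();
    }
}

// Second form: A() and B() are evaluated once.
void func2() {
    if (A() && B()) {
        if (C()) {
            UpdateData();
        } else {
            ResetData();
        }
    }
}
```

With C() returning false, func1 calls A and B twice each before reaching ResetData, whereas func2 calls each of them once. The two forms are therefore only interchangeable when A and B have no side effects.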
I have a simple piece of code that addresses this (poorly stated, out of place) question:
template<typename It>
bool isAlpha(It first, It last)
{
return (first != last && *first != '\0') ?
(isalpha(static_cast<int>(*first)) && isAlpha(++first, last)) : true;
}
I'm trying to figure out how I can implement it in a tail-recursive fashion, and although there are great sources like this answer, I can't wrap my mind around it.
Can anyone help?
EDIT
I'm placing the disassembly below. The compiler is gcc 4.9.0, compiling with -std=c++11 -O2 -Wall -pedantic; the assembly output is
bool isAlpha<char const*>(char const*, char const*):
cmpq %rdi, %rsi
je .L5
movzbl (%rdi), %edx
movl $1, %eax
testb %dl, %dl
je .L12
pushq %rbp
pushq %rbx
leaq 1(%rdi), %rbx
movq %rsi, %rbp
subq $8, %rsp
.L3:
movsbl %dl, %edi
call isalpha
testl %eax, %eax
jne .L14
xorl %eax, %eax
.L2:
addq $8, %rsp
popq %rbx
popq %rbp
.L12:
rep ret
.L14:
cmpq %rbp, %rbx
je .L7
addq $1, %rbx
movzbl -1(%rbx), %edx
testb %dl, %dl
jne .L3
.L7:
movl $1, %eax
jmp .L2
.L5:
movl $1, %eax
ret
To clarify cdhowie's point, the function can be rewritten as follows (unless I made a mistake):
template<typename It>
bool isAlpha(It first, It last)
{
if (first == last)
return true;
if (*first == '\0')
return true;
if (!isalpha(static_cast<int>(*first)))
return false;
return isAlpha(++first, last);
}
Which would indeed allow for trivial tail call elimination.
This is normally a job for the compiler, though.
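For comparison, this is roughly what tail-call elimination amounts to when written by hand: the recursive call becomes the next loop iteration. The name isAlphaLoop is hypothetical, and the cast to unsigned char is my own addition (std::isalpha has undefined behaviour for negative char values), so it differs slightly from the code the disassembly above was generated from.

```cpp
#include <cctype>

// Iterative equivalent of the tail-recursive isAlpha: stops at `last` or at
// the first '\0', and fails on the first non-alphabetic character.
template <typename It>
bool isAlphaLoop(It first, It last) {
    for (; first != last && *first != '\0'; ++first) {
        if (!std::isalpha(static_cast<unsigned char>(*first))) {
            return false;
        }
    }
    return true;
}
```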
So the question is which of these implementations has better performance and readability.
Imagine you have to write code in which each step depends on the success of the previous one, something like:
bool function()
{
bool isOk = false;
if( A.Func1() )
{
B.Func1();
if( C.Func2() )
{
if( D.Func3() )
{
...
isOk = true;
}
}
}
return isOk;
}
Let's say there are up to 6 nested IFs, since I don't want the padding to grow too much to the right, and I don't want to nest the function calls because there are several parameters involved, the first approach would be using the inverse logic:
bool function()
{
if( ! A.Func1() ) return false;
B.Func1();
if( ! C.Func2() ) return false;
if( ! D.Func3() ) return false;
...
return true;
}
But what about avoiding so many returns, like this:
bool function()
{
bool isOk = false;
do
{
if( ! A.Func1() ) break;
B.Func1();
if( ! C.Func2() ) break;
if( ! D.Func3() ) break;
...
isOk = true;
break;
}while(false);
return isOk;
}
Compilers will break down your code to simple instructions, using branch instructions to form loops, if/else etc, and it's unlikely your code will be any different at all once the compiler has gone over it.
Write the code that you think makes most sense for the solution you require.
If I were to "vote" for one of the three variants, I'd say my code is mostly variant 2. However, I don't follow it religiously. If it makes more sense (from a "how you think about it" perspective) to write in variant 1, then I will do that.
I don't think I've ever written, or even seen code written like variant 3 - I'm sure it happens, but if your goal is to have a single return, then I'd say variant 1 is the clearer and more obvious choice. Variant 3 is really just a "goto by another name" (see my most rewarded answer [and that's after I had something like 80 down-votes for suggesting goto as a solution]). I personally don't see variant 3 as any better than the other two, and unless the function is short enough to see do and the while on the same page, you also don't actually know that it won't loop without scrolling around - which is really not a good thing.
If you then, after profiling the code, discover a particular function is taking more time than you think is "right", study the assembly code.
Just to illustrate this, I will take your code and compile all three examples with g++ and clang++, and show the resulting code. It will probably take a few minutes because I have to actually make it compilable first.
Your source, after some massaging to make it compile as a single source file:
class X
{
public:
bool Func1();
bool Func2();
bool Func3();
};
X A, B, C, D;
bool function()
{
bool isOk = false;
if( A.Func1() )
{
B.Func1();
if( C.Func2() )
{
if( D.Func3() )
{
isOk = true;
}
}
}
return isOk;
}
bool function2()
{
if( ! A.Func1() ) return false;
B.Func1();
if( ! C.Func2() ) return false;
if( ! D.Func3() ) return false;
return true;
}
bool function3()
{
bool isOk = false;
do
{
if( ! A.Func1() ) break;
B.Func1();
if( ! C.Func2() ) break;
if( ! D.Func3() ) break;
isOk = true;
}while(false);
return isOk;
}
Code generated by clang 3.5 (compiled from sources a few days ago):
_Z8functionv: # #_Z8functionv
pushq %rax
movl $A, %edi
callq _ZN1X5Func1Ev
testb %al, %al
je .LBB0_2
movl $B, %edi
callq _ZN1X5Func1Ev
movl $C, %edi
callq _ZN1X5Func2Ev
testb %al, %al
je .LBB0_2
movl $D, %edi
popq %rax
jmp _ZN1X5Func3Ev # TAILCALL
.LBB0_2:
xorl %eax, %eax
popq %rdx
retq
_Z9function2v: # #_Z9function2v
pushq %rax
movl $A, %edi
callq _ZN1X5Func1Ev
testb %al, %al
je .LBB1_1
movl $B, %edi
callq _ZN1X5Func1Ev
movl $C, %edi
callq _ZN1X5Func2Ev
testb %al, %al
je .LBB1_3
movl $D, %edi
callq _ZN1X5Func3Ev
# kill: AL<def> AL<kill> EAX<def>
jmp .LBB1_5
.LBB1_1:
xorl %eax, %eax
jmp .LBB1_5
.LBB1_3:
xorl %eax, %eax
.LBB1_5:
# kill: AL<def> AL<kill> EAX<kill>
popq %rdx
retq
_Z9function3v: # #_Z9function3v
pushq %rax
.Ltmp4:
.cfi_def_cfa_offset 16
movl $A, %edi
callq _ZN1X5Func1Ev
testb %al, %al
je .LBB2_2
movl $B, %edi
callq _ZN1X5Func1Ev
movl $C, %edi
callq _ZN1X5Func2Ev
testb %al, %al
je .LBB2_2
movl $D, %edi
popq %rax
jmp _ZN1X5Func3Ev # TAILCALL
.LBB2_2:
xorl %eax, %eax
popq %rdx
retq
In the clang++ code, the second function is very marginally worse due to an extra jump that one would have hoped the compiler could recognise as equivalent to one of the others. But I doubt any realistic code where Func1, Func2 and Func3 actually do anything meaningful will show any measurable difference.
And g++ 4.8.2:
_Z8functionv:
subq $8, %rsp
movl $A, %edi
call _ZN1X5Func1Ev
testb %al, %al
jne .L10
.L3:
xorl %eax, %eax
addq $8, %rsp
ret
.L10:
movl $B, %edi
call _ZN1X5Func1Ev
movl $C, %edi
call _ZN1X5Func2Ev
testb %al, %al
je .L3
movl $D, %edi
addq $8, %rsp
jmp _ZN1X5Func3Ev
_Z9function2v:
subq $8, %rsp
movl $A, %edi
call _ZN1X5Func1Ev
testb %al, %al
jne .L19
.L13:
xorl %eax, %eax
addq $8, %rsp
ret
.L19:
movl $B, %edi
call _ZN1X5Func1Ev
movl $C, %edi
call _ZN1X5Func2Ev
testb %al, %al
je .L13
movl $D, %edi
addq $8, %rsp
jmp _ZN1X5Func3Ev
_Z9function3v:
.LFB2:
subq $8, %rsp
movl $A, %edi
call _ZN1X5Func1Ev
testb %al, %al
jne .L28
.L22:
xorl %eax, %eax
addq $8, %rsp
ret
.L28:
movl $B, %edi
call _ZN1X5Func1Ev
movl $C, %edi
call _ZN1X5Func2Ev
testb %al, %al
je .L22
movl $D, %edi
addq $8, %rsp
jmp _ZN1X5Func3Ev
I challenge you to spot the difference aside from the label names between the different functions.
I think performance (and most likely even binary code) will be the same with any modern compiler.
Readability is somewhat a matter of conventions and habits.
I personally would prefer the first form, and probably you would need a new function to group some of the conditions (I think some of them can be grouped together in some meaningful way). The third form looks most cryptic to me.
As C++ has RAII and automatic cleanups, I tend to prefer the bail-out-with-return-as-soon-as-possible solution (your second one), because the code gets much cleaner IMHO. Obviously, it's a matter of opinion, taste and YMMV...
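To illustrate why early returns combine well with RAII, here is a small sketch. TraceGuard and process are hypothetical names; the guard simply logs its construction and destruction so the cleanup on every exit path becomes visible.

```cpp
#include <string>
#include <vector>

// Hypothetical guard: logs when it is constructed and when it is destroyed.
struct TraceGuard {
    std::vector<std::string>& log;
    explicit TraceGuard(std::vector<std::string>& l) : log(l) {
        log.push_back("acquire");
    }
    ~TraceGuard() { log.push_back("release"); }
};

// Bail-out-early style: every return path runs the guard's destructor,
// so no explicit cleanup code is needed before each return.
bool process(bool step1_ok, bool step2_ok, std::vector<std::string>& log) {
    TraceGuard guard(log);
    if (!step1_ok) return false;
    log.push_back("step1 ok");
    if (!step2_ok) return false;
    log.push_back("step2 ok");
    return true;
}
```

Whichever return statement is taken, the log ends with "release", which is the property that makes the second variant safe in C++.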
Should one use dynamic memory allocation when one knows that a variable will not be needed before it goes out of scope?
For example in the following function:
void func(){
int i =56;
//do something with i, i is not needed past this point
for(int t = 0; t<1000000; t++){
//code
}
}
Say one only needs i for a small section of the function: is it worthwhile deleting i, since it is not needed in the very long for loop?
As Borgleader said:
A) This is micro (and most probably premature) optimization, meaning don't worry about it. B) In this particular case, dynamically allocating i might even hurt performance. tl;dr: profile first, optimize later.
As an example, I compiled the following two programs into assembly (using g++ -S flag with no optimisation enabled).
Creating i on the stack:
int main(void)
{
int i = 56;
i += 5;
for(int t = 0; t<1000; t++) {}
return 0;
}
Dynamically:
int main(void)
{
int* i = new int(56);
*i += 5;
delete i;
for(int t = 0; t<1000; t++) {}
return 0;
}
The first program compiled to:
movl $56, -8(%rbp) # Store 56 on stack (int i = 56)
addl $5, -8(%rbp) # Add 5 to i (i += 5)
movl $0, -4(%rbp) # Initialize loop index (int t = 0)
jmp .L2 # Begin loop (goto .L2.)
.L3:
addl $1, -4(%rbp) # Increment index (t++)
.L2:
cmpl $999, -4(%rbp) # Check loop condition (t<1000)
setle %al
testb %al, %al
jne .L3 # If (t<1000) goto .L3.
movl $0, %eax # return 0
And the second:
subq $16, %rsp # Allocate memory (new)
movl $4, %edi
call _Znwm
movl $56, (%rax) # Store 56 in *i
movq %rax, -16(%rbp)
movq -16(%rbp), %rax # Add 5
movl (%rax), %eax
leal 5(%rax), %edx
movq -16(%rbp), %rax
movl %edx, (%rax)
movq -16(%rbp), %rax # Free memory (delete)
movq %rax, %rdi
call _ZdlPv
movl $0, -4(%rbp) # Initialize loop index (int t = 0)
jmp .L2 # Begin loop (goto .L2.)
.L3:
addl $1, -4(%rbp) # Increment index (t++)
.L2:
cmpl $999, -4(%rbp) # Check loop condition (t<1000)
setle %al
testb %al, %al
jne .L3 # If (t<1000) goto .L3.
movl $0, %eax # return 0
In the above assembly output, you can see straight away that there is a significant difference in the number of instructions being executed. If I compile the same programs with optimisation turned on, the first program produces:
xorl %eax, %eax # Equivalent to return 0;
The second produced:
movl $4, %edi
call _Znwm
movl $61, (%rax) # A smart compiler knows 56+5 = 61
movq %rax, %rdi
call _ZdlPv
xorl %eax, %eax
addq $8, %rsp
With optimisation on, the compiler becomes a pretty powerful tool for improving your code; in certain cases it can even detect that a program only returns 0 and get rid of all the unnecessary code. When you use dynamic memory in the code above, the compiled program still requests and then frees the dynamic memory; this compiler did not optimise the allocation out.
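If the goal is only to make it clear that i is not used during the long loop, a nested block scope does that without any allocation. This is a sketch under that assumption; the function name func and the returned value are hypothetical and exist only so the effect can be checked.

```cpp
// Limits i's lifetime with a block scope instead of new/delete.
int func() {
    int result = 0;
    {
        int i = 56;   // i exists only inside this block
        i += 5;       // "do something with i"
        result = i;
    }                 // i's lifetime ends here; nothing to delete
    for (int t = 0; t < 1000000; t++) {
        // long-running work that does not need i
    }
    return result;
}
```

The variable still lives on the stack (or in a register), so this costs nothing at run time; the block only documents and enforces where i may be used.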