Consider the following:
inline unsigned int f1(const unsigned int i, const bool b) {return b ? i : 0;}
inline unsigned int f2(const unsigned int i, const bool b) {return b*i;}
The syntax of f2 is more compact, but does the standard guarantee that f1 and f2 are strictly equivalent?
Furthermore, if I want the compiler to optimize this expression when b and i are known at compile time, which version should I prefer?
Well, yes, both are equivalent. bool is an integral type and true is guaranteed to convert to 1 in integer context, while false is guaranteed to convert to 0.
(The reverse is also true, i.e. non-zero integer values are guaranteed to convert to true in boolean context, while zero integer values are guaranteed to convert to false in boolean context.)
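As a quick illustration (my own sketch, not part of the original answer), these guarantees can be checked at compile time:
static_assert(static_cast<int>(true) == 1, "true promotes to 1");
static_assert(static_cast<int>(false) == 0, "false promotes to 0");
static_assert(static_cast<bool>(42) == true, "non-zero converts to true");
static_assert(static_cast<bool>(0) == false, "zero converts to false");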
Since you are working with unsigned types, one can easily come up with other, possibly bit-hack-based yet perfectly portable implementations of the same thing, like
i & -(unsigned) b
although a decent compiler should be able to choose the best implementation by itself for any of your versions.
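For example, a third, bit-hack-based variant might look like this (my sketch, built from the expression above):
// -(unsigned)b is all-ones when b is true and 0 when b is false,
// so the AND either keeps i intact or clears it to 0.
inline unsigned int f3(const unsigned int i, const bool b) { return i & -(unsigned)b; }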
P.S. To my great surprise, though, GCC 4.1.2 compiled all three variants virtually literally, i.e. it used a machine multiplication instruction in the multiplication-based variant. It was smart enough to use a cmovne instruction on the ?: variant to make it branchless, which quite possibly made it the most efficient implementation.
Yes. It's safe to assume true is 1 and false is 0 when they are used in expressions as you do, and it is guaranteed:
C++11, Integral Promotions, 4.5:
An rvalue of type bool can be converted to an rvalue of type int, with
false becoming zero and true becoming one.
The compiler will use implicit conversion to make an unsigned int from b, so, yes, this should work. You're skipping the condition check with a simple multiplication. Which one is more effective/faster? I don't know. A good compiler would most likely optimize both versions, I'd assume.
FWIW, the following code
inline unsigned int f1(const unsigned int i, const bool b) {return b ? i : 0;}
inline unsigned int f2(const unsigned int i, const bool b) {return b*i;}
int main()
{
volatile unsigned int i = f1(42, true);
volatile unsigned int j = f2(42, true);
}
compiled with gcc -O2 produces this assembly:
.file "test.cpp"
.def ___main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 2,,3
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB2:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $42, 8(%esp) // i
movl $42, 12(%esp) // j
xorl %eax, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE2:
There's not much left of either f1 or f2, as you can see.
As far as the C++ standard is concerned, the compiler is allowed to do anything with regard to optimization, as long as it doesn't change the observable behaviour (the as-if rule).
In C and C++, does the compiler optimize references to constant variables so that the program automatically knows what values are being referred to, instead of having to look at the constant variables' memory locations? When it comes to arrays, does it depend on whether the index into the array is a compile-time constant?
For instance, take a look at this code:
int main(void) {
    char tesst[3] = {'1', '3', '7'}; /* line 1 */
    char erm = tesst[1];             /* line 2 */
}
Does the compiler "change" line 2 to "char erm = '3'" at compile time?
I personally would expect the posted code to turn into "nothing", since neither variable is actually used, and thus can be removed.
But yes, modern compilers (gcc, clang, msvc, etc.) should be able to replace that array reference with its constant value, as long as the compiler can be reasonably sure that the contents of tesst aren't being changed. If you pass tesst into a function, even as a const reference, and the compiler can't actually see that the function is NOT changing it, it will assume that it does and load the value.
Compiling this using clang -O1 opts.c -S:
#include <stdio.h>
int main()
{
char tesst[3] = {'1', '3', '7'};
char erm = tesst[1];
printf("%d\n", erm);
}
produces:
...
main:
pushq %rax
.Ltmp0:
movl $.L.str, %edi
movl $51, %esi
xorl %eax, %eax
callq printf
xorl %eax, %eax
popq %rcx
retq
...
So, the same as printf("%d\n", '3');.
[I'm using C rather than C++ because it would be about 50 lines of assembler if I used cout, as everything gets inlined]
I expect gcc and msvc to make a similar optimisation (I tested gcc -O1 -S and it gives exactly the same code, aside from some symbol names being subtly different).
And to illustrate that "it may not do it if you call a function":
#include <stdio.h>
extern void blah(const char* x);
int main()
{
char tesst[3] = {'1', '3', '7'};
blah(tesst);
char erm = tesst[1];
printf("%d\n", erm);
}
main: # #main
pushq %rax
movb $55, 6(%rsp)
movw $13105, 4(%rsp) # imm = 0x3331
leaq 4(%rsp), %rdi
callq blah
movsbl 5(%rsp), %esi
movl $.L.str, %edi
xorl %eax, %eax
callq printf
xorl %eax, %eax
popq %rcx
retq
Now, it fetches the value from inside tesst.
It mostly depends on the level of optimization and which compiler you are using.
With maximum optimizations, the compiler will indeed probably just replace your whole code with char erm = '3';. GCC does this at -O3, for example.
But then of course it depends on what you do with that variable. The compiler might not even allocate the variable, but just use the raw number in the operation where the variable occurs.
It depends on the compiler version, the optimization options used and many other things. If you want to make sure that const variables are folded at compile time, and they really are compile-time constants, you can use something like constexpr in C++. Unlike a normal const variable, it can be guaranteed to be evaluated at compile time.
Edit: constexpr may be evaluated at compile time or at runtime. To guarantee compile-time evaluation, we must either use it where a constant expression is required (e.g., as an array bound or as a case label) or use it to initialize a constexpr variable. So in this case
constexpr char tesst[3] = {'1','3','7'};
constexpr char erm = tesst[1];
would lead to compile-time evaluation. There is a nice read at https://isocpp.org/blog/2013/01/when-does-a-constexpr-function-get-evaluated-at-compile-time-stackoverflow
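To illustrate the "where a constant expression is required" part with a constexpr function, here is a small sketch of my own (square is just a made-up example):
constexpr int square(int x) { return x * x; }

int maybe = square(4);            // may be evaluated at compile time or at run time
constexpr int forced = square(4); // initializing a constexpr variable forces compile-time evaluation
char buf[square(4)];              // an array bound requires a constant expression, so this is compile-time too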
Does a conversion like:
int a[3];
char i=1;
a[ static_cast<unsigned char>(i) ];
introduce any overhead like conversions or can the compiler optimize everything away?
I'm asking because I want to get rid of -Wchar-subscripts warnings, but still want to use a char as the index (for other reasons).
I did one test on Clang 3.4.1 with this code:
int ival(signed char c) {
int a[] = {0,1,2,3,4,5,6,7,8,9};
unsigned char u = static_cast<unsigned char>(c);
return a[u];
}
Here is the relevant part of the assembly file generated with c++ -S -O3:
_Z4ivala: # #_Z4ivala
# BB#0:
pushl %ebp
movl %esp, %ebp
movzbl 8(%ebp), %eax
movl .L_ZZ4ivalaE1a(,%eax,4), %eax
popl %ebp
ret
There is no trace of the conversion.
On most modern architectures char and unsigned char have the same size and alignment, hence unsigned char can represent all non-negative values of char and casting one to another does not require any CPU instructions.
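If the cast shows up in many places, one way to keep call sites tidy (my own suggestion, not part of this answer; as_index is a made-up name) is a tiny helper that the optimizer can fold away just like the bare cast:
#include <cstddef>

// Converts a possibly signed char to a non-negative array index,
// so -Wchar-subscripts stays quiet at the call site.
inline std::size_t as_index(char c) {
    return static_cast<unsigned char>(c);
}
// usage: a[as_index(i)];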
Suppose you have a class that has private members which are accessed a lot in a program (such as in a loop which has to be fast). Imagine I have defined something like this:
class Foo
{
public:
Foo(unsigned set)
: vari(set)
{}
const unsigned& read_vari() const { return vari; }
private:
unsigned vari;
};
The reason I would like to do it this way is because, once the class is created, "vari" shouldn't be changed anymore. Thus, to minimize bug occurrence, "it seemed like a good idea at the time".
However, if I now need to call this function millions of times, I was wondering if there is an overhead and a slowdown instead of simply using:
struct Foo
{
unsigned vari;
};
So, was my first impulse right in using a class, to avoid anyone mistakenly changing the value of the variable after it has been set by the constructor?
Also, does this introduce a "penalty" in the form of a function call overhead. (Assuming I use optimization flags in the compiler, such as -O2 in GCC)?
They should come out to be the same. Remember that frustrating time you tried to use operator[] on a vector and gdb just replied "optimized out"? This is what will happen here. The compiler will not create a function call here; rather, it will access the variable directly.
Let's have a look at the following code
struct foo{
int x;
int& get_x(){
return x;
}
};
int direct(foo& f){
return f.x;
}
int fnc(foo& f){
return f.get_x();
}
This was compiled with g++ test.cpp -o test.s -S -O2. The -S flag tells the compiler to "Stop after the stage of compilation proper; do not assemble" (quote from the g++ manpage). This is what the compiler gives us:
_Z6directR3foo:
.LFB1026:
.cfi_startproc
movl (%rdi), %eax
ret
and
_Z3fncR3foo:
.LFB1027:
.cfi_startproc
movl (%rdi), %eax
ret
As you can see, no function call is made in the second case, and both functions compile to the same code, meaning there is no performance overhead in using the accessor method.
Bonus: what happens if optimizations are turned off? Same code; here are the results:
_Z6directR3foo:
.LFB1022:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movq %rdi, -8(%rbp)
movq -8(%rbp), %rax
movl (%rax), %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
and
_Z3fncR3foo:
.LFB1023:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq %rdi, -8(%rbp)
movq -8(%rbp), %rax
movq %rax, %rdi
call _ZN3foo5get_xEv #<<<call to foo.get_x()
movl (%rax), %eax
leave
.cfi_def_cfa 7, 8
ret
As you can see, without optimizations the direct struct access is faster than the accessor, but who ships code without optimizations?
You can expect identical performance. A great many C++ classes rely on this - for example, C++11's list::size() const can be expected to trivially return a data member. (This contrasts with vector, where the implementations I've looked at calculate size() as the difference between the pointer data members corresponding to begin() and end(), ensuring typical iterator usage is as fast as possible at the cost of potentially slower indexed iteration, if the optimiser can't determine that size() is constant across loop iterations.)
There's typically no particular reason to return by const reference for a type like unsigned that should fit in a CPU register anyway, but as it's inlined the compiler doesn't have to take that literally (for an out-of-line version it would likely be implemented by returning a pointer that has to be dereferenced). (The atypical reason is to allow taking the address of the variable, which is why, say, vector::operator[](size_t) const needs to return a const T& rather than a T, even if T is small enough to fit in a register.)
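Put differently, for a register-sized member like unsigned, the getter could just as well return by value; here is a sketch of both forms (my own rewording of the class from the question):
class Foo {
public:
    explicit Foo(unsigned set) : vari(set) {}
    unsigned        read_by_value() const { return vari; } // fine for types that fit in a register
    const unsigned& read_by_ref()   const { return vari; } // only needed if callers must bind a reference to the member
private:
    unsigned vari;
};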
There is only one way to tell with certainty which one is faster in your particular program built with your particular tools with your particular optimisation flags on your particular platform — by measuring both variants.
Having said that, chances are good that the binaries will be identical, instruction for instruction.
As others have said, optimizers these days are relied on to boil out abstraction (especially in C++, which is more or less built to take advantage of that) and they're very, very good.
But you might not need the getter for this.
struct Foo {
Foo(unsigned set) : vari(set) {}
unsigned const vari;
};
const doesn't forbid initialization; it only forbids assignment after the member has been initialized.
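A quick usage sketch (my addition) of what the const member buys you:
Foo f(42);
unsigned u = f.vari; // reading is fine
// f.vari = 7;       // would not compile: vari is const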
When I initialize float variables in my program, I commonly have vectors like:
Vector forward(0.f,0.f,-1.f),right(1.f,0.f,0.f),up(0.f,1.f,0.f)
(Vectors are just 3 floats like struct Vector{ float x,y,z; };)
This looks much easier to read as:
Vector forward(0,0,-1),right(1,0,0),up(0,1,0)
Must I initialize my float variables using floats? Am I losing anything or incurring some kind of penalty when I use integers (or doubles) to initialize a float?
There's no semantic difference between the two. With some compilers, it is possible for extra code to be generated, though. See also this and this SO question on the same topic.
I can confirm that gcc generates the same code for all variants of
int main()
{
float a = 0.0f; /* or 0 or 0.0 */
return 0;
}
and that this code is
.file "1.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0x00000000, %eax
movl %eax, -4(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",#progbits
The relevant line is
movl $0x00000000, %eax
Changing a to 0.1 (or 0.1f) changes the line to
movl $0x3dcccccd, %eax
It seems that gcc is able to deduce the correct constant and doesn't generate extra code.
For a single literal constant, it shouldn't matter. In the context of an initializer, a constant of any numeric type will be implicitly converted to the type of the object being initialized. This is guaranteed by the language standard. So all of these:
float x = 0;
float x = 0.0;
float x = 0.0f;
float x = 0.0L; // converted from long double to float
are equally valid and result in the same value being stored in x.
A literal constant in a more complex expression can have surprising results, though.
In most cases, each expression is evaluated by itself, regardless of the context in which it appears. Any implicit conversion is applied after the subexpression has been evaluated.
So if you write:
float x = 1 / 2;
the expression 1 / 2 will be evaluated as an int division, yielding 0, which is then converted to float. It will set x to 0.0f, not to 0.5f.
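If 0.5f is what you want, at least one operand has to be a floating-point constant:
float x = 1.0f / 2; // the int 2 is converted to float; x is 0.5f
float y = 1 / 2.0;  // evaluated in double, then converted to float; y is also 0.5f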
I think you should be safe using unsuffixed floating-point constants (which are of type double).
Incidentally, you might consider using double rather than float in your program. double, as I mentioned, is the type of an unsuffixed floating-point constant, and can be thought of in some sense as the "default" floating-point type. It usually has more range and precision than float, and there's typically not much difference in performance.
It can be good programming practice to always write 0.f, 1.f, etc., even though gcc can often figure out what the programmer means by 1.0 and the like.
The problematic cases are not so much trivial float variable initializations as numeric constants in somewhat more complex formulae, where a combination of operators, float variables and such constants can easily lead to unintended double-valued calculations and costly float-double-float conversions.
Spotting these conversions without specifically checking the compiled code for them becomes very hard if the intended type for numeric values is mostly omitted in the code and instead only included when it's absolutely required. Hence I for one would choose the approach of just typing in the f's and getting used to having them around.
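For instance, a formula like the following (a made-up example of mine) silently drags the computation through double when the constants are unsuffixed:
float scale(float x) {
    return 0.5 * x + 1.25;   // 0.5 and 1.25 are doubles: x is widened to double,
                             // the arithmetic is done in double, and the result is
                             // narrowed back to float on return
}

float scale_f(float x) {
    return 0.5f * x + 1.25f; // stays in float throughout
}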
I have written two short tests and compiled both with "g++ -S" (gcc version 4.7 on Arch Linux):
test1.cpp
inline int func(int a, int b) {
return a+b;
}
int main() {
int c = func(5,5);
return 0;
}
test2.cpp
inline int func(const int& a, const int& b) {
return a+b;
}
int main() {
int c = func(5,5);
return 0;
}
diff test1.s test2.s
1,5c1,5
< .file "test1.cpp"
< .section .text._Z4funcii,"axG",#progbits,_Z4funcii,comdat
< .weak _Z4funcii
< .type _Z4funcii, #function
< _Z4funcii:
---
> .file "test2.cpp"
> .section .text._Z4funcRKiS0_,"axG",#progbits,_Z4funcRKiS0_,comdat
> .weak _Z4funcRKiS0_
> .type _Z4funcRKiS0_, #function
> _Z4funcRKiS0_:
12a13,14
> movl 8(%ebp), %eax
> movl (%eax), %edx
14c16
< movl 8(%ebp), %edx
---
> movl (%eax), %eax
22c24
< .size _Z4funcii, .-_Z4funcii
---
> .size _Z4funcRKiS0_, .-_Z4funcRKiS0_
36,38c38,44
< movl $5, 4(%esp)
< movl $5, (%esp)
< call _Z4funcii
---
> movl $5, 20(%esp)
> movl $5, 24(%esp)
> leal 20(%esp), %eax
> movl %eax, 4(%esp)
> leal 24(%esp), %eax
> movl %eax, (%esp)
> call _Z4funcRKiS0_
However, I don't really know how to interpret the results. All I see is that test2 apparently generates longer code, but I can't really tell what the differences are.
A follow-up question: Does it matter with member functions?
Const correctness is for humans, not for code generation.
It makes it easier for humans to understand code, by introducing constraints on what can change.
So, yes, it matters for short functions too, since they can be called from code that is not as short and easy to comprehend in full at a glance.
That said, your example functions do not illustrate const correctness.
So, while this answers the literal question, you have probably misunderstood what const correctness is about, and meant to ask some other question, which I will refrain from guessing at.
Those functions aren't equivalent: one takes its arguments by value and the other by reference. Note that if you drop the references (the &) then you are left with exactly the same functions, except that one enforces that you don't change the value of the copied arguments inside the body while the other doesn't.
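As a sketch of that last point (my own illustration): top-level const on a by-value parameter does not change the function's type or its call sites, it only keeps the body from modifying its local copies.
inline int add_plain(int a, int b)             { return a + b; }
inline int add_const(const int a, const int b) { return a + b; }
// Both declarations have the type int(int, int); the const only matters inside the body.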