Class member that only calls member of other class? - c++

given:
#include <stdio.h>

class A
{
    friend class B;
private:
    void func();
} GlobalA;

void A::func()
{
    printf("A::func()");
}

class B
{
public:
    void func();
};

void B::func()
{
    GlobalA.func();
}

int main()
{
    B b;
    b.func();
    getchar();
}
So really all B::func() does is call A::func(). Is there a better way to do this, or does the compiler just call A::func() directly when it compiles?
CONSTRAINTS:
Class A creates threads and is used by multiple other classes. It is a global IO class that manages sockets/pipes, so I don't believe any type of inheritance would go over well.
NOTE: If this is a google-able problem please let me know, as I did not know what to search for.

In fact B::func() does something more subtle:
It does not call A::func() on some arbitrary object; it calls GlobalA.func(), and GlobalA is an instance of class A.
So here GlobalA is a singleton (but expressed in a very "raw" way, as a single global instance).
So whatever number of B instances you create, they'll always call the same A instance (GlobalA).
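If you want to keep that single-instance behaviour but make it explicit, a common alternative is a Meyers singleton. This is only a sketch of that idea, not something from the original post (the accessor name instance() is my own):

#include <stdio.h>

class A
{
    friend class B;
public:
    // Meyers singleton: the single instance lives inside the accessor.
    static A& instance()
    {
        static A theInstance; // constructed once, on first use
        return theInstance;
    }
private:
    A() {}
    void func() { printf("A::func()"); }
};

class B
{
public:
    void func() { A::instance().func(); } // same effect as GlobalA.func()
};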

Check out the generated assembly from your compiler (I used GCC 4.7 -O3):
For A::func():
_ZN1A4funcEv:
.LFB31:
.cfi_startproc
subl $28, %esp
.cfi_def_cfa_offset 32
movl $.LC0, 4(%esp)
movl $1, (%esp)
call __printf_chk
addl $28, %esp
.cfi_def_cfa_offset 4
ret
.cfi_endproc
And B::func():
_ZN1B4funcEv:
.LFB32:
.cfi_startproc
subl $28, %esp
.cfi_def_cfa_offset 32
movl $.LC0, 4(%esp)
movl $1, (%esp)
call __printf_chk
addl $28, %esp
.cfi_def_cfa_offset 4
ret
.cfi_endproc
They're identical - the compiler has done the smart thing for you, for free, behind the scenes. In this case your example was all in the same translation unit, which makes it trivial for the compiler to decide whether it's worth doing. (There are cases where it most likely wouldn't be worth it, and the compiler will have a pretty good set of heuristics to help it figure out what's best for any given target.)
If they were in different translation units it becomes quite a lot harder to do. Some compilers will still manage the same optimisations, but not all. You can of course ensure that it remains within the same translation unit in every case by defining functions like these as inline, which lets you define them in the header files, as sketched below.
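As a sketch of what that would look like (not from the original post; the header names are made up, and a.h is assumed to declare class A, its friend declaration, and an extern declaration of GlobalA):

// b.h -- hypothetical header for this example
#ifndef B_H
#define B_H

#include "a.h" // assumed: declares class A (with friend class B) and extern A GlobalA;

class B
{
public:
    // Defined in the class body, so implicitly inline: every translation
    // unit that includes this header can inline the call to GlobalA.func().
    void func() { GlobalA.func(); }
};

#endif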
The moral of the story is don't sweat over tiny details - writing code that makes sense and is maintainable is far more important.

This is a common pattern: namely a bridge.
From my experience it is always inlined (g++ since version 4.3 at least).
See Flexo's answer: the call to the A member function is indeed inlined in the sample code you posted.

Related

Does the compiler optimize references to constant variables?

When it comes to the C and C++ languages, does the compiler optimize references to constant variables so that the program automatically knows what values are being referred to, instead of having to peek at the memory locations of the constant variables? And when it comes to arrays, does it depend on whether the index into the array is a compile-time constant?
For instance, take a look at this code:
int main(void) {
    1: char tesst[3] = {'1', '3', '7'};
    2: char erm = tesst[1];
}
Does the compiler "change" line 2 to "char erm = '3'" at compile time?
I personally would expect the posted code to turn into "nothing", since neither variable is actually used and thus both can be removed.
But yes, modern compilers (gcc, clang, msvc, etc.) should be able to replace that access with its constant value, as long as the compiler can be reasonably sure that the content of tesst isn't being changed. If you pass tesst into a function, even as a const reference, and the compiler can't prove that the function does NOT change it, it will assume that it does and load the value.
Compiling this using clang -O1 opts.c -S:
#include <stdio.h>

int main()
{
    char tesst[3] = {'1', '3', '7'};
    char erm = tesst[1];
    printf("%d\n", erm);
}
produces:
...
main:
pushq %rax
.Ltmp0:
movl $.L.str, %edi
movl $51, %esi
xorl %eax, %eax
callq printf
xorl %eax, %eax
popq %rcx
retq
...
So, the same as printf("%d\n", '3');.
[I'm using C rather than C++ because it would be about 50 lines of assembler if I used cout, as everything gets inlined.]
I expect gcc and msvc to make a similar optimisation (I tested gcc -O1 -S and it gives exactly the same code, aside from some subtly different symbol names).
And to illustrate that "it may not do it if you call a function":
#include <stdio.h>

extern void blah(const char* x);

int main()
{
    char tesst[3] = {'1', '3', '7'};
    blah(tesst);
    char erm = tesst[1];
    printf("%d\n", erm);
}
main: # #main
pushq %rax
movb $55, 6(%rsp)
movw $13105, 4(%rsp) # imm = 0x3331
leaq 4(%rsp), %rdi
callq blah
movsbl 5(%rsp), %esi
movl $.L.str, %edi
xorl %eax, %eax
callq printf
xorl %eax, %eax
popq %rcx
retq
Now, it fetches the value from inside tesst.
It mostly depends on the level of optimization and which compiler you are using.
With maximum optimizations, the compiler will indeed probably just replace your whole code with char erm = '3';. GCC -O3 does this anyway.
But then of course it depends on what you do with that variable. The compiler might not even allocate the variable, but just use the raw number in the operation where the variable occurs.
It depends on the compiler version, the optimization options used and many other things. If you want to make sure that const variables are optimized and they are compile-time constants, you can use something like constexpr in C++. It is guaranteed to be evaluated at compile time, unlike normal const variables.
Edit: constexpr may be evaluated at compile time or at run time. To guarantee compile-time evaluation, we must either use it where a constant expression is required (e.g., as an array bound or as a case label) or use it to initialize a constexpr variable. So in this case
constexpr char tesst[3] = {'1','3','7'};
constexpr char erm = tesst[1];
would lead to compile-time evaluation. There's a nice read on this at https://isocpp.org/blog/2013/01/when-does-a-constexpr-function-get-evaluated-at-compile-time-stackoverflow
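As a small illustration of the difference (my own example, not from the answer): a static_assert only accepts compile-time constants, so it confirms that the constexpr version is evaluated at compile time:

constexpr char tesst[3] = {'1', '3', '7'};
constexpr char erm = tesst[1];
static_assert(erm == '3', "erm is a compile-time constant"); // compiles: erm is a constant expression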

Is returning a private class member slower than using a struct and accessing that variable directly?

Suppose you have a class that has private members which are accessed a lot in a program (such as in a loop which has to be fast). Imagine I have defined something like this:
class Foo
{
public:
    Foo(unsigned set)
        : vari(set)
    {}

    const unsigned& read_vari() const { return vari; }

private:
    unsigned vari;
};
The reason I would like to do it this way is because, once the class is created, "vari" shouldn't be changed anymore. Thus, to minimize bug occurrence, "it seemed like a good idea at the time".
However, if I now need to call this function millions of times, I was wondering whether there is overhead and a slowdown compared to simply using:
struct Foo
{
    unsigned vari;
};
So, was my first impulse right in using a class, to avoid anyone mistakenly changing the value of the variable after it has been set by the constructor?
Also, does this introduce a "penalty" in the form of function call overhead? (Assume I use optimization flags in the compiler, such as -O2 in GCC.)
They should come out to be the same. Remember that frustrating time you tried to use operator[] on a vector and gdb just replied <optimized out>? This is what will happen here. The compiler will not emit a function call; it will access the variable directly.
Let's have a look at the following code
struct foo {
    int x;
    int& get_x() {
        return x;
    }
};

int direct(foo& f) {
    return f.x;
}

int fnc(foo& f) {
    return f.get_x();
}
This was compiled with g++ test.cpp -o test.s -S -O2. The -S flag tells the compiler to "stop after the stage of compilation proper; do not assemble" (quote from the g++ manpage). This is what the compiler gives us:
_Z6directR3foo:
.LFB1026:
.cfi_startproc
movl (%rdi), %eax
ret
and
_Z3fncR3foo:
.LFB1027:
.cfi_startproc
movl (%rdi), %eax
ret
As you can see, no function call is made in the second case, and the two functions are identical. Meaning: there is no performance overhead in using the accessor method.
Bonus: what happens if optimizations are turned off? Same code; here are the results:
_Z6directR3foo:
.LFB1022:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movq %rdi, -8(%rbp)
movq -8(%rbp), %rax
movl (%rax), %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
and
_Z3fncR3foo:
.LFB1023:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq %rdi, -8(%rbp)
movq -8(%rbp), %rax
movq %rax, %rdi
call _ZN3foo5get_xEv #<<<call to foo.get_x()
movl (%rax), %eax
leave
.cfi_def_cfa 7, 8
ret
As you can see, without optimizations the direct access is faster than the accessor - but who ships code without optimizations?
You can expect identical performance. A great many C++ classes rely on this - for example, C++11's list::size() const can be expected to trivially return a data member. (This contrasts with vector, where the implementations I've looked at calculate size() as the difference between the pointer data members corresponding to begin() and end(), ensuring typical iterator usage is as fast as possible at the cost of potentially slower indexed iteration, if the optimiser can't determine that size() is constant across loop iterations.)
There's typically no particular reason to return by const reference for a type like unsigned that fits in a CPU register anyway, but as the function is inlined the compiler doesn't have to take that literally (an out-of-line version would likely be implemented by returning a pointer that has to be dereferenced). (The atypical reason is to allow taking the address of the variable, which is why, say, vector::operator[](size_t) const needs to return a const T& rather than a T, even if T is small enough to fit in a register.)
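For example (my own illustration of that last point):

#include <vector>

const std::vector<int> v{1, 2, 3};
const int* p = &v[0]; // possible only because operator[] const returns a const int&, not an int by value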
There is only one way to tell with certainty which one is faster in your particular program built with your particular tools with your particular optimisation flags on your particular platform — by measuring both variants.
Having said that, chances are good that the binaries will be identical, instruction for instruction.
As others have said, optimizers these days are relied on to boil out abstraction (especially in C++, which is more or less built to take advantage of that) and they're very, very good.
But you might not need the getter for this.
struct Foo {
    Foo(unsigned set) : vari(set) {}
    unsigned const vari;
};
const doesn't forbid initialization.
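To illustrate (my own usage example against the struct above): reads stay as cheap as a plain member access, while writes are rejected at compile time:

Foo f(42);
unsigned x = f.vari; // direct member access, no function call
// f.vari = 7;       // error: assignment of read-only member 'Foo::vari'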

Are templates + functors/lambdas suboptimal in terms of memory usage?

For illustrative purposes, let's say I want to implement a generic integer comparing function. I can think of a few approaches for the definition/invocation of the function.
(A) Function Template + functors
template <class Compare>
void compare_int(int a, int b, const std::string& msg, Compare cmp_func)
{
    if (cmp_func(a, b)) std::cout << "a is " << msg << " b" << std::endl;
    else std::cout << "a is not " << msg << " b" << std::endl;
}

struct MyFunctor_LT {
    bool operator() (int a, int b) {
        return a < b;
    }
};
And this would be a couple of calls to this function:
MyFunctor_LT mflt;
MyFunctor_GT mfgt; // not necessary to show the implementation
compare_int(3, 5, "less than", mflt);
compare_int(3, 5, "greater than", mfgt);
(B) Function template + lambdas
We would call compare_int like this:
compare_int (3, 5, "less than", [](int a, int b) {return a<b;});
compare_int (3, 5, "greater than", [](int a, int b) {return a>b;});
(C) Function template + std::function
Same template implementation, invocation:
std::function<bool(int,int)> func_lt = [](int a, int b) {return a<b;}; //or a functor/function
std::function<bool(int,int)> func_gt = [](int a, int b) {return a>b;};
compare_int (3, 5, "less than", func_lt);
compare_int (3, 5, "greater than", func_gt);
(D) Raw "C-style" pointers
Implementation:
void compare_int(int a, int b, const std::string& msg, bool (*cmp_func)(int a, int b))
{
    ...
}

bool lt_func(int a, int b)
{
    return a < b;
}
Invocation:
compare_int (10, 5, "less than", lt_func);
compare_int (10, 5, "greater than", gt_func);
With those scenarios laid out, we have in each case:
(A) Two template instances (two different parameters) will be compiled and allocated in memory.
(B) I would say also two template instances would be compiled. Each lambda is a different class. Correct me if I'm wrong, please.
(C) Only one template instance would be compiled, since the template parameter is always the same: std::function<bool(int,int)>.
(D) Obviously we only have one instance.
Needless to say, it doesn't make a difference for such a naive example. But when working with dozens (or hundreds) of templates, and numerous functors, the compilation times and memory usage difference can be substantial.
Can we say that in many circumstances (i.e., when using many functors with the same signature) std::function (or even function pointers) must be preferred over templates + raw functors/lambdas? Wrapping your functor or lambda with std::function may be very convenient.
I am aware that std::function (and function pointers too) introduces an overhead. Is it worth it?
EDIT. I did a very simple benchmark using the following macros and a very common standard library function template (std::sort):
#define TEST(X) std::function<bool(int,int)> f##X = [] (int a, int b) {return (a^X)<(b+X);}; \
std::sort (v.begin(), v.end(), f##X);
#define TEST2(X) auto f##X = [] (int a, int b) {return (a^X)<(b^X);}; \
std::sort (v.begin(), v.end(), f##X);
#define TEST3(X) bool(*f##X)(int, int) = [] (int a, int b) {return (a^X)<(b^X);}; \
std::sort (v.begin(), v.end(), f##X);
Results are the following regarding size of the generated binaries (GCC at -O3):
Binary with 1 TEST macro instance: 17009
1 TEST2 macro instance: 9932
1 TEST3 macro instance: 9820
50 TEST macro instances: 59918
50 TEST2 macro instances: 94682
50 TEST3 macro instances: 16857
Although I showed the numbers, this is more a qualitative than a quantitative benchmark. As we were expecting, function templates based on the std::function parameter or on a function pointer scale better (in terms of size), as not so many instances are created. I did not measure runtime memory usage, though.
As for the performance results (the vector holds 1,000,000 elements):
50 TEST macro instances: 5.75s
50 TEST2 macro instances: 1.54s
50 TEST3 macro instances: 3.20s
It is a notable difference, and we must not neglect the overhead introduced by std::function (at least if our algorithms consist of millions of iterations).
As others have already pointed out, lambdas and function objects are likely to be inlined, especially if the body of the function is not too long. As a consequence, they are likely to be better in terms of speed and memory usage than the std::function approach. The compiler can optimize your code more aggressively if the function can be inlined. Shockingly better. std::function would be my last resort for this reason, among other things.
But when working with dozens (or hundreds) of templates, and numerous
functors, the compilation times and memory usage difference can be
substantial.
As for the compilation times, I wouldn't worry too much about it as long as you are using simple templates like the one shown. (If you are doing template metaprogramming, yeah, then you can start worrying.)
Now, the memory usage: by the compiler during compilation, or by the generated executable at run time? For the former, the same holds as for the compilation time. For the latter: inlined lambdas and function objects are the winners.
Can we say that in many circumstances std::function (or even
function pointers) must be preferred over templates+raw
functors/lambdas? I.e. wrapping your functor or lambda with
std::function may be very convenient.
I am not quite sure how to answer to that one. I cannot define "many circumstances".
However, one thing I can say for sure is that type erasure is a way to avoid / reduce code bloat due to templates, see Item 44: Factor parameter-independent code out of templates in Effective C++. By the way, std::function uses type erasure internally. So yes, code bloat is an issue.
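As a sketch of that factoring technique applied to this example (my own illustration, not from the answer): the printing does not depend on the comparator type, so it can move into a non-template helper, leaving only a thin shell to be instantiated per functor:

#include <cstdio>

// Non-template helper: compiled exactly once, no matter how many comparators exist.
inline void report(bool result, const char* msg)
{
    if (result) printf("a is %s b\n", msg);
    else printf("a is not %s b\n", msg);
}

// Thin template: only this small shell is duplicated per comparator type.
template <class Compare>
void compare_int(int a, int b, const char* msg, Compare cmp_func)
{
    report(cmp_func(a, b), msg);
}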
I am aware that std::function (function pointer too) introduces an overhead. Is it worth it?
"Want speed? Measure." (Howard Hinnant)
One more thing: function calls through function pointers can be inlined (even across compilation units!). Here is proof:
#include <cstdio>

bool lt_func(int a, int b)
{
    return a < b;
}

void compare_int(int a, int b, const char* msg, bool (*cmp_func)(int a, int b))
{
    if (cmp_func(a, b)) printf("a is %s b\n", msg);
    else printf("a is not %s b\n", msg);
}

void f()
{
    compare_int(10, 5, "less than", lt_func);
}
This is a slightly modified version of your code. I removed all the iostream stuff because it makes the generated assembly cluttered. Here is the assembly of f():
.LC1:
.string "a is not %s b\n"
[...]
.LC2:
.string "less than"
[...]
f():
.LFB33:
.cfi_startproc
movl $.LC2, %edx
movl $.LC1, %esi
movl $1, %edi
xorl %eax, %eax
jmp __printf_chk
.cfi_endproc
Which means that gcc 4.7.2 inlined lt_func at -O3. In fact, the generated assembly code is optimal.
I have also checked: I moved the implementation of lt_func into a separate source file and enabled link time optimization (-flto). GCC still happily inlined the call through the function pointer! It is nontrivial and you need a quality compiler to do that.
Just for the record, so that you can actually feel the overhead of the std::function approach:
This code:
#include <cstdio>
#include <functional>

template <class Compare>
void compare_int(int a, int b, const char* msg, Compare cmp_func)
{
    if (cmp_func(a, b)) printf("a is %s b\n", msg);
    else printf("a is not %s b\n", msg);
}

void f()
{
    std::function<bool(int,int)> func_lt = [](int a, int b) { return a < b; };
    compare_int(10, 5, "less than", func_lt);
}
yields this assembly at -O3 (approx. 140 lines):
f():
.LFB498:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
.cfi_lsda 0x3,.LLSDA498
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movl $1, %edi
subq $80, %rsp
.cfi_def_cfa_offset 96
movq %fs:40, %rax
movq %rax, 72(%rsp)
xorl %eax, %eax
movq std::_Function_handler<bool (int, int), f()::{lambda(int, int)#1}>::_M_invoke(std::_Any_data const&, int, int), 24(%rsp)
movq std::_Function_base::_Base_manager<f()::{lambda(int, int)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<f()::{lambda(int, int)#1}> const&, std::_Manager_operation), 16(%rsp)
.LEHB0:
call operator new(unsigned long)
.LEHE0:
movq %rax, (%rsp)
movq 16(%rsp), %rax
movq $0, 48(%rsp)
testq %rax, %rax
je .L14
movq 24(%rsp), %rdx
movq %rax, 48(%rsp)
movq %rsp, %rsi
leaq 32(%rsp), %rdi
movq %rdx, 56(%rsp)
movl $2, %edx
.LEHB1:
call *%rax
.LEHE1:
cmpq $0, 48(%rsp)
je .L14
movl $5, %edx
movl $10, %esi
leaq 32(%rsp), %rdi
.LEHB2:
call *56(%rsp)
testb %al, %al
movl $.LC0, %edx
jne .L49
movl $.LC2, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk
.LEHE2:
.L24:
movq 48(%rsp), %rax
testq %rax, %rax
je .L23
leaq 32(%rsp), %rsi
movl $3, %edx
movq %rsi, %rdi
.LEHB3:
call *%rax
.LEHE3:
.L23:
movq 16(%rsp), %rax
testq %rax, %rax
je .L12
movl $3, %edx
movq %rsp, %rsi
movq %rsp, %rdi
.LEHB4:
call *%rax
.LEHE4:
.L12:
movq 72(%rsp), %rax
xorq %fs:40, %rax
jne .L50
addq $80, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.p2align 4,,10
.p2align 3
.L49:
.cfi_restore_state
movl $.LC1, %esi
movl $1, %edi
xorl %eax, %eax
.LEHB5:
call __printf_chk
jmp .L24
.L14:
call std::__throw_bad_function_call()
.LEHE5:
.L32:
movq 48(%rsp), %rcx
movq %rax, %rbx
testq %rcx, %rcx
je .L20
leaq 32(%rsp), %rsi
movl $3, %edx
movq %rsi, %rdi
call *%rcx
.L20:
movq 16(%rsp), %rax
testq %rax, %rax
je .L29
movl $3, %edx
movq %rsp, %rsi
movq %rsp, %rdi
call *%rax
.L29:
movq %rbx, %rdi
.LEHB6:
call _Unwind_Resume
.LEHE6:
.L50:
call __stack_chk_fail
.L34:
movq 48(%rsp), %rcx
movq %rax, %rbx
testq %rcx, %rcx
je .L20
leaq 32(%rsp), %rsi
movl $3, %edx
movq %rsi, %rdi
call *%rcx
jmp .L20
.L31:
movq %rax, %rbx
jmp .L20
.L33:
movq 16(%rsp), %rcx
movq %rax, %rbx
testq %rcx, %rcx
je .L29
movl $3, %edx
movq %rsp, %rsi
movq %rsp, %rdi
call *%rcx
jmp .L29
.cfi_endproc
Which approach would you like to choose when it comes to performance?
If you bind your lambda to a std::function, your code will run slower: it is no longer inlineable, invocation goes through a function pointer, and constructing the function object may require a heap allocation if the size of the lambda (= the size of the captured state) exceeds the small-buffer limit (which is equal to the size of one or two pointers on GCC, IIRC).
If you keep your lambda, e.g. auto a = []{};, then it will be as fast as an inlineable function (perhaps even faster, because there's no conversion to function pointer when it is passed as an argument to a function).
The object-code overhead of lambdas and inlineable function objects is zero when compiling with optimizations enabled (-O1 or higher in my tests). Occasionally the compiler may refuse to inline, but that normally only happens with big function bodies.
You can always look at the generated assembly if you want to be certain.
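If you want to see that small-buffer behaviour for yourself, here is a sketch (my own, not from the answer; the exact crossover point varies between standard library implementations, so treat the capture sizes as assumptions) that counts global heap allocations while constructing std::functions from a small and a large capture:

#include <cstdio>
#include <cstdlib>
#include <functional>
#include <new>

static int allocs = 0;

// Count every global heap allocation.
void* operator new(std::size_t n)
{
    ++allocs;
    return std::malloc(n);
}
void operator delete(void* p) noexcept { std::free(p); }

int main()
{
    int a = 1;
    std::function<int()> f_small = [a] { return a; }; // small capture: usually fits the internal buffer
    std::printf("allocations after small capture: %d\n", allocs);

    char big[256] = {};
    std::function<int()> f_large = [big] { return big[0]; }; // large capture: forces a heap allocation
    std::printf("allocations after large capture: %d\n", allocs);
}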
I will discuss what happens naively, and common optimizations.
(A) Function Template + functors
In this case, there will be one functor whose type and arguments fully describe what happens when you invoke (). There will be one instance of the template function per functor type passed to it.
While the functor has a minimum size of 1 byte and must technically be copied, the copy is a no-op (not even that 1 byte of space need be copied; a dumb compiler or low optimization settings may result in it being copied anyhow).
Optimizing the existence of that functor object away is an easy thing for compilers to do: the method is inline and has the same symbol name, so this will even happen over multiple compilation units. Inlining the call is also very easy.
If you had multiple function objects with the same implementation, having only one instance of the functor despite this takes a bit of effort, but some compilers can do it. Inlining the template functions may also be easy. In your toy example, as the inputs are known at compile time, branches can be evaluated at compile time, dead code eliminated, and everything reduced to a single std::cout call.
(B) Function template + lambdas
In this case, each lambda is its own type, and each closure instance is an instance of that lambda type of unspecified size (usually 1 byte, as it captures nothing). If an identical lambda is defined and used at different spots, these are different types. Each call location for the function object is a distinct instantiation of the template function.
Removing the existence of the 1-byte closures, assuming they are stateless, is easy. Inlining them is also easy. Removing duplicate instances of the template function with the same implementation but different signatures is harder, but some compilers will do it. Inlining said functions is no harder than above. In your toy example, as the inputs are known at compile time, branches can be evaluated at compile time, dead code eliminated, and everything reduced to a single std::cout call.
(C) Function template + std::function
std::function is a type-erasure object. An instance of std::function with a given signature has the same type as any other with that signature. However, the constructor of std::function is templated on the type passed in. In your case, you are passing in a lambda -- so each location where you initialize a std::function with a lambda instantiates a distinct std::function constructor, generating code the compiler cannot easily see through.
A typical implementation of std::function will use the pImpl pattern to store a pointer to an abstract interface on a helper object that wraps your callable and knows how to copy/move/call it with the std::function signature. One such callable wrapper type is created per type std::function is constructed from, per std::function signature it is constructed to.
One instance of the template function will be created, taking a std::function.
Some compilers can notice duplicate methods and use the same implementation for both, and maybe pull off a similar trick for (much of) their virtual function tables (but not all of them, as dynamic casting requires that they be different). This is less likely to happen than the earlier duplicate-function elimination. The code in the duplicated methods used by the std::function helper is probably simpler than other duplicated functions, so that could be cheaper.
While the template function can be inlined, I am not aware of a C++ compiler that can optimize the existence of a std::function away, as they are usually implemented as library solutions consisting of relatively complex code that is opaque to the compiler. So while in theory it could be evaluated, since all the information is there, in practice the std::function will not be inlined into the template function, and no dead code elimination will occur. Both branches will make it into the resulting binary, together with a pile of std::function boilerplate for itself and its helper.
Calling a std::function is roughly as expensive as making a virtual method call -- or, as a rule of thumb, very roughly as expensive as two function pointer calls.
(D) Raw "C-style" pointers
A function is created, its address taken, and that address passed to compare_int, which then dereferences the pointer to find the actual function and calls it.
Some compilers are good at noticing that the function pointer is created from a literal and then inlining the call here; not all compilers can do this, and no (or few) compilers can do it in the general case. If they cannot (because the initialization isn't from a literal, or because the interface is in one compilation unit while the implementation is in another), there is a significant cost to following a function pointer: the CPU tends to be unable to predict where the call is going, so there is a pipeline stall.
Note that you can call the raw "C-style" version with stateless lambdas, as stateless lambdas implicitly convert to function pointers. Also note that this example is strictly weaker than the other ones: it does not accept stateful callables. The version with the same power would be a C-style function that takes both the pair of ints and a void* state, as sketched below.
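A sketch of that equal-power C-style version (my own illustration of the answer's description; the names are made up):

#include <cstdio>

// The comparator receives an opaque state pointer alongside the two ints.
void compare_int(int a, int b, const char* msg,
                 bool (*cmp_func)(int a, int b, void* state), void* state)
{
    if (cmp_func(a, b, state)) printf("a is %s b\n", msg);
    else printf("a is not %s b\n", msg);
}

// A "stateful" comparator: compares against an offset carried through state.
bool lt_offset(int a, int b, void* state)
{
    int offset = *static_cast<int*>(state);
    return a + offset < b;
}

int main()
{
    int offset = 3;
    compare_int(10, 5, "less than (with offset)", lt_offset, &offset);
}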
You should use the auto keyword with lambda functions, not std::function. That way you get a unique type and no runtime overhead of std::function.
Also, as dyp suggests, stateless (that is, capture-less) lambda functions can be converted to function pointers.
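For example (my own two-liner, not from the answer):

auto lt = [](int a, int b) { return a < b; }; // unique closure type, no std::function overhead
bool (*lt_ptr)(int, int) = lt;                // OK: a capture-less lambda converts to a function pointer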
In A, B and C you probably end up with a binary that does not contain a comparator, nor any templates. The compiler will literally inline all of the comparison and probably even remove the never-taken branch of the printing - in effect, it'll be plain "puts" calls for the actual result, without any checks left to do.
In D your compiler can't do that.
So far from being suboptimal, the abstraction is more optimal in this example. The std::function version is also more flexible - a std::function can hide members being stored, or the callable being a plain C function, or it being a complex object. It even allows you to derive the implementation from aspects of the type - if you could do a more efficient comparison for POD types, you could implement that and forget about it for the rest.
Think of it this way - A and B are higher-abstraction implementations that tell the compiler "this code is probably best implemented for each type separately, so do that for me when I use it". C says "there will be multiple comparison operators that are complex, but they'll all look like this, so only make one implementation of the compare_int function". In D you tell it "don't bother, just make me these functions. I know best." None is unequivocally better than the rest.

gcc stack optimization

Hi, I have a question on possible stack optimization by gcc (or g++).
Sample code under FreeBSD (does the UNIX variant matter here?):
void main() {
    char bing[100];
    ..
    string buffer = ....;
    ..
}
What I found in gdb for a coredump of this program is that the address of bing is actually lower than that of buffer (namely, &bing[0] < &buffer). I think this is totally contrary to what I was told in the textbook. Could there be some compiler optimization that reorganizes the stack layout in such a way? That seems to be the only possible explanation, but I'm not sure.
In case you're interested, the coredump is due to a buffer overflow from bing into buffer (which also confirms &bing[0] < &buffer).
Thanks!
Compilers are free to organise stack frames (assuming they even use stacks) any way they wish.
They may do it for alignment reasons, or for performance reasons, or for no reason at all. You would be unwise to assume any specific order.
If you hadn't invoked undefined behavior by overflowing the buffer, you probably never would have known, and that's the way it should be.
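If you want to observe the layout without overflowing anything, a small sketch like this (my own, not from the answer) prints the addresses of two adjacent locals; the ordering can change between compilers, optimization levels, and options such as -fstack-protector:

#include <stdio.h>

int main(void) {
    char bing[100];
    char buffer[100];
    // The relative order of these two addresses is entirely up to the compiler.
    printf("bing   = %p\n", (void*)&bing[0]);
    printf("buffer = %p\n", (void*)&buffer[0]);
    return 0;
}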
A compiler can not only re-organise your variables, it can optimise them out of existence if it can establish they're not used. With the code:
#include <stdio.h>

int main (void) {
    char bing[71];
    int x = 7;
    bing[0] = 11;
    return 0;
}
Compare the normal assembler output:
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $80, %esp
movl %gs:20, %eax
movl %eax, 76(%esp)
xorl %eax, %eax
movl $7, (%esp)
movb $11, 5(%esp)
movl $0, %eax
movl 76(%esp), %edx
xorl %gs:20, %edx
je .L3
call __stack_chk_fail
.L3:
leave
ret
with the insanely optimised:
main:
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
popl %ebp
ret
Notice anything missing from the latter? Yes, there are no stack manipulations to create space for either bing or x. They don't exist. In fact, the entire code sequence boils down to:
set return code to 0.
return.
A compiler is free to lay out local variables on the stack (or keep them in registers, or do something else with them) however it sees fit: the C and C++ language standards don't say anything about these implementation details, and neither does POSIX or UNIX. I doubt that your textbook told you otherwise, and if it did, I would look for a new textbook.

Does the evil cast get trumped by the evil compiler?

This is not academic code or a hypothetical question. The original problem was converting code from HP11 to HP1123 Itanium. Basically it boils down to a compile error on HP1123 Itanium. It had me really scratching my head when reproducing it on Windows for study. I have stripped all but the most basic aspects... You may have to press Ctrl+D to exit a console window if you run it as is:
#include "stdafx.h"
#include <iostream>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
char blah[6];
const int IAMCONST = 3;
int *pTOCONST;
pTOCONST = (int *) &IAMCONST;
(*pTOCONST) = 7;
printf("IAMCONST %d \n",IAMCONST);
printf("WHATISPOINTEDAT %d \n",(*pTOCONST));
printf("Address of IAMCONST %x pTOCONST %x\n",&IAMCONST, (pTOCONST));
cin >> blah;
return 0;
}
Here is the output
IAMCONST 3
WHATISPOINTEDAT 7
Address of IAMCONST 35f9f0 pTOCONST 35f9f0
All I can say is: what the heck? Is it undefined to do this? It is the most counterintuitive thing I have seen for such a simple example.
Update:
Indeed, after searching for a while, the menu Debug >> Windows >> Disassembly showed exactly the optimization that is described below.
printf("IAMCONST %d \n",IAMCONST);
0024360E mov esi,esp
00243610 push 3
00243612 push offset string "IAMCONST %d \n" (2458D0h)
00243617 call dword ptr [__imp__printf (248338h)]
0024361D add esp,8
00243620 cmp esi,esp
00243622 call #ILT+325(__RTC_CheckEsp) (24114Ah)
Thank you all!
Looks like the compiler is optimizing
printf("IAMCONST %d \n",IAMCONST);
into
printf("IAMCONST %d \n",3);
since you said that IAMCONST is a const int.
But since you're taking the address of IAMCONST, it has to actually be located on the stack somewhere, and the constness can't be enforced, so the memory at that location (*pTOCONST) is mutable after all.
In short: you cast away the constness; don't do that. Poor, defenseless C...
Addendum
Using GCC for x86, with -O0 (no optimizations), the generated assembly
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $36, %esp
movl $3, -12(%ebp)
leal -12(%ebp), %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
movl $7, (%eax)
movl -12(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl -8(%ebp), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf
copies from *(bp-12) on the stack to printf's arguments. However, using -O1 (as well as -Os, -O2, -O3, and other optimization levels),
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl $3, 4(%esp)
movl $.LC0, (%esp)
call printf
movl $7, 4(%esp)
movl $.LC1, (%esp)
call printf
you can clearly see that the constant 3 is used instead.
If you are using Visual Studio's CL.EXE, /Od disables optimization. This varies from compiler to compiler.
Be warned that the C specification allows the C compiler to assume that the target of any int * pointer never overlaps the memory location of a const int, so you really shouldn't be doing this at all if you want predictable behavior.
The constant value IAMCONST is being inlined into the printf call.
What you're doing is at best wrong and in all likelihood is undefined by the C++ standard. My guess is that the C++ standard leaves the compiler free to inline a const primitive which is local to a function declaration. The reason being that the value should not be able to change.
Then again, it's C++ where should and can are very different words.
You are lucky the compiler is doing the optimization. An alternative treatment would be to place the const integer into read-only memory, whereupon trying to modify the value would cause a core dump.
Writing to a const object through a cast that removes the const is undefined behavior - so at the point where you do this:
(*pTOCONST) = 7;
all bets are off.
From the C++ standard 7.1.5.1 (The cv-qualifiers):
Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const object during its lifetime (3.8) results in undefined behavior.
Because of this, the compiler is free to assume that the value of IAMCONST will not change, so it can optimize away the access to the actual storage. In fact, if the address of the const object is never taken, the compiler may eliminate the storage for the object altogether.
Also note that (again in 7.1.5.1):
A variable of non-volatile const-qualified integral or enumeration type initialized by an integral constant expression can be used in integral constant expressions (5.19).
Which means IAMCONST can be used in compile-time constant expressions (i.e., to provide a value for an enumeration or the size of an array). What would it even mean to change that at runtime?
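For instance (my own illustration of that point), both of these uses bake the value in at compile time, so a runtime write through the cast pointer could never reach them:

const int IAMCONST = 3;
char buffer[IAMCONST];        // array bound fixed at compile time
enum { ENUMVAL = IAMCONST };  // enumerator value fixed at compile time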
It doesn't matter if the compiler optimizes or not. You asked for trouble and you're lucky you got the trouble yourself instead of waiting for customers to report it to you.
"All I can say is what the heck? Is it undefined to do this? It is the most counterintuitive thing I have seen for such a simple example."
If you really believe that then you need to switch to a language you can understand, or change professions. For the sake of yourself and your customers, stop using C or C++ or C#.
const int IAMCONST = 3;
You said it.
int *pTOCONST;
pTOCONST = (int *) &IAMCONST;
Guess why the compiler would have complained if you had omitted your evil cast. The compiler might have been telling the truth before you lied to it.
"Does the evil cast get trumped by the evil compiler?"
No. The evil cast gets trumped by itself. Whether or not your compiler tried to tell you the truth, the compiler was not evil.