Can ?: lead to less efficient code compared to if/else when returning an object?
Foo if_else()
{
if (bla)
return Foo();
else
return something_convertible_to_Foo;
}
If bla is false, the returned Foo is directly constructed from something_convertible_to_Foo.
Foo question_mark_colon()
{
return (bla) ? Foo() : something_convertible_to_Foo;
}
Here, the type of the conditional expression is Foo, so I guess that if bla is false, a temporary Foo is first created to yield the result of the expression, and that temporary then has to be copy-constructed to return the result of the function. Is that analysis sound?
A temporary Foo has to be constructed either way, and both cases are clear candidates for RVO, so I see no reason to believe the compiler would fail to produce identical output in this case. As always, actually compiling the code and looking at the output is the best course of action.
It most definitely can matter where rvalue references are enabled. When one of the two branches is an lvalue and the other an rvalue, the conditional operator converts both to a common result, so whichever way you go, the correct function is not called for at least one of them. With the if statement, each return statement is considered on its own, so the correct move or copy constructor is called for each.
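To make this concrete, here is a minimal sketch (Tracer is a made-up type that just reports which constructor runs; the function names are mine). The conditional version is forced to copy its lvalue branch, while the if version gets the implicit move:

#include <cstdio>

struct Tracer {
    Tracer() = default;
    Tracer(const Tracer&) { std::puts("copy"); }
    Tracer(Tracer&&) noexcept { std::puts("move"); }
};

Tracer make_if(bool b) {
    Tracer local;
    if (b)
        return Tracer();          // prvalue: constructed in place, no copy or move
    return local;                 // implicit move (possibly elided via NRVO)
}

Tracer make_ternary(bool b) {
    Tracer local;
    return b ? Tracer() : local;  // the lvalue branch is copied into the
                                  // conditional's prvalue result
}

int main() {
    make_if(false);       // prints "move" (or nothing, if elided), never "copy"
    make_ternary(false);  // prints "copy"
}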
While I appreciate assembly output, I still find it a bit "too" low-level :)
For the following code:
struct Foo {
    Foo(): i(0) {}
    Foo(int i): i(i) {}
    int i;
};
struct Bar {
    Bar(double d): d(d) {}
    double d;
    operator Foo() const { return Foo(d); }
};
Foo If(bool cond) {
if (cond) { return Foo(); }
return Bar(3);
}
Foo Ternary(bool cond) {
return cond ? Foo() : Bar(3);
}
Here is the LLVM IR generated by Clang
define i64 @If(bool)(i1 zeroext %cond) nounwind readnone {
entry:
%retval.0.0 = select i1 %cond, i64 0, i64 3 ; <i64> [#uses=1]
ret i64 %retval.0.0
}
define i64 @Ternary(bool)(i1 zeroext %cond) nounwind readnone {
entry:
%tmp.016.0 = select i1 %cond, i64 0, i64 3 ; <i64> [#uses=1]
ret i64 %tmp.016.0
}
By the way, the LLVM try-out demo now uses Clang :p
Since it is not the first time this question has come up, in one form or another, I would like to point out that since the two forms are semantically equivalent, there is no reason for a good compiler to treat them any differently as far as optimization and code generation are concerned. The ternary operator is just syntactic sugar.
As always with performance questions: measure for the case at hand; there are too many things to take into account to make any prediction.
Here, I would not be surprised if some compilers had problems with one form or the other, while others rapidly reach the same internal representation and generate exactly the same code.
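For what it's worth, here is a minimal sketch of how such a measurement could look. The stand-in Foo, the global bla, and the iteration count are all my assumptions; a serious benchmark would also vary bla at runtime and guard against the optimizer deleting the loops outright:

#include <chrono>
#include <cstdio>

struct Foo { int i = 0; };
bool bla = false;

Foo if_else()             { if (bla) return Foo(); else return Foo{42}; }
Foo question_mark_colon() { return bla ? Foo() : Foo{42}; }

int main() {
    using clk = std::chrono::steady_clock;
    volatile int sink = 0;  // forces the results to be observed
    auto t0 = clk::now();
    for (int i = 0; i < 10000000; ++i) sink = sink + if_else().i;
    auto t1 = clk::now();
    for (int i = 0; i < 10000000; ++i) sink = sink + question_mark_colon().i;
    auto t2 = clk::now();
    auto ns = [](auto d) {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(d).count();
    };
    std::printf("if/else: %lld ns, ?: %lld ns\n",
                (long long)ns(t1 - t0), (long long)ns(t2 - t1));
}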
I would be surprised if there were any difference, since the two are logically equivalent. But this will depend on the compiler.
It depends on the compiler. As far as I know, on most compilers if/else is translated into cleaner assembly code and is faster.
Edit: Assuming the code below
int a = 10;
int b = 20;
int c = 30;
int d = 30;
int y = 30;
y = (a > b) ? c : d;
if (a > b)
{
y = c;
}
else
{
y = d;
}
will be translated into assembly like this (MSVC, unoptimized):
y = (a > b) ? c : d;
008C13B1 mov eax,dword ptr [a]
008C13B4 cmp eax,dword ptr [b]
008C13B7 jle wmain+54h (8C13C4h)
008C13B9 mov ecx,dword ptr [c]
008C13BC mov dword ptr [ebp-100h],ecx
008C13C2 jmp wmain+5Dh (8C13CDh)
008C13C4 mov edx,dword ptr [d]
008C13C7 mov dword ptr [ebp-100h],edx
008C13CD mov eax,dword ptr [ebp-100h]
008C13D3 mov dword ptr [y],eax
if (a > b)
008C13D6 mov eax,dword ptr [a]
008C13D9 cmp eax,dword ptr [b]
008C13DC jle wmain+76h (8C13E6h)
{
y = c;
008C13DE mov eax,dword ptr [c]
008C13E1 mov dword ptr [y],eax
}
else
008C13E4 jmp wmain+7Ch (8C13ECh)
{
y = d;
008C13E6 mov eax,dword ptr [d]
008C13E9 mov dword ptr [y],eax
}
My colleagues and I were fighting a rather weird bug in an app we're developing. Eventually we got it fixed, but we are still unsure whether what the compiler was doing is legitimate or not.
Assuming we have code like this:
#include <iostream>
using namespace std;

class B {
public:
virtual int foo(int d) { return d - 10; }
};
class C : public B {
public:
virtual int foo(int d) { return d - 11; }
};
class A {
public:
A() : count(0) { member = new B;}
int bar() {
return member->foo(renew());
}
int renew() {
count++;
delete member;
member = new C;
return count;
}
private:
B *member;
int count;
};
int square() {
A a;
cout << a.bar() << endl;
return 0;
}
The Visual Studio x86 compiler, for the function A::bar, generates something like this when compiled with /O1 (you can check the full code on godbolt):
push esi
push edi
mov edi, ecx
mov eax, DWORD PTR [edi] ; eax = member
mov esi, DWORD PTR [eax] ; esi = B::vtbl
call int A::renew(void) ; Changes the member, vtable and esi are no longer valid
mov ecx, DWORD PTR [edi]
push eax
call DWORD PTR [esi] ; Calls wrong stuff (B::vtbl[0])
pop edi
pop esi
ret 0
Is this optimization allowed by the standard, or is it undefined behaviour?
I was unable to get similar assembly with GCC or clang.
Just for perfect clarity, here's the Order of evaluation document Jarod42 already linked, and the relevant quote:
14) In a function-call expression, the expression that names the function is sequenced before every argument expression and every default argument.
So we should read the statement
return member->foo(renew());
as
return function-call-expression;
where function-call-expression is
{function-naming-expression member->foo} ( {argument-expression renew()} )
so, the function-naming-expression member->foo is sequenced-before the argument expression. The doc already linked says
If A is sequenced before B, then evaluation of A will be complete before evaluation of B begins.
so we have to completely evaluate member->foo first. I think it should expand like
// 1. evaluate the function-naming expression
auto tmp_this_member = this->member;
int (B::*tmp_foo)(int) = tmp_this_member->foo; // pseudo-code: you cannot really
                                               // bind a member function like this
// 2. evaluate argument expression
int tmp_argument = this->renew();
// 3. make the function call
(tmp_this_member->*tmp_foo) ( tmp_argument );
... which is exactly what you see. This is the sequencing required by C++17; prior to that, these evaluations were unsequenced, which made this code's behaviour undefined.
tl;dr the compiler is right, and that code would be nasty even if it worked.
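Here is a small self-contained demo of that sequencing (the names are mine). Compiled as C++17 it must print "naming expression" before "argument"; before C++17, either order was allowed:

#include <cstdio>

struct S { int foo(int d) { return d; } };

S*  get_object() { std::puts("naming expression"); static S s; return &s; }
int get_arg()    { std::puts("argument");          return 1; }

int main() {
    // C++17: the function-naming expression get_object()->foo must be
    // evaluated completely before the argument expression get_arg().
    get_object()->foo(get_arg());
}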
Whereas the order of evaluation is unspecified prior to C++17, C++17 imposes some ordering; see evaluation order.
so in
this->member->foo(renew());
renew() might be called before evaluating this->member (prior to C++17).
To guarantee the order prior to C++17, you have to split it into several statements:
auto m = this->member;
auto param = renew(); // m now points to deleted memory
m->foo(param); // UB.
or, for the other order:
auto param = renew();
this->member->foo(param);
Let's say I have the following types:
struct Common { int a, b, c; };
struct Full { int a, b, c; uint64_t x, y, z; };
Common and Full are standard-layout types, where Common is a prefix of Full. So if I put both in a union:
union U {
Common c;
Full f;
};
I would be allowed to read through c even if f were the active member, per [class.mem]/23.
Now the question is - is there a way for me, given a Full const*, to get a Common const* in a non-UB way?
void foo(Full const* f) {
Common c1;
memcpy(&c1, f, sizeof(c1)); // this obviously works, but I don't want
// to be copying all this stuff
auto c2 = reinterpret_cast<Common const*>(f); // is this ok?
// c2 and f are pointer-interconvertible iff f comes from a U
// but why does that U actually need to exist?
auto u = reinterpret_cast<U const*>(f); // ok per basic.lval/8.6??
auto c3 = &u->c; // ok per class.mem/23??
}
Short answer: reinterpret-casting an object's address to a different class and accessing the object through that pointer is undefined behaviour.
I think the reason is to enable easier alias analysis by the compiler.
If the compiler can assume that an object of type X cannot alias an object of an unrelated type Y, even if they happen to be layout-compatible, then it may be able to perform optimizations that it could not do otherwise.
struct X
{
int a;
int b;
};
struct Y
{
int c;
int d;
};
void updateXY(X *x, Y* y)
{
x->a = y->c;
x->b = y->c; // if *x and *y could alias, y->c would have to be reloaded
}
void updateXX(X *x, X* xx)
{
x->a = xx->a;
x->b = xx->a; // the compiler must reload xx->a because x and xx may alias
}
In fact, gcc and clang do optimize updateXY more aggressively than updateXX, because they take this aliasing guarantee into account.
updateXY(X*, Y*):
mov eax, DWORD PTR [rsi] // cache y->c
mov DWORD PTR [rdi], eax
mov DWORD PTR [rdi+4], eax
ret
updateXX(X*, X*):
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], eax
mov eax, DWORD PTR [rsi] // must reload xx->a
mov DWORD PTR [rdi+4], eax
ret
It does not seem to make a difference for alias analysis whether a union of X and Y is visible to the compiler, i.e. the optimization is performed regardless. So in practice, code that passes around pointers to non-active union members may still break. For your use case of passing const pointers, this should be irrelevant.
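If you control the type definitions, there is also a standards-clean alternative that sidesteps the whole question (a sketch of a different design, not an answer to the literal question as asked): make the common prefix a named subobject instead of relying on layout compatibility.

#include <cstdint>

struct Common { int a, b, c; };

struct Full {
    Common common;          // the shared prefix as a real member
    std::uint64_t x, y, z;
};

const Common* as_common(const Full* f) {
    return &f->common;      // well-defined: pointer to a named subobject
}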
class SomeClass
{
int classMember;
public:
int GetMember();
bool IsPositive();
};
int SomeClass::GetMember()
{
return classMember;
}
bool SomeClass::IsPositive()
{
int val = GetMember(); //Case#1. Accessing the value via the get method
int val = classMember; //Case#2. Accessing the value directly (one case or the other, of course; together they would not compile)
return (val > 0);
}
Question: Does using Case#1 have any advantage over Case#2, or vice versa? Is there any overhead (even the tiniest) in using the get method compared to accessing the member directly?
The answer is: it really depends on what your compiler chooses to do. The best way to see if there is a difference is to look at the disassembly.
int val = classMember;
010C4869 mov eax,dword ptr [this]
010C486C mov ecx,dword ptr [eax]
010C486E mov dword ptr [val],ecx
return val > 0;
010C4871 cmp dword ptr [val],0
010C4875 jle SomeClass::IsPositiveClassMember+20h (010C4880h)
010C4877 mov dword ptr [ebp-4],1
010C487E jmp SomeClass::IsPositiveClassMember+27h (010C4887h)
010C4880 mov dword ptr [ebp-4],0
010C4887 mov al,byte ptr [ebp-4]
vs.
int val = GetMember();
010C4829 mov ecx,dword ptr [this]
010C482C call SomeClass::GetMember (010C1168h)
010C4831 mov dword ptr [val],eax
return val > 0;
010C4834 cmp dword ptr [val],0
010C4838 jle SomeClass::IsPositiveGetMember+23h (010C4843h)
010C483A mov dword ptr [ebp-4],1
010C4841 jmp SomeClass::IsPositiveGetMember+2Ah (010C484Ah)
010C4843 mov dword ptr [ebp-4],0
010C484A mov al,byte ptr [ebp-4]
The second example calls SomeClass::GetMember, which has its own disassembly. So in the second case, instead of just loading the value from member, it makes a function call.
return classMember;
010C4817 mov eax,dword ptr [this]
010C481A mov eax,dword ptr [eax]
You'll note that the instructions that load val with the value of classMember are identical, so the overhead comes from the call to SomeClass::GetMember.
This is in debug mode, however, with no optimization. If we optimize and build in release, we see the following disassembly:
int val = classMember;
return val > 0;
013D4830 xor eax,eax
013D4832 cmp dword ptr [ecx],eax
013D4834 setg al
vs
int val = GetMember();
return val > 0;
013D4820 xor eax,eax
013D4822 cmp dword ptr [ecx],eax
013D4824 setg al
The compiler optimizes away the call, and there is no difference.
It purely depends on the needs of the programmer.
When to create a function?
To increase the modularity of the program.
When some common task is performed repeatedly.
In your case, say two numbers have to be added. For that you may use
int a = dataMember1 + dataMember2;
Say you have to use that in many places; then, per the second point above, you can create a function like addNumber() for ease of use and readability.
Regarding performance, both are the same, because member functions are implicitly inline when defined inside the class definition, so no separate stack frame is needed for the call.
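Note that the implicit-inline rule applies only when the definition appears inside the class body; in the question's code, GetMember() is defined outside the class. A minimal sketch of the in-class form this answer describes:

class SomeClass {
    int classMember = 0;
public:
    // Defined inside the class definition, hence implicitly inline:
    int GetMember() const { return classMember; }
};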
Using GetMember() might be a bit slower, unless it's made inline explicitly or implicitly by your compiler.
However, using an accessor can greatly help with debugging, by temporarily changing this:
int SomeClass::GetMember()
{
return classMember;
}
into this:
int SomeClass::GetMember()
{
std::cout << "GetMember() called when classMember=" << classMember << std::endl;
return classMember;
}
But this might be a bit oldschool.
I prefer to add const modifier to all built-in arguments in functions I write. E.g.:
void foo(const int arg1, const double arg2);
is better for me than:
void foo(int arg1, double arg2);
After code review I was told that the const modifier brings an overhead when applied to integers and built-in types. Is that true, and why?
Thanks,
It has no more overhead than a typedef does. Your coworker is wrong.
If you want to convince him, print out the disassembly of both variants, and show your coworker that they're the same.
However, adding the const qualifier to primitive types like this is utterly pointless and futile. They're copied anyway, and there's no harm in modifying them. There's nothing to be gained by making them const.
There's no overhead with const; I guess your coworkers are just confused by the usage, as it is (unfortunately) not so common. Personally, I prefer to make as many local variables const as possible, since it increases readability.
Of course it's always easy to disprove: take the following program and compile it with assembly output.
#include <stdio.h>
void foo1(int a, double b)
{
printf("Not const %d, %g\n", a, b);
}
void foo2(const int a, const double b)
{
printf("Const %d, %g\n", a, b);
}
int main()
{
for(int i = 0; i < 10; ++i)
{
foo1(i, 5.5 * i);
foo2(i, 12.8 * i);
}
return 0;
}
The assembly code generated for those functions is exactly the same (VS2010 release build):
For foo1 (without const-specifiers):
; 4 : {
push ebp
mov ebp, esp
; 5 : printf("Not const %d, %g\n", a, b);
fld QWORD PTR _b$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
push eax
push OFFSET ??_C@_0BC@FACFPKBC@Not?5const?5?$CFd?0?5?$CFg?6?$AA@
call DWORD PTR __imp__printf
add esp, 16 ; 00000010H
; 6 : }
For foo2 (with const-specifiers):
; 9 : {
push ebp
mov ebp, esp
; 10 : printf("Const %d, %g\n", a, b);
fld QWORD PTR _b$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
push eax
push OFFSET ??_C@_0O@LOLEPDHC@Const?5?$CFd?0?5?$CFg?6?$AA@
call DWORD PTR __imp__printf
add esp, 16 ; 00000010H
; 11 : }
That is not true.
Independent of that, you should not put const into the function declaration, since it is an implementation detail: it only qualifies the local variable in the function scope. So you can write it like this:
double foo(unsigned int a, double b); // declaration
double foo(unsigned int const a, double b) // implementation
{
b *= a;
return bar(b); // silly example
}
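In fact, top-level const on a parameter is not even part of the function's type, which is why both spellings declare the same function. A small illustration (the names are mine):

#include <type_traits>

void g(int x);
void g(const int x);  // re-declares the same function; this is not an overload

static_assert(std::is_same_v<void(int), void(const int)>,
              "top-level const on a parameter is dropped from the function type");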
After code review I was told that the const modifier brings an overhead when applied to integers and built-in types. Is that true, and why?
Who did you give your code to for review? A junior programmer?
The above is not true. Quite the contrary: using const might even enable some optimizations.
It's not true.1
Or, more precisely, I can't think of a reason why it could be true. All const does is force the compiler to check that you're not changing the value of the variable; but that's a compile-time check.
1. Assuming we're using the traditional definition of "overhead" that relates to runtime performance or compiled code size.
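To illustrate that compile-time check, a minimal example (the exact diagnostic wording varies by compiler):

void foo(const int arg1)
{
    arg1 = 5;  // error: cannot assign to a const parameter
}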
No, that's not true.
I think you/they confused const with passing by reference.
What type of overhead do you mean?
Overhead in the compiled binary, or overhead for the compiler? I'm sure the compiled binary is identical for the two code samples you've added. For the compiler, maybe: const adds an additional constraint that must be checked at compile time.
It can bring overhead, if it forces you to create an additional local variable.
Without const on the second parameter:
void foo(const int arg1, double arg2)
{
if (arg1 == 1)
arg2 += 5.0;
// use arg2
}
With const on both parameters:
void foo(const int arg1, const double arg2)
{
double arg2Copy;
if (arg1 == 1)
arg2Copy = arg2 + 5.0;
else
arg2Copy = arg2;
// use arg2Copy
}
But this really depends on the compiler. If you're concerned about the overhead, you should compare the generated code.
I want dLower and dHigher to have the lower and higher of two double values, respectively - i.e. to sort them if they are the wrong way around. The most immediate answer, it seems, is:
void ascending(double& dFirst, double& dSecond)
{
if(dFirst > dSecond)
swap(dFirst,dSecond);
}
ascending(dFoo, dBar);
But it seems like such an obvious thing to do that I wondered if I'm just not using the right terminology to find a standard routine.
Also, how would you make that generic?
This is a good way of approaching it, and it is about as efficient as you are going to get. I doubt this specific function has a generally recognized name, though it is apparently called a comparison-swap.
Generalizing it on type is as easy as:
template <typename T>
void ascending(T& dFirst, T& dSecond)
{
if (dFirst > dSecond)
std::swap(dFirst, dSecond);
}
Exercising this function:
int main() {
int a=10, b=5;
ascending(a, b);
std::cout << a << ", " << b << std::endl;
double c=7.2, d=3.1;
ascending(c, d);
std::cout << c << ", " << d << std::endl;
return 0;
}
This prints:
5, 10
3.1, 7.2
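As an aside on the "standard routine" part of the question: std::minmax from <algorithm> (C++11) hands you the two values in ascending order, though it returns a pair rather than swapping in place:

#include <algorithm>
#include <iostream>

int main() {
    double dFoo = 9.5, dBar = 2.5;
    auto p = std::minmax(dFoo, dBar);  // pair of const references: (min, max)
    std::cout << p.first << ", " << p.second << std::endl;  // prints: 2.5, 9.5
    return 0;
}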
Playing the "extremely generic" game:
template <typename T, typename StrictWeakOrdering>
void comparison_swap(T &lhs, T &rhs, StrictWeakOrdering cmp) {
using std::swap;
if (cmp(rhs, lhs)) {
swap(lhs, rhs);
}
}
template <typename T>
void comparison_swap(T &lhs, T &rhs) {
comparison_swap(lhs, rhs, std::less<T>());
}
This ticks the following boxes:
Uses a less-than comparator, which is more likely to be readily available for a user-defined type, since that is what the standard algorithms use.
The comparator is optionally configurable and defaults to something sensible (you could use std::greater<T> as the default if you prefer, and modify accordingly). std::less is also guaranteed to give a valid ordering for arbitrary pointers of the same type, which operator< isn't.
Uses either a specialization of std::swap or a swap function found by ADL, just in case the type T provides one but not the other.
There may be some boxes I've forgotten about, though.
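A quick usage sketch, assuming the two comparison_swap templates above are in scope:

#include <functional>
#include <iostream>
#include <string>

int main() {
    std::string a = "pear", b = "apple";
    comparison_swap(a, b);                       // default comparator: std::less
    std::cout << a << ", " << b << std::endl;    // apple, pear

    int x = 3, y = 9;
    comparison_swap(x, y, std::greater<int>());  // descending order instead
    std::cout << x << ", " << y << std::endl;    // 9, 3
    return 0;
}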
Let me throw in a special case here that only applies if performance is an absolutely critical issue and float accuracy is enough: you could consider the vector pipeline (if your target CPU has one).
Some CPUs can get you the min and max of each component of a vector with one instruction each, so you can process 4 values in one go - without any branches at all.
Again, this is a very special case and most likely not relevant for what you're doing, but I wanted to bring this up since "more efficient" was part of the question.
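For the curious, here is roughly what that could look like with x86 SSE intrinsics (a sketch assuming an x86 target; _mm_min_ps and _mm_max_ps are the per-lane single-instruction min/max alluded to above):

#include <immintrin.h>

// Sort four float pairs at once, branch-free:
// lo[i] = min(a[i], b[i]), hi[i] = max(a[i], b[i]).
void ascending4(const float* a, const float* b, float* lo, float* hi)
{
    __m128 va = _mm_loadu_ps(a);            // load 4 floats from a
    __m128 vb = _mm_loadu_ps(b);            // load 4 floats from b
    _mm_storeu_ps(lo, _mm_min_ps(va, vb));  // MINPS
    _mm_storeu_ps(hi, _mm_max_ps(va, vb));  // MAXPS
}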
Why not use std::sort() with a lambda or a functor?
As you also asked:
Is there a more efficient way to sort two numbers?
Considering efficiency, you may want to write your own swap function and test its performance against std::swap.
Here is Microsoft's implementation:
template<class _Ty> inline
void swap(_Ty& _Left, _Ty& _Right)
{ // exchange values stored at _Left and _Right
if (&_Left != &_Right)
{ // different, worth swapping
_Ty _Tmp = _Left;
_Left = _Right;
_Right = _Tmp;
}
}
If you feel the if (&_Left != &_Right) check is not required, you can omit it to improve the performance of the code. You can write your own swap like the one below.
template <class T>
inline void swap(T &left, T& right)
{
T temp = left;
left = right;
right = temp;
}
For me, it looked like performance improved slightly over 100 million (10 crore) calls.
Anyway, you need to measure performance-related changes properly; don't assume.
Some library functions may not run as fast because they are written for generic usage, error checking, etc. If performance is not critical in your application, it is recommended to use library functions, as they are well tested.
If performance is critical, as in hard real-time systems, there is nothing wrong with writing your own and using it.
All the answers apart from EboMike's focus on programming generality and use the same underlying compare-and-swap approach. I'm interested in this question because it would be nice to have specialisations that avoid branching, for pipelining efficiency. I'm sketching out some untested/unbenchmarked implementations that might compile more efficiently than the previous answers by exploiting conditional-move instructions (e.g., cmovl) to avoid branching. I have no idea whether this manifests in real-world performance gains, however...
Programming generality could be added by making these specialisations, using the compare-and-swap approach in the generic case. This is a common enough problem that I would really love to see it correctly implemented as a set of architecture-tuned specialisations in a library.
I've included x86 assembly output from godbolt in comments.
/*
mov eax, dword ptr [rdi]
mov ecx, dword ptr [rsi]
cmp eax, ecx
mov edx, ecx
cmovl edx, eax
cmovl eax, ecx
mov dword ptr [rdi], edx
mov dword ptr [rsi], eax
ret
*/
void ascending1(int &a, int &b)
{
bool const pred = a < b;
int const _a = pred ? a : b;
int const _b = pred ? b : a;
a = _a;
b = _b;
}
/*
mov eax, dword ptr [rdi]
mov ecx, dword ptr [rsi]
mov edx, ecx
xor edx, eax
cmp eax, ecx
cmovle ecx, eax
mov dword ptr [rdi], ecx
xor ecx, edx
mov dword ptr [rsi], ecx
ret
*/
void ascending2(int &a, int &b)
{
bool const pred = a < b;
int const c = a^b;
a = pred ? a : b;
b = a^c;
}
/*
The following implementation changes to a function-style interface,
which I feel is more elegant, although admittedly always forces assignment
to occur, so will be more expensive if assignment is costly.
See foobar() to see that this rather nicely inlines.
mov eax, esi
mov ecx, esi
xor ecx, edi
cmp edi, esi
cmovle eax, edi
xor ecx, eax
shl rcx, 32
or rax, rcx
ret
*/
std::pair<int,int> ascending3(int const a, int const b)
{
bool const pred = a < b;
int const c = a^b;
int const x = pred ? a : b;
int const y = c^x;
return std::make_pair(x,y);
}
/*
This is to show that ascending3() inlines very nicely
to only 5 assembly instructions.
# inlined ascending3().
mov eax, esi
xor eax, edi
cmp edi, esi
cmovle esi, edi
xor eax, esi
# end of inlining.
add eax, esi
ret
*/
int foobar(int const a, int const b)
{
auto const [x,y] = ascending3(a,b);
return x+y;
}