Why doesn't gcc optimize matching throw and catch? - c++

Consider this snippet:
static inline void g() {
    throw 10;
}

int f() {
    try {
        g();
    } catch (const int &x) {
        return x;
    }
    return 0;
}
As you can see in Compiler Explorer, GCC is able to inline the call to g, but it still goes through the whole process of throwing and catching the exception, calling into the runtime library. Is there any reason why GCC couldn't just compile f to mov eax, 10; ret? The same happens with Clang, with minor differences.
An optimization like this would seem feasible whenever the compiler can inline the throwing site into the catching site, which would perhaps happen frequently if you call some throwing STL method and catch just outside the call. I would guess that avoiding a round trip through the runtime could be a significant improvement in hot loops.

Related

Can the compiler decide the noexcept'ness of a function?

Let's do an example
class X
{
    int value;
public:
    X (int def = 0) : value (def) {}
    void add (int i)
    {
        value += i;
    }
};
Clearly, the function void X::add (int) will never throw any exception.
My question is, can the compiler analyze the code and decide not to generate machine code to handle exceptions, even if the function is not marked as noexcept?
If the compiler can prove that a function will never throw, the "as-if" rule (§1.9, "Program execution", of the C++ standard) allows it to remove the code that handles exceptions.
However, it is not possible in general to decide whether a function will never throw, as that amounts to solving the Halting Problem.

why does the compiler not optimize this load away

In the following C++ program:
struct O
{
    const int id;
};

extern void externalFunc();

int func(O* o)
{
    // first load of o->id happens here:
    int first = o->id;
    externalFunc();
    // second load happens here, but why?
    return o->id + first;
}
both Clang and MSVC with all optimizations on compile this code so that the o->id value gets loaded from memory twice.
Why are these compilers unable to remove the second load? I am trying to tell the compiler that the value is guaranteed not to change by marking the id member const, but apparently both compilers do not consider this a sufficient guarantee. If I remove the call to externalFunc(), they do optimize away the second load. How do I convince the compiler that this value is really not going to change?
Consider:
#include <iostream>
#include <new> // for placement new

struct O
{
    const int id;
    O(int x) : id(x) {}
};

O* global = nullptr;

void externalFunc() {
    global->~O();
    new (global) O(42);
}

int func(O* o)
{
    int first = o->id;
    externalFunc();
    // o->id has changed, even though o hasn't
    return o->id + first;
}

int main()
{
    O o(1);
    global = &o;
    std::cout << func(&o);
}
Output: 43.
externalFunc() might alter o->id. (Not o, which is a local variable.)
Why are these compilers unable to remove the second load? I am trying to tell the compiler that the value is guaranteed not to change by marking the id member const, but apparently both compilers do not consider this a sufficient guarantee.
Because it isn't. Consider this example:
static O mg {5};

void externalFunc()
{
    mg.~O();
    new (&mg) O {6};
}

int main()
{
    std::cout << mg.id << '\n';
    func(&mg);
    std::cout << mg.id << '\n';
}
The first load reads the value 5, the second will read 6.
How do I convince the compiler this value is really not going to change?
Simply cache the field. This still won't convince the compiler that o->id will not change, but it will assure it that if it does change, you don't care.
int func(O* o)
{
    const int id = o->id;
    externalFunc();
    return id + id;
}
I have made it a general habit to cache all values of primitive fields that I access via a pointer (including the this pointer) in local (const) variables. If the compiler can prove that the values can't change, this has no additional cost, and if it can't, it might produce slightly better code. As a nice aside, it also lets you give the values names that make the most sense in the context of the function.
The compiler doesn't have the code for externalFunc() available while compiling func(), so it doesn't know what that function might do. Therefore, the call acts as a barrier.
If you link statically, this falls under link-time optimization, which can be enabled on GCC with -flto and is also supported by MSVC.
To hint to GCC/Clang (MSVC doesn't appear to support this one) that the function doesn't change global memory, you can mark it with the pure attribute:
extern void externalFunc() __attribute__((pure));
And then it will stop posing that obstacle.

Jump as an alternative to RTTI

I am learning how C++ is compiled into assembly, and I found how exceptions work under the hood very interesting. If it's okay to have more than one execution path for exceptions, why not for normal functions?
For example, let's say you have a function that can return a pointer to class A or to something derived from A. The way you're supposed to handle it is with RTTI.
But why not, instead, have the called function, after computing the return value, jump back into the caller at the specific location that matches the return type? Like with exceptions: execution can flow normally or, if something throws, land in one of your catch handlers.
Here is my code:
#include <cassert>

class A
{
public:
    virtual int GetValue() { return 0; }
};

class B : public A
{
public:
    int VarB;
    int GetValue() override { return VarB; }
};

class C : public A
{
public:
    int VarC;
    int GetValue() override { return VarC; }
};

A* Foo(int i)
{
    if (i == 1) return new B;
    if (i == 2) return new C;
    return new A;
}

int main()
{
    A* a = Foo(2);
    if (B* b = dynamic_cast<B*>(a))
    {
        b->VarB = 1;
    }
    else if (C* c = dynamic_cast<C*>(a)) // Line 36
    {
        c->VarC = 2;
    }
    else
    {
        assert(a->GetValue() == 0);
    }
}
So instead of doing this with RTTI and dynamic_cast checks, why not have the Foo function just jump to the appropriate location in main? In this case Foo returns a pointer to C, so Foo would jump to line 36 directly.
What's wrong with this? Why aren't people doing it? Is there a performance reason? I would think this would be cheaper than RTTI.
Or is this just a language limitation, regardless of whether it's a good idea or not?
First of all, there are a million different ways of defining a language. C++ is defined as it is defined; whether that is nice or not really does not matter. If you want to improve the language, you are free to write a proposal to the C++ committee. They will review it and maybe include it in a future standard. Sometimes this happens.
Second, although exceptions are dispatched under the hood, there is no strong reason to think that this is more efficient than handwritten code that uses RTTI. Exception dispatch still costs CPU cycles; there is no miracle there. The real difference is that for RTTI you have to write the dispatch code yourself, while the exception-dispatch code is generated for you by the compiler.
You may want to call your function 10000 times and find out which runs faster: the RTTI-based code or exception dispatch.

C++ inline function & context specific optimization

I have read in Scott Meyers' Effective C++ book that:
When you inline a function, you may enable the compiler to perform context-specific optimizations on the body of the function. Such optimizations would be impossible for normal function calls.
Now the question is: what is a context-specific optimization, and why is it necessary?
I don't think "context specific optimization" is a defined term, but I think it basically means the compiler can analyse the call site and the code around it and use this information to optimise the function.
Here's an example. It's contrived, of course, but it should demonstrate the idea:
Function:
int foo(int i)
{
    if (i < 0) throw std::invalid_argument("");
    return -i;
}
Call site:
int bar()
{
    int i = 5;
    return foo(i);
}
If foo is compiled separately, it must contain a comparison and exception-throwing code. If it's inlined in bar, the compiler sees this code:
int bar()
{
    int i = 5;
    if (i < 0) throw std::invalid_argument("");
    return -i;
}
Any sane optimiser will evaluate this as
int bar()
{
    return -5;
}
If the compiler chooses to inline a function, it replaces the call to that function with the body of the function. It then has more code to optimize inside the caller's body, which often leads to better code.
Imagine that:
bool callee(bool a) {
    if (a) return false;
    else return true;
}

void caller() {
    if (callee(true)) {
        // Do something
    }
    // Do something
}
Once inlined, the code will look (approximately) like this:
void caller() {
    bool a = true;
    bool ret;
    if (a) ret = false;
    else ret = true;
    if (ret) {
        // Do something
    }
    // Do something
}
Which may be optimized further too:
void caller() {
    if (false) {
        // Do something
    }
    // Do something
}
And then to:
void caller() {
    // Do something
}
The function is now much smaller, and you avoid the cost of the function call and especially (regarding the question) the cost of branching.
Say the function is
void fun(bool b) { if (b) do_sth1(); else do_sth2(); }
and it is called in the context with pre-defined false parameter
bool param = false;
...
fun( param);
then the compiler may reduce the function body to
...
do_sth2();
I don't think "context-specific optimization" means anything specific, and you probably can't find an exact definition.
A nice example would be a classical getter for some class attribute. Without inlining, the program has to:
jump to the getter body
move the value into a register (eax on x86 under Windows with default Visual Studio settings)
return to the caller
move the value from eax into a local variable
With inlining, it can skip almost all of that work and move the value directly into the local variable.
The optimizations applied depend strictly on the compiler, but a lot of things can happen (variable allocation may be skipped, code may get reordered, and so on). But you always save the call/jump, which is an expensive instruction.

Can GCC optimize methods of a class with compile-time constant variables?

Preamble
I'm using avr-g++ to program AVR microcontrollers, and therefore I always need very efficient code.
GCC can usually optimize a function if its arguments are compile-time constants. For example, I have a function pin_write(uint8_t pin, bool val) which determines the AVR registers for the pin (using my special map from integer pin to a port/pin pair) and writes the corresponding values to those registers. This function isn't small, because of its generality. But if I call it with a compile-time constant pin and val, GCC can do all the calculations at compile time and reduce the call to a couple of AVR instructions, e.g.
sbi PORTB,1
sbi DDRB,1
Amble
Let's write a code like this:
class A {
    int x;
public:
    A(int x_): x(x_) {}
    void foo() { pin_write(x, 1); }
};

A a(8);

int main() {
    a.foo();
}
We have only one object of class A, and it's initialized with a constant (8). So it's possible to do all the calculations at compile time:
foo() -> pin_write(x,1) -> pin_write(8,1) -> a couple of asm instructions
But GCC doesn't do so.
Surprisingly, if I remove the global A a(8) and write just
A(8).foo();
I get exactly what I want:
00000022 <main>:
22: c0 9a sbi 0x18, 0 ; 24
24: b8 9a sbi 0x17, 0 ; 23
Question
So, is there a way to force GCC make all possible calculation at compile-time for single global objects with constant initializers?
Because of this trouble I have to manually expand such cases and replace my original code with this:
const int x = 8;

class A {
public:
    A() {}
    void foo() { pin_write(x, 1); }
};
UPD. It's quite puzzling: A(8).foo() inside main is optimized to 2 asm instructions. A a(8); a.foo() too! But if I declare A a(8) as a global, the compiler produces the big general code. I tried adding static; it didn't help. Why?
But if I declare A a(8) as a global, the compiler produces the big general code. I tried adding static; it didn't help. Why?
In my experience, gcc is very reluctant to do this if the object / function has external linkage. Since we don't have your code to compile, I made a slightly modified version of it:
#include <cstdio>

class A {
    int x;
public:
    A(int x_): x(x_) {}
    int f() { return x*x; }
};

A a(8);

int main() {
    printf("%d", a.f());
}
I have found 2 ways to achieve generated assembly corresponding to this:
int main() {
    printf("%d", 64);
}
In words: to eliminate everything at compile time so that only the necessary minimum remains.
One way to achieve this, with both clang and gcc, is:
#include <cstdio>

class A {
    int x;
public:
    constexpr A(int x_): x(x_) {}
    constexpr int f() const { return x*x; }
};

constexpr A a(8);

int main() {
    printf("%d", a.f());
}
gcc 4.7.2 already eliminates everything at -O1, clang 3.5 trunk needs -O2.
Another way to achieve this is:
#include <cstdio>

class A {
    int x;
public:
    A(int x_): x(x_) {}
    int f() const { return x*x; }
};

static const A a(8);

int main() {
    printf("%d", a.f());
}
This only works with clang at -O3; apparently the constant folding in gcc is not that aggressive. (As clang shows, it can be done, but gcc 4.7.2 did not implement it.)
You can force the compiler to fully optimize the function with all known constants by changing the pin_write function into a template. I don't know if the particular behavior is guaranteed by the standard though.
template <int a, int b>
void pin_write() { some_instructions; }
This will probably require fixing all lines where pin_write is used.
Additionally, you can declare the function as inline. The compiler isn't guaranteed to inline it (the inline keyword is just a hint), but if it does, it has a greater chance of optimizing compile-time constants away (assuming the compiler can tell something is a compile-time constant, which may not always be the case).
Your a has external linkage, so the compiler can't be sure that there isn't other code somewhere modifying it.
If you were to declare a const, then you would make it clear it shouldn't change, and you would also stop it having external linkage; both of those should help the compiler be less pessimistic.
(I'd probably declare x const too - it may not help here, but if nothing else it makes it clear to the compiler and the next reader of the code that you never change it.)