How to clearly produce inlining results in C++

I've been reading Scott Meyers' Effective C++ again, more specifically Item 30 about inlining.
So I wrote the following, trying to induce that optimization with gcc 4.6.3:
// test.h
class test {
public:
    inline int max(int i) { return i > 5 ? 1 : -1; }
    int foo(int);
private:
    int d;
};

// test.cpp
#include "test.h"
int test::foo(int i) { return max(i); }

// main.cpp
#include "test.h"
int main(int argc, const char *argv[]) {
    test t;
    return t.foo(argc);
}
and produced the relevant assembly, alternatively using the following commands:
g++ -S -I. test.cpp main.cpp
g++ -finline-functions -S -I. test.cpp main.cpp
Both commands produced the same assembly as far as the inline method is concerned;
I can see both the max() method body (also having a cmpl statement and the relevant jumps) and its call from foo().
Am I missing something terribly obvious? I can't say I combed through the gcc man page, but nothing relevant stood out.
So, I just increased the optimization level to -O3 which has the inline optimizations on by default, according to:
g++ -c -Q -O3 --help=optimizers | grep inline
-finline-functions [enabled]
-finline-functions-called-once [enabled]
-finline-small-functions [enabled]
Unfortunately, this (as expected) optimized the above code fragment almost out of existence.
max() is no longer there (at least as an explicitly tagged assembly block) and foo() has been reduced to:
_ZN4test3fooEi:
.LFB7:
.cfi_startproc
rep
ret
.cfi_endproc
which I cannot clearly understand at the moment (and is out of research scope).
Ideally, what I would like to see, would have been the assembly code for max() inside the foo() block.
Is there a way (either through cmd-line options or using a different (non-trivial?) code fragment) to produce such an output?

The compiler is entirely free to inline functions even if you don't ask it to - whether or not you use the inline keyword, and whether or not you use -finline-functions (although probably not if you use -fno-inline-functions - that would be contrary to what you asked for, and although the C++ standard doesn't say so, the flag becomes pretty pointless if it doesn't do roughly what it says).
Next, the compiler is also not always certain that your function won't be used "somewhere else", so it will produce an out-of-line copy of most inline functions, unless it's entirely clear that it cannot possibly be called from somewhere else (for example when the class is declared such that it can't be reached from any other translation unit).
And if you don't use the result of a function, and the function doesn't have side effects (e.g. writing to a global variable, performing I/O, or calling a function whose behaviour the compiler can't see), then the compiler will eliminate that code as "dead" - because you don't really want unnecessary code, do you? Adding a return in front of max(i) in your foo function should help.
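As a sketch of one way to see the inlining happen (this file and the flags are illustrative, not from the original post): compile test.cpp on its own at -O2. foo() has external linkage, so an out-of-line definition must be emitted, and with the inliner enabled its body usually ends up containing the comparison from max() instead of a call to it:

// inline_demo.cpp -- hypothetical, self-contained variant of the original example
class test {
public:
    inline int max(int i) { return i > 5 ? 1 : -1; }
    int foo(int);
private:
    int d;
};

// foo() must be emitted out of line (it could be called from another
// translation unit), but max() can be folded into its body.
int test::foo(int i) { return max(i); }

// Build and inspect:
//   g++ -O2 -S inline_demo.cpp
// then look at _ZN4test3fooEi in inline_demo.s: it should contain the
// compare against 5 from max() rather than a call to _ZN4test3maxEi.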

Related

How to prevent std::string being constructed from a nullptr at build time by checking the optimized compiler output

GCC doesn't output a warning when its optimizer detects std::string being constructed from a nullptr. I found a workaround, just wondering if there is anything better. It takes advantage of the std::string implementation having an assert that fires when constructed with nullptr. So by compiling with optimisation and looking at the assembly, I can see that line.
Note, below is just an example, this isn't application code.
However, I wondered if there is a better way than what I do. I search for __throw_logic_error with that particular string.
I've pasted this from godbolt.org.
Can be compiled with GCC 12 as follows:
g++ -O1 -S -fverbose-asm -std=c++23 -Wall -o string.s string.cpp
#include <string>

void f()
{
    const char * a = nullptr;
    std::string s(a);
}

int main()
{
    f();
}
.LC0:
        .string "basic_string: construction from null is not valid"
f():
        sub     rsp, 8
        mov     edi, OFFSET FLAT:.LC0
        call    std::__throw_logic_error(char const*)
main:
        sub     rsp, 8
        call    f()
Throwing an exception could in principle be a legal use.
(This is the problem with logic_errors: they shouldn't be exceptions.)
For this reason, I doubt that a compiler will ever complain about this, even if it can evaluate everything at compile time.
Your best bet is a static analyzer.
The version of clang-tidy in godbolt can detect this:
warning: The parameter must not be null [clang-analyzer-cplusplus.StringChecker]
https://godbolt.org/z/h9GnT9ojx
This was incorporated in clang-tidy 17, https://clang.llvm.org/docs/analyzer/checkers.html#cplusplus-stringchecker-c
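For example, an invocation along these lines should surface the warning (the exact flags and the file name are illustrative, not taken from the original post):

clang-tidy string.cpp --checks='-*,clang-analyzer-cplusplus.StringChecker' -- -std=c++17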
Along the lines of compiler warnings, you could (eventually) be lucky with something like this, although I wasn't:
#include <string>
#include <stdexcept>
#include <utility>

void f() {
    const char * a = nullptr;
    try {
        std::string s(a);
    } catch (std::logic_error&) {
        std::unreachable(); // C++23; or __builtin_unreachable() for current GCC
    }
}

-O2 and -fPIC option in gcc

For performance optimization, I would like to make use of the reference of a string rather than its value. Depending on the compilation options, I obtain different results. The behavior is a bit unclear to me, and I do not know the actual gcc flag that causes that difference.
My code is
#include <string>
#include <iostream>

const std::string* test2(const std::string& in) {
    // Here I want to make use of the pointer &in
    // ...
    // it's returned only for demonstration purposes...
    return &in;
}

int main() {
    const std::string* t1 = test2("text");
    const std::string* t2 = test2("text");
    // only for demonstration, the cout is printed....
    std::cout << "References are: " << (t1 == t2 ? "equivalent. " : "different. ") << t1 << "\t" << t2 << std::endl;
    return 0;
}
There are three compilation options:
gcc main.cc -o main -lstdc++ -O0 -fPIC && ./main
gcc main.cc -o main -lstdc++ -O2 -fno-PIC && ./main
gcc main.cc -o main -lstdc++ -O2 -fPIC && ./main
The first two yield equivalent results (References are: different.), so the pointers are different, but the third one results in equivalent pointers (References are: equivalent.).
Why does this happen, and which option do I have to add to the options -O2 -fPIC such that the pointers become again different?
Since this code is embedded into a larger framework, I cannot drop the options -O2 or -fPIC.
Since I get the desired result with the option -O2 and also with -fPIC, but a different behavior if both flags are used together, the exact behavior of these flags is unclear to me.
I tried with gcc 4.8 and gcc 8.3.
Both t1 and t2 are dangling pointers: they point to a temporary std::string which has already been destroyed. The temporary std::string is constructed from the string literal during each call to test2("text") and lives until the end of the full-expression (the ;).
Their exact values depend on how the compiler (re-)uses stack space at a particular optimization level.
which option do I have to add to the options -O2 -fPIC such that the pointers become again different?
The code exhibits undefined behavior because it's illegal to compare invalid pointer values. Simply don't do this.
If we ignore the comparing part, then we end up with this version:
#include <string>
#include <iostream>

void test2(const std::string& in) {
    std::cout << "Address of in: " << (void*)&in << std::endl;
}

int main() {
    test2("text");
    test2("text");
}
Now this code is free from UB, and it will print either the same address or different addresses, depending on how the compiler re-uses stack space between function calls. There is no way to control this, but it's no problem because keeping track of addresses of temporaries is a bad idea to begin with.
You can try using const char* as the input argument instead; then no temporary will be created in a call to test2("text"). But here again, whether or not two instances of "text" point to the same location is implementation-defined. Though GCC does coalesce identical string literals, so at least with GCC you should observe the behavior you're after.
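A minimal sketch of that const char* variant (illustrative only, not code from the original answer):

#include <iostream>

// Taking const char* avoids constructing a temporary std::string per call.
const char* test2(const char* in) {
    return in;
}

int main() {
    const char* t1 = test2("text");
    const char* t2 = test2("text");
    // With GCC's string-literal merging the two pointers typically compare
    // equal, but the standard leaves this implementation-defined.
    std::cout << (t1 == t2 ? "equivalent" : "different") << std::endl;
}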

What g++ option prevents inlining lambda-functions if compiled with -Os instead of -O2

Here is a simple piece of code for compile-time repetition of a lambda. I compiled it for AVR with -Os and with -O2. With -Os the lambda isn't inlined, but with -O2 it is. The g++ manual says that -Os is the same as -O2 but disables some optimizations which increase code size. I wonder if I can tweak the g++ options to inline such simple lambdas.
#include <cstdint>
#include <utility>

volatile uint8_t x;

namespace detail {
    template<auto... II, typename F>
    void repeat_impl(std::index_sequence<II...>, F&& f) {
        ( ((void)II, f()) , ...);
    }
}

template<auto N, typename F>
void repeat(F&& f) {
    detail::repeat_impl(std::make_index_sequence<N>{}, static_cast<F&&>(f));
}

int main() {
    repeat<10>([](){
        x /= 2;
    });
}
These three seem to be related to inlining, turned on at -O2 but not at -Os:
-finline-small-functions
-findirect-inlining
-fpartial-inlining
You could add them to -Os one by one (see the example invocation below). Although inlining may happen in an early phase of compilation and optimization, it is possible that a function only becomes a candidate for inlining after another optimization that precedes inlining has been applied.
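For example, an invocation along these lines (the avr-g++ name and the source file name are just illustrative assumptions):

avr-g++ -Os -finline-small-functions -findirect-inlining -fpartial-inlining -std=c++17 -S main.cpp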
I do not think the always_inline attribute is in the C++ standard (not sure, though); if it is not, it is compiler-specific. GCC does support the attribute, and adding it to the function should result in the function being inlined.
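A minimal sketch of what that could look like (assuming GCC; the helper function here is hypothetical and stands in for the lambda):

#include <cstdint>

volatile uint8_t x;

// GCC-specific: always_inline asks the compiler to inline this call even at -Os.
__attribute__((always_inline)) inline void halve() {
    x /= 2;
}

int main() {
    for (int i = 0; i < 10; ++i)
        halve();
}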

What's the use case of gcc's used attribute?

#include <stdio.h>

// xyz will be emitted with -flto (or if it is static) even when
// the function is unused
__attribute__((__used__))
void xyz() {
    printf("Hello World!\n");
}

int main() {
    return 0;
}
What do I need this for?
Is there any way I could still reach xyz somehow besides directly calling the function, like some dlsym() like magic?
Attribute used is helpful in situations where you want to force the compiler to emit a symbol that it would normally omit. As GCC's documentation says (emphasis mine):
This attribute, attached to a function, means that code must be
emitted for the function even if it appears that the function is not
referenced. This is useful, for example, when the function is
referenced only in inline assembly.
For instance, if you have code as follows:
#include <iostream>

static int foo(int a, int b)
{
    return a + b;
}

int main()
{
    int result = 0;
    // some inline assembly that calls foo and updates result
    std::cout << result << std::endl;
}
you might notice that no symbol foo is present with the -O flag (optimization level -O1):
g++ -O -pedantic -Wall check.cpp -c
check.cpp:3: warning: ‘int foo(int, int)’ defined but not used
nm check.o | c++filt | grep foo
As a result you cannot reference foo within this (imaginary) inline assembly.
By adding:
__attribute__((__used__))
it turns into:
g++ -O -pedantic -Wall check.cpp -c
nm check.o | c++filt | grep foo
00000000 t foo(int, int)
thus now foo can be referenced within it.
You may also have spotted that gcc's warning is now gone, as you have told the compiler that you are sure foo is actually used "behind the scenes".
A particular use case is interrupt service routines in a static library.
For example, a timer overflow interrupt:
void __attribute__((interrupt(TIMERA_VECTOR),used)) timera_isr(void)
This timera_isr is never called by any function in the user code, but it might form an essential part of a library.
To ensure it is linked, and that there isn't an interrupt vector pointing to an empty section, the attribute keeps the linker from optimising it out.
If you declare a global variable or function that is unused, gcc will optimize it out (with a warning), but if you declare the global variable or function with __attribute__((used)), gcc will include it in the object file (and the linked executable).
https://gcc.gnu.org/legacy-ml/gcc-help/2013-09/msg00108.html
Another use case is to generate proper coverage information for header files. Functions declared in header files are usually removed by the compiler when unreferenced. Therefore, you will get 100% coverage in your coverage reports even if you forgot to call some functions that are located in the header file. To prevent this, you may mark your function with __attribute__((used)) in your coverage builds.
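A hypothetical sketch of that coverage trick (the macro and function names are made up for illustration):

// coverage_helpers.h
// In coverage builds, keep otherwise-unreferenced header functions in the
// object file so the coverage report shows them as uncovered.
#if defined(COVERAGE_BUILD)
#define KEEP_FOR_COVERAGE __attribute__((used))
#else
#define KEEP_FOR_COVERAGE
#endif

KEEP_FOR_COVERAGE static inline int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}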

How to trace out why gcc and g++ produces different code

Is it possible to see what is going on behind gcc and g++ compilation process?
I have the following program:
#include <stdio.h>
#include <unistd.h>

size_t sym1 = 100;
size_t *addr = &sym1;
size_t *arr = (size_t*)((size_t)&arr + (size_t)&addr);

int main (int argc, char **argv)
{
    (void) argc;
    (void) argv;
    printf("libtest: addr of main(): %p\n", &main);
    printf("libtest: addr of arr: %p\n", &arr);
    while(1);
    return 0;
}
Why is it possible to produce the binary without error with g++ while there is an error using gcc?
I'm looking for a method to trace what makes them behave differently.
# gcc test.c -o test_app
test.c:7:1: error: initializer element is not constant
# g++ test.c -o test_app
I think the reason may in fact be that gcc uses cc1 as its compiler while g++ uses cc1plus.
Is there a way to make more precise output of what actually has been done?
I've tried the -v flag but the output is quite similar. Are there different flags passed to the linker?
What is the easiest way to compare two compilation procedures and find the difference in them?
In this case, gcc produces nothing because your program is not valid C. As the compiler explains, the initializer element (expression used to initialize the global variable arr) is not constant.
C requires initialization expressions for objects with static storage duration to be compile-time constants, so that their contents can be placed in the data segment of the executable. This cannot be done for arr because the addresses of the variables involved are not known until link time and their sum cannot be trivially filled in by the dynamic linker, as is the case for addr. C++ allows this, so g++ generates initialization code that evaluates the non-constant expressions and stores the results in the global variables. This code is executed before invocation of main().
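Roughly speaking (this is a sketch of the idea, not the code g++ actually emits), the C++ behaviour is equivalent to leaving arr zero-initialized and running a hidden start-up function before main():

#include <stdio.h>

size_t sym1 = 100;
size_t *addr = &sym1;
size_t *arr;  // zero-initialized at load time

// g++ registers an initialization function in .init_array; the GCC
// "constructor" attribute is used here only to mimic that behaviour.
__attribute__((constructor))
static void init_arr(void) {
    arr = (size_t *)((size_t)&arr + (size_t)&addr);
}

int main(void) {
    printf("arr = %p\n", (void *)arr);
}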
Executables cc1 and cc1plus are internal details of the implementation of the compiler, and as such irrelevant to the observed behavior. The relevant fact is that gcc expects valid C code as its input, and g++ expects valid C++ code. The code you provided is valid C++, but not valid C, which is why g++ compiles it and gcc doesn't.
There is a slightly more interesting question lurking here. Consider the following test cases:
#include <stdint.h>
#if TEST==1
void *p=(void *)(unsigned short)&p;
#elif TEST==2
void *p=(void *)(uintptr_t)&p;
#elif TEST==3
void *p=(void *)(1*(uintptr_t)&p);
#elif TEST==4
void *p=(void *)(2*(uintptr_t)&p);
#endif
gcc (even with the very conservative flags -ansi -pedantic-errors) rejects test 1 but accepts test 2, and accepts test 3 but rejects test 4.
From this I conclude that some operations that are easily optimized away (like casting to a type of the same size, or multiplying by 1) get eliminated before the check for whether the initializer is a constant expression.
So gcc might be accepting a few things that it should reject according to the C standard. But when you make them slightly more complicated (like adding the result of a cast to the result of another cast - what useful value can possibly result from adding two addresses anyway?) it notices the problem and rejects the expression.