Is it possible to see what is going on behind gcc and g++ compilation process?
I have the following program:
#include <stdio.h>
#include <unistd.h>
size_t sym1 = 100;
size_t *addr = &sym1;
size_t *arr = (size_t*)((size_t)&arr + (size_t)&addr);
int main (int argc, char **argv)
{
(void) argc;
(void) argv;
printf("libtest: addr of main(): %p\n", &main);
printf("libtest: addr of arr: %p\n", &arr);
while(1);
return 0;
}
Why is it possible to produce the binary without error with g++ while there is an error using gcc?
I'm looking for a method to trace what makes them behave differently.
# gcc test.c -o test_app
test.c:7:1: error: initializer element is not constant
# g++ test.c -o test_app
I think the reason can be in fact that gcc uses cc1 as a compiler and g++ uses cc1plus.
Is there a way to make more precise output of what actually has been done?
I've tried to use -v flag but the output is quite similar. Are there different flags passed to linker?
What is the easiest way to compare two compilation procedures and find the difference in them?
In this case, gcc produces nothing because your program is not valid C. As the compiler explains, the initializer element (expression used to initialize the global variable arr) is not constant.
C requires initialization expressions to be compile-time constants, so that the contents of local variables can be placed in the data segment of the executable. This cannot be done for arr because the addresses of variables involved are not known until link time and their sum cannot be trivially filled in by the dynamic linker, as is the case for addr1. C++ allows this, so g++ generates initialization code that evaluates the non-constant expressions and stores them in global variables. This code is executed before invocation of main().
Executables cc1 and cc1plus are internal details of the implementation of the compiler, and as such irrelevant to the observed behavior. The relevant fact is that gcc expects valid C code as its input, and g++ expects valid C++ code. The code you provided is valid C++, but not valid C, which is why g++ compiles it and gcc doesn't.
There is a slightly more interesting question lurking here. Consider the following test cases:
#include <stdint.h>
#if TEST==1
void *p=(void *)(unsigned short)&p;
#elif TEST==2
void *p=(void *)(uintptr_t)&p;
#elif TEST==3
void *p=(void *)(1*(uintptr_t)&p);
#elif TEST==4
void *p=(void *)(2*(uintptr_t)&p);
#endif
gcc (even with the very conservative flags -ansi -pedantic-errors) rejects test 1 but accepts test 2, and accepts test 3 but rejects test 4.
From this I conclude that some operations that are easily optimized away (like casting to an object of the same size, or multiplying by 1) get eliminated before the check for whether the initializer is a constant expression.
So gcc might be accepting a few things that it should reject according to the C standard. But when you make them slightly more complicated (like adding the result of a cast to the result of another cast - what useful value can possibly result from adding two addresses anyway?) it notices the problem and rejects the expression.
Related
Take the following minimum example:
#include <stdio.h>
bool test(){
for (int i = 0; i < 1024; i++)
{
printf("i=%d\n", i);
}
}
int main(){
test();
return 0;
}
where the return statement in the test function is missing. If I run the example like so:
g++ main.cpp -o main && ./main
Then the loop aborts after 1024 iterations. However, if I run the example with optimizations turned on:
g++ -O3 main.cpp -o main && ./main
Then this is optimized and I get an endless loop.
This behavior is consistent across g++ version 10.3.1 and clang++ version 10.0.1. The endless loop does not occur if I add a return statement or change the return type of the function to void.
I am curious: Is this something one would consider a compiler bug? Or is this acceptable, since a missing return statement is undefined behavior and thus we lose all guarantees about what happens in this function?
Your function is declared as bool test(), but your definition never returns anything. That means you've broken the contract with the language and have be put in a time out in undefined behavior land. There, all results are "correct".
You can think of undefined behavior as: It is not defined what output the compiler produces when asked to compile your code.
Actually the "undefined" refers to the observable behavior of the program the compiler creates from your code, but it boils down to the same.
This is not a compiler bug.
You asked the compiler to return a bool from a function without returning a bool from the function. There is simply no way the compiler can do that right, and that isn't the compilers fault.
For performance optimization, I would like to make use of the reference of a string rather than its value. Depending on the compilation options, I obtain different results. The behavior is a bit unclear to me, and I do not know the actual gcc flag that causes that difference.
My code is
#include <string>
#include <iostream>
const std::string* test2(const std::string& in) {
// Here I want to make use of the pointer &in
// ...
// it's returned only for demonstration purposes...
return ∈
}
int main() {
const std::string* t1 = test2("text");
const std::string* t2 = test2("text");
// only for demonstration, the cout is printed....
std::cout<<"References are: "<<(t1==t2?"equivalent. ":"different. ")<<t1<<"\t"<<t2<<std::endl;
return 0;
}
There are three compilation options:
gcc main.cc -o main -lstdc++ -O0 -fPIC && ./main
gcc main.cc -o main -lstdc++ -O2 -fno-PIC && ./main
gcc main.cc -o main -lstdc++ -O2 -fPIC && ./main
The first two yield equivalent results (References are: different.), so the pointers are different, but the third one results in equivalent pointers (References are: equivalent.).
Why does this happen, and which option do I have to add to the options -O2 -fPIC such that the pointers become again different?
Since this code is embedded into a larger framework, I cannot drop the options -O2 or -fPIC.
Since I get the desired result with the option -O2 and also with -fPIC, but a different behavior if both flags are used together, the exact behavior of these flags is unclear to me.
I tried with gcc4.8 and gcc8.3.
Both t1 and t2 are dangling pointers, they point to a temporary std::string which is already destroyed. The temporary std::string is constructed from the string literal during each call to test2("text") and lives until the end of the full-expression (the ;).
Their exact values depend on how the compiler (re-)uses stack space at a particular optimization level.
which option do I have to add to the options -O2 -fPIC such that the pointers become again different?
The code exhibits undefined behavior because it's illegal to compare invalid pointer values. Simply don't do this.
If we ignore the comparing part, then we end up with this version:
#include <string>
#include <iostream>
void test2(const std::string& in) {
std::cout << "Address of in: " << (void*)&in << std::endl;
}
int main() {
test2("text");
test2("text");
}
Now this code is free from UB, and it will print either the same address or different addresses, depending on how the compiler re-uses stack space between function calls. There is no way to control this, but it's no problem because keeping track of addresses of temporaries is a bad idea to begin with.
You can try using const char* as the input argument instead, then no temporary will be created in a call test2("text"). But here again, whether or not two instances of "text" point to the same location is implementation-defined. Though GCC does coalesce identical string literals, so at least in GCC you should observe the behavior you're after.
In C++, it is possible for pointer values to be compile-time constants. This is true, otherwise, non-type template parameters and constexpr won't work with pointers. However, as far as I know, addresses of functions and objects of static storage are known (at least) at link-time rather than compile-time. Following is an illustration:
main.cpp
#include <iostream>
template <int* p>
void f() { std::cout << p << '\n'; }
extern int a;
int main() {
f<&a>();
}
a.cpp
int a = 0;
I'm just wondering how the address of a could possibly be known when compiling main.cpp. I hope somebody could explain this a little to me.
In particular, consider this
template <int* p, int* pp>
constexpr std::size_t f() {
return (p + 1) == (pp + 7) ? 5 : 10;
}
int main() {
int arr[f<&a, &b>()] = {};
}
How should the storage for arr be allocated?
PLUS: This mechanism seems to be rather robust. Even when I enabled Randomized Base Address, the correct output is obtained.
The compiler doesn't need to know the value of &a at compile time any more than it needs the value of function addresses.
Think of it like this: the compiler will instantiate your function template with &a as a parameter and generate "object code" (in whatever format it uses to pass to the linker). The object code will look like (well it won't, but you get the idea):
func f__<funky_mangled_name_to_say_this_is_f_for_&a>__:
reg0 <- /* linker, pls put &std::cout here */
reg1 <- /* hey linker, stuff &a in there ok? */
call std::basic_stream::operator<<(int*) /* linker, fun addr please? */
[...]
If you instantiate f<b&>, assuming b is another global static, compiler does the same thing:
func f__<funky_mangled_name_to_say_this_is_f_for_&b>__:
reg0 <- /* linker, pls put &std::cout here */
reg1 <- /* hey linker, stuff &b in there ok? */
call std::basic_stream::operator<<(int*) /* linker, fun addr please? */
[...]
And when your code calls for calling either of those:
fun foo:
call f__<funky_mangled_name_to_say_this_is_f_for_&a>__
call f__<funky_mangled_name_to_say_this_is_f_for_&b>__
Which exact function to call is encoded in the mangled function name.
The generated code doesn't depend on the runtime value of &a or &b.
The compiler knows there will be such things at runtime (you told it so), that's all it needs. It'll let the linker fill in the blanks (or yell at you if you failed to deliver on your promise).
For your addition I'm afraid I'm not familiar enough about the constexpr rules, but the two compilers I have tell me that this function will be evaluated at runtime, which, according to them, makes the code non-conforming. (If they're wrong, then the answer above is, at least, incomplete.)
template <int* p, int* pp>
constexpr std::size_t f() {
return (p + 1) == (pp + 7) ? 5 : 10;
}
int main() {
int arr[f<&a, &b>()] = {};
}
clang 3.5 in C++14 standards conforming mode:
$ clang++ -std=c++14 -stdlib=libc++ t.cpp -pedantic
t.cpp:10:10: warning: variable length arrays are a C99 feature [-Wvla-extension]
int arr[f<&a, &b>()];
^
1 warning generated.
GCC g++ 5.1, same mode:
$ g++ -std=c++14 t.cpp -O3 -pedantic
t.cpp: In function 'int main()':
t.cpp:10:22: warning: ISO C++ forbids variable length array 'arr' [-Wvla]
int arr[f<&a, &b>()];
As far as I know, the variables of static storage and functions are stored simply as symbols/place holders in the symbol table while compiling. It is in the linking phase when the place holders are resolved.
The compiler outputs machine code keeping the placeholders intact. Then the linker replaces the placeholders of the variables / functions with their respective memory locations. So in this case too, if you just compile main.cpp without compiling a.cpp and linking with it, you are bound to face linker error, as you can see here http://codepad.org/QTdJCgle (I compiled main.cpp only)
Using gcc (4.7.2 here) I get warnings about unused auto variables, but not about other variables:
// cvars.h
#ifndef CVARS_H_
#define CVARS_H_
const auto const_auto = "const_auto";
const char const_char_array[] = "const_char_array";
const char * const_char_star = "const_char_star";
const char use_me = 'u';
#endif // CVARS_H_
//---
//comp_unit.cpp
#include "cvars.h"
void somef()
{
//const_auto // commented out - unused
use_me; // not using any of the others either
}
// compile with $ g++ -std=c++11 -Wunused-variable -c comp_unit.cpp
// gcc outputs warning: ‘cvars::const_auto’ defined but not used [-Wunused-variable]
// but does not complain about the other variables
Is this an inconsistency in GCC?
1.1 If so, what should happen in all cases, warning or no warning?
1.2 If not, what is the reason for the difference in behavior?
Note: Concerning 1.1, I imagine no warning should be printed in this case (this is what clang does). Otherwise, any compilation unit including a constant-defining header but not using all the constants within would contain lots of warnings.
These warnings are entirely up to the implementation, so there is no "should". But, yes, I agree: constants would ideally not generate these warnings even when declared using auto.
Since I can reproduce your observation in GCC 4.7 and GCC 4.8.0, but not in GCC 4.8.1 or 4.9, I'd say the guys over at GNU would agree too. In fact, I believe you're seeing bug 57183.
I've been reading again Scott Meyers' Effective C++ and more specifically Item 30 about inlining.
So I wrote the following, trying to induce that optimization with gcc 4.6.3
// test.h
class test {
public:
inline int max(int i) { return i > 5 ? 1 : -1; }
int foo(int);
private:
int d;
};
// test.cpp
int test::foo(int i) { return max(i); }
// main.cpp
#include "test.h"
int main(int argc, const char *argv[]) {
test t;
return t.foo(argc);
}
and produced the relevant assembly using alternatively the following:
g++ -S -I. test.cpp main.cpp
g++ -finline-functions -S -I. test.cpp main.cpp
Both commands produced the same assembly as far as the inline method is concerned;
I can see both the max() method body (also having a cmpl statement and the relevant jumps) and its call from foo().
Am I missing something terribly obvious? I can't say that I combed through the gcc man page, but couldn't find anything relevant standing out.
So, I just increased the optimization level to -O3 which has the inline optimizations on by default, according to:
g++ -c -Q -O3 --help=optimizers | grep inline
-finline-functions [enabled]
-finline-functions-called-once [enabled]
-finline-small-functions [enabled]
unfortunately, this optimized (as expected) the above code fragment almost out of existence.
max() is no longer there (at least as an explicitly tagged assembly block) and foo() has been reduced to:
_ZN4test3fooEi:
.LFB7:
.cfi_startproc
rep
ret
.cfi_endproc
which I cannot clearly understand at the moment (and is out of research scope).
Ideally, what I would like to see, would have been the assembly code for max() inside the foo() block.
Is there a way (either through cmd-line options or using a different (non-trivial?) code fragment) to produce such an output?
The compiler is entirely free to inline functiones even if you don't ask it to - both when you use inline keyword or not, or whether you use -finline-functions or not (although probably not if you use -fnoinline-functions - that would be contrary to what you asked for, and although the C++ standard doesn't say so, the flag becomes pretty pointless if it doesn't do something like what it says).
Next, the compiler is also not always certain that your function won't be used "somewhere else", so it will produce an out-of-line copy of most inline functions, unless it's entirely clear that it "can not possibly be called from somewhere else [for example the class is declared such that it can't be reached elsewhere].
And if you don't use the result of a function, and the function doesn't have side-effects (e.g. writing to a global variable, performing I/O or calling a function the compiler "doesn't know what it does"), then the compiler will eliminate that code as "dead" - because you don't really want unnecessary code, do you? Adding a return in front of max(i) in your foo function should help.