Why isn't the C++ compiler optimizing unused reference variables?

Consider the following program:
#include <iostream>

struct Test
{
    int& ref1;
    int& ref2;
    int& ref3;
};

int main()
{
    std::cout << sizeof(Test) << '\n';
}
I know that a C++ compiler can optimize reference variables away entirely so that they take no space in memory at all. I wrote the demo program above to test this.
But when I compile and run it with g++ 4.8.1, the output is 12.
It looks like the compiler isn't optimizing the references away; I was expecting sizeof(Test) to be 1.
I've also used the -Os command line option, but the output is still 12. I have also tried this program on MSVS 2010, compiled with the /Ox command line option, but it looks like the Microsoft compiler isn't performing this optimization either.
The three reference members are unused and aren't bound to any other variable. So why aren't the compilers optimizing them away?

The size of the struct has to stay the same; there is nothing to optimize here. If you were to create an array of Test, the right amount of space would have to be allocated for each element, and the compiler cannot know in advance which members will be used and which won't. That's why there is no such optimization.
An unused variable would be, for example, an int (or an int&) inside your main function. If it is unused, the optimizer will optimize it away.
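For example, locals like these are removed entirely once optimizations are enabled (easily verified by inspecting the output of g++ -O1 -S):
int main()
{
    int unused = 42;     // never read: the optimizer emits no store for this
    int& alias = unused; // the reference itself requires no storage
    (void)alias;         // silences -Wunused-variable; still optimized away
    return 0;
}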

Theoretically, if the world consisted only of simple programs, the compiler could optimize the sizeof of this struct down to 1, because whether a reference requires storage is unspecified.
But in the real world we have separately compiled shared libraries, which the compiler knows nothing about while compiling your code (for example, you could LoadLibrary or dlopen one), and which may also happen to define your struct; their idea of the sizeof had better agree with the one in your program.
So in practice a compiler had better not optimize the sizeof down to 1 :)

In 8.3.2/4 of the C++ standard, it is said:
It is unspecified whether or not a reference requires storage
So, the standard leaves it open to the implementation how references should be implemented. This implies that the size of your struct can be non-zero.
If the compiler removed the references from the struct, you would not be able to link code compiled with different compiler settings. Imagine you compile one translation unit with optimizations and the other one without, link them together, and pass an object from one TU to the other. Which size should the code assume? A function in TU 1 allocates 12 bytes on the stack, while a function in TU 2 allocates some other amount.
The compiler can still optimize your program, e.g. remove temporary objects, assignments, and so on. It may be that you create an object of your struct somewhere in your source code and use it, yet it never shows up in the assembler code because it is not needed. What compilers also frequently do is remove indirection, e.g. by replacing references with direct access to the referenced object.
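A minimal sketch of that last point, assuming optimizations are enabled:
int compute()
{
    int x = 42;
    int& r = x;   // with optimizations, r occupies no storage:
    return r + 1; // the compiler reads x directly and can fold this to 43
}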

Related

Is it possible to calculate function length at compile time in C++?

I have this piece of code:
constexpr static VOID fStart()
{
    auto a = 3;
    a++;
}

__declspec(naked)
constexpr static VOID fEnd() {};

static constexpr auto getFSize()
{
    return (SIZE_T)((PBYTE)fEnd - (PBYTE)fStart);
}

static constexpr auto fSize = getFSize();
static BYTE func[fSize];
Is it possible to declare the size of the func array as the size of fStart() at compile time, without using any standard library facilities? This is needed in order to copy the full code of fStart() into that array later.
There is no method in standard C++ to get the length of a function.
You'll need to use a compiler-specific method.
One method is to have the linker create a segment, place your function in that segment, and then use the length of the segment.
You may be able to use some assembly language constructs to do this; depends on the assembler and the assembly code.
Note: in embedded systems, there are reasons to move function code, such as to On-Chip memory or swap to external memory, or to perform a checksum on the code.
The following calculates the "byte size" of the fStart function. However, the size cannot be obtained as a constexpr this way, because the casts lose compile-time const'ness (see, for example, Why is reinterpret_cast not constexpr?), and the difference of two unrelated function pointers cannot be evaluated without some kind of cast.
#pragma runtime_checks("", off)
__declspec(code_seg("myFunc$a")) static void fStart()
{ auto a = 3; a++; }
__declspec(code_seg("myFunc$z")) static void fEnd(void)
{ }
#pragma runtime_checks("", restore)
constexpr auto pfnStart = fStart; // ok
constexpr auto pfnEnd = fEnd; // ok
// constexpr auto nStart = (INT_PTR)pfnStart; // error C2131
const auto fnSize = (INT_PTR)pfnEnd - (INT_PTR)pfnStart; // ok
// constexpr auto fnSize = (INT_PTR)pfnEnd - (INT_PTR)pfnStart; // error C2131
On some processors, and with some known compilers and ABI conventions, you could do the opposite: generate machine code at runtime.
For x86/64 on Linux, I know of GNU lightning, asmjit, and libgccjit doing so.
The elf(5) format knows the size of functions.
On Linux, you can generate shared libraries (perhaps by generating C or C++ code at runtime, like RefPerSys does and GCC MELT did, then compiling it with gcc -fPIC -shared -O) and later dlopen(3) / dlsym(3) them. dladdr(3) is also very useful. You'll use function pointers.
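A minimal sketch of that workflow; the library path and symbol name here are hypothetical examples:
#include <dlfcn.h>
#include <cstdio>

int main()
{
    // Assume /tmp/generated.so was built beforehand, e.g. with:
    //   gcc -fPIC -shared -O generated.c -o /tmp/generated.so
    void* handle = dlopen("/tmp/generated.so", RTLD_NOW);
    if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    // dlsym returns void*; cast it to the function's actual type.
    using fn_t = int (*)(int);
    fn_t fn = reinterpret_cast<fn_t>(dlsym(handle, "generated_fn"));
    if (fn) std::printf("generated_fn(42) = %d\n", fn(42));

    dlclose(handle);
}
Link with -ldl on older glibc (it is part of libc proper since glibc 2.34).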
Read also a book on linkers and loaders.
But you usually cannot move machine code without doing some relocation, unless that machine code is position-independent (and quite often PIC is slower to run than ordinary code).
A related topic is garbage collection of code (or even of agents). You need to read the garbage collection handbook and take inspiration from implementations like SBCL.
Remember also that a good optimizing C++ compiler is allowed to unroll loops, inline expand function calls, remove dead code, do function cloning, etc... So it may happen that machine code functions are not even contiguous: two C functions foo() and bar() could share dozens of common machine instructions.
Read the Dragon book, and study the source code of GCC (and consider extending it with your GCC plugin). Look also into the assembler code produced by gcc -O2 -Wall -fverbose-asm -S. Some experimental variants of GCC might be able to generate OpenCL code running on your GPGPU (and then the very notion of a function's end does not make sense).
With plugins generated through C and C++, you could carefully remove them using dlclose(3), if you use Ian Taylor's libbacktrace and dladdr to explore your call stack. In 99% of cases it is not worth the trouble, since in practice a Linux process (on a current x86-64 laptop in 2021) can do perhaps a million dlopen(3) calls, as my manydl.c program demonstrates (it generates "random" C code, compiles it into a unique /tmp/generated123.so, dlopens that, and repeats many times).
The only reason (on desktop and server computers) to overwrite machine code is for long-lasting server processes that generate machine code every second. If that were your scenario, you should have mentioned it (and generating JVM bytecode through Java class loaders could make more sense).
Of course, on 16-bit microcontrollers, things are very different.
Is it possible to calculate function length at compile time in C++?
No, because at run time some functions do not exist anymore.
The compiler may have removed them. Or cloned them. Or inlined them.
For C++ this matters in practice with the standard containers: a lot of template expansion occurs, including useless code which has to be removed by your optimizing compiler at some point.
(Think, in 2021, of compiling everything with a recent GCC 10.2 or 11 and linking with gcc -O3 -flto -fwhole-program: a function foo23 might be defined but never called, and then it does not appear inside the ELF executable.)

Why aren't gcc-compiled programs segfaulting on dereferencing an uninitialized pointer in a Debug build? [duplicate]

I know that Visual Studio under debugging options will fill memory with a known value.
Does g++ (any version, but gcc 4.1.2 is most interesting) have any options that would
fill an uninitialized local POD structure with recognizable values?
struct something { int a; int b; };

void bar(int); // assumed declared elsewhere

void foo() {
    something uninitialized;
    bar(uninitialized.b); // reads an uninitialized member
}
I expect uninitialized.b to be unpredictable garbage; clearly a bug, and one that is easily found when optimization and warnings are turned on. But compiled with -g only, there is no warning. A colleague had a case where code similar to this worked because it coincidentally held a valid value; when the compiler was upgraded, it started failing. He thought the new compiler was inserting known values into the structure (much the way VS fills memory with 0xCC). In my own experience, it was just different random values that didn't happen to be valid.
But now I'm curious: is there any setting of g++ that would make it fill memory that the standard would otherwise say is uninitialized?
Any C++ compiler can initialize any POD type to its "zero" value using value-initialization syntax:
int i = int();
float f = float();
MyStruct mys = MyStruct();
// and generally:
T t = T();
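Applied to the struct from the question, for example:
void foo() {
    something s = something(); // value-initialization zeroes both members
    bar(s.b);                  // now well-defined: passes 0
}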
If you want to talk about debugging, that's something else...
(By the way, I believe VS initializes all uninitialized memory to 0xCC in "debug mode", so that no matter where you jump (e.g. through a bad function pointer), if the target doesn't happen to be actual program code/data, an int3 breakpoint is raised.)
I don't think such an option/feature exists in gcc/g++.
For instance, zero-initialized global (and static) variables reside in the .bss section, which is always zeroed when the program loads. Uninitialised ones, however, are put in a special "common" block associated with the .bss, for the sake of compatibility.
If you want them to be zeroed too, you can pass the -fno-common argument to the compiler. Or, if you need it on a per-variable basis, use __attribute__ ((nocommon)).
For the heap, it's possible to write your own allocator to accomplish what you described. But for the stack, I don't think there's an easy solution.
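A minimal sketch of such an allocator, overriding the global operator new (the 0xCD fill byte is an arbitrary choice, echoing MSVC's debug-heap pattern):
#include <cstdlib>
#include <cstring>
#include <new>

// Fill every heap allocation with a recognizable pattern so that
// reads of uninitialized heap memory are easy to spot in a debugger.
void* operator new(std::size_t size)
{
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    std::memset(p, 0xCD, size);
    return p;
}

void operator delete(void* p) noexcept
{
    std::free(p);
}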
I don't believe g++ will detect all cases like this, but Valgrind certainly will.

Is an unused const static variable in a class optimized out?

Can a reasonably decent compiler discard this const static variable
class A {
    const static int a = 3;
};
if it is used nowhere, or does it show up in the compiled binary anyway?
Short answer: maybe. The standard does not say the compiler HAS to keep a constant (or a string, or a function, or anything else) if it is never used.
Long answer: it very much depends on the circumstances. If the compiler can clearly determine that the constant is not used, it will remove it. If it can't make that determination, it cannot remove the constant, since the constant COULD be used by something the compiler doesn't currently know about (e.g. another source file).
For example, if class A is inside a function, the compiler can know this class is not used elsewhere, and if the constant isn't used in the function, then it's not used anywhere. If the class is in a "global" space, such that it could be used somewhere else, then it will need to keep the constant.
This gets even more interesting with "whole program optimization" or "link time optimization" (both from now on called LTO), where all the code is actually optimized as one large lump, and of course, "used" or "not used" can be determined for all possible uses.
As you can imagine, the result will also depend on how clever the compiler (and, for LTO, the linker) is. All compilers should follow the principle of "if in doubt, keep it".
You can of course experiment, and write some code where you use the variable, and then remove the use, and see what difference it makes to the assembly code (e.g. g++ -S x.cpp or clang++ -S x.cpp, and look at the resulting x.s file).
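A minimal sketch of such an experiment (the file name and commands are just examples):
// x.cpp -- compile with: g++ -O2 -S x.cpp, then inspect x.s
class A {
public:
    const static int a = 3;
};

int main()
{
    return A::a; // delete this use, recompile, and diff the two x.s files
}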
When optimizations are disabled, the answer is compiler-dependent. But when optimizations are enabled, the end result is the same irrespective of the compiler. Let's assume that optimizations are enabled.
The compiler will not emit a definition for a const static field in the generated object file when both of the following conditions hold:
It can resolve all uses of the field with the constant value it was initialized to.
At most one source code file uses the field (there is an exception, which I'll discuss at the end).
I'll discuss the second condition later. But now let's focus on the first. Let's see an example. Suppose that the target platform is 32-bit and that we have defined the following type:
// In MyClassA.h
class MyClassA {
public:
    const static int MyClassAField;
};

// In MyClassA.cpp (or in MyClassA.h if it was included in at most one cpp file)
const int MyClassA::MyClassAField = 2;
Most compilers consider int to be a 32-bit signed integer. Therefore, on a 32-bit processor, most instructions can handle a 32-bit constant. In this case, the compiler will be able to replace any uses of MyClassAField with the constant 2 and that field will not exist in the object file.
On the other hand, if the field were of type double, then on a 32-bit platform the instructions cannot handle 64-bit immediate values. In this case, most compilers emit the field in the object file and use SSE instructions and registers to load the 64-bit value from memory and process it.
Now I'll explain the second condition. If there is more than one source code file that is using the field, it cannot be eliminated (irrespective of whether Whole Program Optimization (WPO) is enabled or not) because some object file has to include the definition of the field so that the linker can use it for other object files. However, the linker, if you specified the right switch, can eliminate the field from the generated binary.
This is a linker optimization, enabled with /OPT:REF for VC++ and --gc-sections for gcc. For icc, the names of the switches are the same (/OPT:REF on Windows and --gc-sections on Linux and OS X). However, the compiler has to emit every function and every static or global field in a separate section of the object file so that the linker can eliminate it.
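For gcc, a sketch of the flags involved (the file names are just examples):
g++ -O2 -ffunction-sections -fdata-sections -c myclassa.cpp main.cpp
g++ -Wl,--gc-sections myclassa.o main.o -o app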
There is a catch, however. If the field has been defined inline as follows:
class MyClassA {
public:
    const static int MyClassAField = 2;
};
Then the compiler itself will eliminate the definition of this field from every object file that uses it. Since every source file that uses it includes its own definition, and each is compiled separately, the compiler can optimize the field away using an optimization called constant propagation. In fact, the VC++ compiler performs this optimization even when optimizations are disabled.
When optimizations are disabled, whether a const static field is eliminated or not depends on the compiler, but it probably will not be.

parameter dropping when using C linkage in a C++ program

I have the following code in main.cpp
extern "C"
{
void bar(int x, char* s);
}
int main()
{
bar(5, "hello");
}
Notice that the function bar is declared as taking two arguments. This is then compiled and linked against a static library built from bar.cpp, which contains this code:
#include <iostream>

extern "C"
{
    void bar(int x)
    {
        std::cout << x;
    }
}
Notice that this bar takes only one argument.
The executable compiles successfully and prints 5.
I have three questions:
Shouldn't there have been a compiler error indicating the mismatch in the number of parameters?
In the scenario above, since the string hello is not received by bar, when and how is it destroyed?
Is it completely valid to write and use code as above (knowing parameters will be dropped)? What exactly are the semantics behind parameter dropping?
Although you specify VS, the question is answered here in terms of C++, compilers, and execution platforms in general.
1.) You instructed the compiler to follow the C-style calling convention for your platform to reference a symbol not defined in that compilation unit. The compiler then generates an object file which tells the linker "call _bar here" (it may be bar, depending on your platform), and this happily resolves to the compilation output of bar.cpp. C++ linkage would instead have produced mangled names like barZ8intZP8char or worse, depending on your compiler, to ensure overloading works properly; newer compilers may also be smarter (magic!) and understand additional metadata stored in the object file.
A bigger issue for multi-platform code is stack ordering. On a few platforms, parameters are pushed on the stack in the reverse of their declaration order, so your code would be handed the address of the string instead of the integer value. That is fine for printing an integer, but it would result in (hopefully) a segfault if the declaration had the string as the first parameter and the integer as the second.
2.) This depends on the platform you are using. On most platforms related to the IA-32 family (x86, AMD64, IA-64, etc.), the caller is responsible for managing the parameters on the stack, so the stack frame containing the extra parameter is discarded in full when the call completes. There are optimization cases where this may trigger a subtle bug, where a frame is reused because the compiler was misinformed about the call stack.
3.) For application programming I would consider this bad practice, as it may introduce bugs that are very hard to diagnose. I'm sure someone has found an edge case to that statement regarding binary compatibility; however, I would prefer the compiler to be aware of the real parameter list, to avoid the optimization bugs referred to in #2.
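A simple way to prevent such a mismatch (a sketch; the header name is just an example) is to share one extern "C" declaration between the caller and the implementation:
// bar.h -- included by both main.cpp and bar.cpp, so the
// declaration and the definition can never drift apart
#ifdef __cplusplus
extern "C" {
#endif
void bar(int x);
#ifdef __cplusplus
}
#endif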
