STL Static-Const Member Definitions - c++

How does the following work?
#include <limits>
int main()
{
const int* const foo = &std::numeric_limits<int>::digits;
}
I was under the impression that in order to take the address of a static constant member we had to physically define it in some translation unit in order to please the linker. That said, after looking at the preprocessed code for this TU, I couldn't find an external definition for the digits member (or any other relevant members).
I tested this on two compilers (VC++ 10 and g++ 4.2.4) and got identical results from both (i.e., it works). Does the linker auto-magically link against an object file where this stuff is defined, or am I missing something obvious here?

Well, what makes you think that it is not defined? The very fact that your attempt to take the address succeeded indicates that it is defined somewhere. It is not required to reside in your translation unit, of course, so looking through the preprocessor output doesn't tell you much.
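For reference, here is a hedged sketch of the general rule the question is about; the class and member names are made up for illustration. Before C++17, taking the address of a static const data member odr-uses it, so some translation unit must supply an out-of-line definition - the standard library ships that definition for std::numeric_limits, which is why the snippet above links.
// widget.h (hypothetical)
struct Widget {
    static const int digits = 10;   // declaration with an in-class initializer
};
// widget.cpp - needed before C++17 once &Widget::digits is taken
#include "widget.h"
const int Widget::digits;           // the definition the linker looks for
// main.cpp
#include "widget.h"
int main() {
    const int* p = &Widget::digits; // odr-use: requires the definition above
    return *p == 10 ? 0 : 1;
}
Since C++17, declaring the member static constexpr (or adding inline) makes it implicitly inline, and no separate out-of-line definition is required.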

Related

How to throw a GCC error if a global variable with the same name gets declared twice across C/C++ files, but is not static, extern, or volatile?

I ran into a situation where I declared two separate global variables with the same name in two separate files, without using static, volatile, or extern on either of them.
One file was a .c and the other a .cpp file.
The compiler and build environment (GCC) was the ESP-IDF, and the data types of the two declarations were even different.
mqtt_ssl.c:
esp_mqtt_client_handle_t client;
mb_master.cpp:
PL::ModbusClient client(port, PL::ModbusProtocol::rtu, 1);
At runtime I experienced a lot of problems with reboots of the ESP32, until I found out that the shared name of two actually separate variables was causing the issue.
My guess is that the compiler used the same memory region for both of them.
I expected that they would be handled as two separate objects, each with its own region in memory, which is obviously not the case.
After reading some questions here and there, my understanding now is that the behavior is undefined if a variable with the same name gets declared without static, extern, or volatile in two separate C/C++ files.
It took me quite a while to figure that out.
If it is not allowed to declare it like this, why didn't the compiler/linker throw an error?
Is there an option for GCC to treat such a situation as error to prevent that situation in the future?
Edit 1:
This is a reproducible example with xtensa-esp32-elf-gcc.exe (crosstool-NG esp-2021r2-patch3) 8.4.0.
app_main.c
#include <stdio.h>
#include <stdint.h>   /* for uint16_t */
void test_sub(void);
uint16_t test1;
void app_main(void)
{
test1 = 1;
printf("app_main:test1=%d\n", test1);
test_sub();
printf("app_main:test1=%d\n", test1);
}
test_sub.c
#include <stdio.h>
#include <stdint.h>   /* for int16_t */
int16_t test1;
void test_sub(void)
{
test1 = 2;
printf("test_sub:test1=%d\n", test1);
}
Result:
app_main:test1=1
test_sub:test1=2
app_main:test1=2
test1 in app_main() got overwritten by test_sub(), because it had the same name in both files.
esp_mqtt_client_handle_t client; is a tentative definition. In spite of its name, it is not a definition, just a declaration, but it will cause a definition to be created at the end of the translation unit if there is no regular definition in the translation unit.
The C standard allows C implementations to choose how they resolve multiple definitions (because it says it does not define the behavior, so compilers and linkers may define it). Prior to GCC version 10, the default in GCC was to mark definitions from tentative definitions as “common,” meaning they could be coalesced with other definitions. This results in the linker not complaining about such multiple definitions.
Giving the -fno-common switch to GCC instructs it not to do this; definitions will not be marked as “common,” and the linker will complain if it sees multiple definitions.
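A hedged sketch of the conventional fix (the header name is made up): keep exactly one definition and share it through an extern declaration, and build with -fno-common so any remaining accidental duplicate becomes a hard linker error.
/* shared.h - hypothetical header: declares, but does not define, the variable */
#ifndef SHARED_H
#define SHARED_H
#include <stdint.h>
extern uint16_t test1;   /* declaration only: every file including this refers to one object */
#endif
/* app_main.c */
#include "shared.h"
uint16_t test1;          /* the single definition lives here */
/* test_sub.c */
#include "shared.h"
/* no definition here - just use test1 */
Building with gcc -fno-common (the default since GCC 10) then turns any duplicate definition into a multiple-definition error at link time.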
Is your problem really that the variable is shadowed? Often with global variables the problem can be that the variables are not initialized in the right order.

Why does the one definition rule exist in C/C++

In C and C++, you can't have a function with two definitions. For example, say we have the following two files:
1.c:
int main(){ return 0; }
2.c:
int main(){ return 0; }
Issuing the command gcc 1.c 2.c will give you a duplicate symbol linker error.
Why doesn't the same happen with structs and classes? Why are we allowed to have multiple
definitions of the same struct as long as they have the same tokens?
To answer this question, one has to delve into the compilation process and what is needed in each part (the question of why these steps are performed is more historical, going back to the beginnings of C before its standardization).
C and C++ programs are compiled in multiple steps:
Preprocessing
Compilation
Linkage
Preprocessing handles everything that starts with #; it's not really important here.
Compilation is performed on each and every translation unit (typically a single .c or .cpp file plus the headers it includes). The compiler takes one translation unit at a time, reads it, builds an internal list of classes and their members, and then produces assembly code for each function in that unit (based on the structure list). If a function call is not inlined (e.g. it is defined in a different TU), the compiler produces a "link" - "please insert function X here" - for the linker to read.
Then the linker takes all of the compiled translation units and merges them into one binary, resolving all the links the compiler left behind.
Now, what is needed at each phase?
For the compilation phase, you need:
the definition of every class used in this file - the compiler needs to know the size and offset of each class member to produce assembly;
the declaration of every function used in this file - to produce those "links".
Since function definitions are not needed for producing assembly (as long as they are compiled somewhere), they are not needed in the compilation phase, only in the linking phase.
To sum up:
The One Definition Rule is there to protect programmers from themselves. If they accidentally define a function twice, the linker will notice and no executable is produced.
However, class definitions are required in every translation unit, and therefore such a rule cannot be enforced for them. Since the language cannot enforce it, programmers have to be responsible and not define the same class in different ways.
The ODR also has other consequences, e.g. you have to define function templates (and member functions of class templates) in header files. You can also take the responsibility yourself, tell the compiler "every definition of this function will be the same, trust me dude", and make the function inline.
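A minimal sketch of the split described above (file names are made up): the class definition is repeated in every TU via the header, while the function is defined exactly once.
// point.h - included by every translation unit that uses Point
struct Point {
    int x, y;                        // the compiler needs this layout in every TU
};
double length(const Point& p);       // declaration only: enough to emit a call
// point.cpp - the single definition the linker will use
#include "point.h"
#include <cmath>
double length(const Point& p) {
    return std::sqrt(static_cast<double>(p.x) * p.x + static_cast<double>(p.y) * p.y);
}
// main.cpp
#include "point.h"
int main() {
    Point p{3, 4};
    return length(p) == 5.0 ? 0 : 1; // the call compiles against the declaration;
                                     // the linker substitutes the real code
}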
There is no use case for a function with 2 definitions. Either the two definitions would have to be the same, making it useless, or the compiler wouldn't be able to tell which one you meant.
This is not the case with classes or structures. There is also a large advantage to allowing multiple definitions of them, i.e. if we want to use a class or struct in multiple files. (This leads indirectly to multiple definitions because of includes.)
Structures, classes, unions and enumerations define types that can be used in several compilation units to define objects of those types. So each compilation unit needs to know how the types are defined, for example to allocate memory correctly for an object or to be sure that a given member of a class does indeed exist.
For functions (if they are not inline functions), having their declaration without their definition is enough to generate, for example, a function call.
But the function definition shall be single. Otherwise the compiler will not know which function to call, or the object code will be bloated by duplication and error-prone.
It's quite simple: It's a question of scope. Non-static functions are seen (callable) by every compilation unit linked together, while structures are only seen in the compilation unit where they are defined.
For example, it's valid to link the following together because it's clear which definition of struct Foo and which definition of f is being used:
1.c:
struct Foo { int x; };
static void f(void) { struct Foo foo; ... }
2.c:
struct Foo { double d; };
static void f(void) { struct Foo foo; ... }
int main(void) { ... }
But it isn't valid to link the following together because the linker wouldn't know which f to call.
1.c:
void f(void) { ... }
2.c:
void f(void) { ... }
int main(void) { f(); }
Actually, every programming element is associated with a scope of applicability, and within that scope you cannot have the same name associated with multiple definitions of an element. In the compiled world:
You cannot have more than one class definition with the same name within a single file. But you can have it in different compilation units.
You cannot have the same function or global variable name within a single link unit (library or executable), but you can potentially have functions named the same within different libraries.
you cannot have shared libraries with the same name situated in the same directory, but you can have them in different directories.
C/C++ compilation cares a great deal about compilation performance. Checking two objects such as functions or classes for identity is a time-consuming task, so it is not done; only names are compared. It is better to treat two same-named definitions as conflicting and error out than to check them for identity. The only exception to this rule is text macros.
Macros are a preprocessor concept, and historically it has been allowed to have multiple identical macro definitions. If a redefinition differs, a warning gets generated. Comparing macro content is easy - just a simple string comparison - though some macro definitions can be huge.
Types are a compiler concept and are resolved by the compiler. Types do not exist in object files; they are represented only by the sizes of the corresponding variables. So there is no reason to check for type name collisions at that level.
Functions and variables, on the other hand, are named references to executable code or data. They are the building blocks of applications, which in some cases are assembled from code and libraries coming from all around the world. In order to use someone else's function you had better know its name, and you do not want that name to be used by anyone else. Within a shared library, the names of functions and variables are usually stored in a hash table, and there is no place for duplicates there.
And as I already mentioned, checking functions for identical contents is seldom done; there are cases where it happens, but not in C or C++.
The reason for preventing two different definitions of the same thing from being used in a program is to avoid the ambiguity of deciding which definition to use at run time.
If you want two different implementations of the same thing to coexist in a program, you can alias them (each under a different name) behind a common reference and decide at runtime which of the two to use.
Anyway, in order to distinguish the two, you have to be able to tell the compiler which one you want. In C++ you can overload a function, giving it the same name but a different parameter list, so the compiler can tell which one you mean. But in C, the compiler only preserves the name of the function so that the linker can work out which definition matches the name you use in a different compilation unit. If the linker ends up with two different definitions of the same name, it is incapable of deciding for you which one to use, so it emits an error and gives up on the build.
How could such an ambiguity ever be used productively? That is the question you actually have to ask yourself.
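To illustrate the overloading point above, here is a hedged sketch (the function names are made up): in C++ the two same-named functions coexist because their mangled symbol names differ, whereas in C two external functions named print would collide at link time.
#include <iostream>
// Two overloads: same name, different parameter lists. The compiler mangles
// them into distinct linker symbols, so there is no ambiguity.
void print(int v)    { std::cout << "int: " << v << '\n'; }
void print(double v) { std::cout << "double: " << v << '\n'; }
int main() {
    print(42);     // picks print(int)
    print(3.14);   // picks print(double)
}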

constexpr array in header

I have the following code in a header file that is included in 2 different cpp files:
#include <algorithm>   // std::find
#include <iterator>    // std::begin, std::end
constexpr int array[] = { 11, 12, 13, 14, 15 };
inline const int* find(int id)
{
auto it = std::find(std::begin(array), std::end(array), id);
return it != std::end(array) ? &*it : nullptr;
}
I then call find(13) in each of the cpp files. Will both pointers returned by find() point to the same address in memory?
The reason I ask is because I have similar code in my project and sometimes it works and sometimes it doesn't. I assumed both pointers would point to the same location, but I don't really have a basis for that assumption :)
In C++11 and C++14:
In your example array has internal linkage (see [basic.link]/3.2), which means it will have different addresses in different translation units.
And so it's an ODR violation to include and call find in different translation units (since its definition is different).
A simple solution is to declare it extern.
In C++17:
[basic.link]/3.2 has changed: an inline variable is excluded from the rule that const implies internal linkage, so once the variable is declared inline its constexpr-ness no longer affects the linkage.
Which means that if you declare array inline, it will have external linkage and the same address across translation units. Of course, as with anything inline, it must have an identical definition in all translation units.
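A hedged sketch of the two fixes mentioned above (the header name is made up):
// ids.h - C++17 fix: an inline constexpr variable has external linkage and one
// address program-wide, so find() has the same definition in every TU.
#include <algorithm>
#include <iterator>
inline constexpr int array[] = { 11, 12, 13, 14, 15 };
inline const int* find(int id)
{
    auto it = std::find(std::begin(array), std::end(array), id);
    return it != std::end(array) ? &*it : nullptr;
}
// Pre-C++17 alternative: put "extern const int array[5];" in the header and
// define "const int array[5] = { 11, 12, 13, 14, 15 };" in exactly one .cpp
// file; the trade-off is that other TUs can no longer use array in constant
// expressions, because they cannot see the initializer.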
I can't claim to be an expert in this, but according to this blog post, the code you have there should do what you want in C++17 once the variable is declared inline (note that constexpr implies inline only for static data members and functions, not for namespace-scope variables like this array), and that page says (and I believe it):
A variable declared inline has the same semantics as a function declared inline: it can be defined, identically, in multiple translation units, must be defined in every translation unit in which it is used, and the behavior of the program is as if there was exactly one variable.
So, two things to do:
make sure you are compiling as C++17
declare array as constexpr inline to force a compiler error on older compilers (and to ensure that you actually get the semantics you want - see comments below)
I believe that will do it.

Why does defining inline global function in 2 different cpp files cause a magic result?

Suppose I have two .cpp files file1.cpp and file2.cpp:
// file1.cpp
#include <iostream>
inline void foo()
{
std::cout << "f1\n";
}
void f1()
{
foo();
}
and
// file2.cpp
#include <iostream>
inline void foo()
{
std::cout << "f2\n";
}
void f2()
{
foo();
}
And in main.cpp I have forward declared the f1() and f2():
void f1();
void f2();
int main()
{
f1();
f2();
}
Result (doesn't depend on build, same result for debug/release builds):
f1
f1
Whoa: the compiler somehow picks only the definition from file1.cpp and uses it in f2() as well. What is the exact explanation of this behavior?
Note that changing inline to static solves this problem. Putting the definition inside an unnamed namespace also solves it, and the program then prints:
f1
f2
This is undefined behavior, because the two definitions of the same inline function with external linkage break the C++ requirement for entities that can be defined in several places, known as the One Definition Rule:
3.2 One definition rule
...
There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14),[...] in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
6.1 each definition of D shall consist of the same sequence of tokens; [...]
This is not an issue with static functions, because one definition rule does not apply to them: C++ considers static functions defined in different translation units to be independent of each other.
The compiler may assume that all definitions of the same inline function are identical across all translation units because the standard says so. So it can choose any definition it wants. In your case, that happened to be the one with f1.
Note that you cannot rely on the compiler always picking the same definition; violating the aforementioned rule makes the program ill-formed, no diagnostic required. The compiler could also diagnose it and error out.
If the function is static or in an anonymous namespace, you have two distinct functions called foo and the compiler must pick the one from the right file.
Relevant standardese for reference:
An inline function shall be defined in every translation unit in which it is odr-used and shall have exactly
the same definition in every case (3.2). [...]
7.1.2/4 in N4141, emphasis mine.
As others have noted, the compilers are in compliance with the C++ standard, because the One Definition Rule states that you shall have only one definition of a function, except that if the function is inline the definitions must all be the same.
In practice, what happens is that the function is flagged as inline, and at the linking stage, if the linker runs into multiple definitions of an inline-flagged symbol, it silently discards all but one. If it runs into multiple definitions of a symbol not flagged inline, it instead generates an error.
This property is called inline because, prior to LTO (link time optimization), taking the body of a function and "inlining" it at the call site required that the compiler have the body of the function. inline functions could be put in header files, and each cpp file could see the body and "inline" the code into the call site.
It doesn't mean that the code is actually going to be inlined; rather, it makes it easier for compilers to inline it.
However, I am unaware of a compiler that checks that the definitions are identical before discarding the duplicates. This includes compilers that otherwise check function bodies for being identical, such as MSVC's COMDAT folding. This makes me sad, because it is a really subtle source of bugs.
The proper way around your problem is to place the function in an anonymous namespace. In general, you should consider putting everything in a source file in an anonymous namespace.
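A hedged sketch of that fix applied to the question's files:
// file1.cpp - the unnamed namespace gives foo internal linkage, so each
// translation unit gets its own, genuinely private foo.
#include <iostream>
namespace {
    void foo() { std::cout << "f1\n"; }
}
void f1() { foo(); }
// file2.cpp
#include <iostream>
namespace {
    void foo() { std::cout << "f2\n"; }
}
void f2() { foo(); }
With main.cpp unchanged, the program now prints f1 followed by f2.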
Another really nasty example of this:
// A.cpp
#include <vector>
struct Helper {
std::vector<int> foo;
Helper() {
foo.reserve(100);
}
};
// B.cpp
struct Helper {
double x, y;
Helper():x(0),y(0) {}
};
Methods defined in the body of a class are implicitly inline. The ODR applies. Here we have two different Helper::Helper(), both inline, and they differ.
The sizes of the two classes differ. In one case, we initialize two doubles with 0 (floating-point zero is all-zero bytes on most platforms).
In the other, we first zero-initialize three pointer-sized members, then call .reserve(100) on the object, interpreting those bytes as a vector.
At link time, one of these two constructors is discarded and used by both classes. What's more, which one is discarded is likely to be pretty deterministic in a full build; in a partial build, it could change with link order.
So now you have code that might build and work "fine" in a full build, but a partial build causes memory corruption. And changing the order of files in makefiles could cause memory corruption, or even changing the order lib files are linked, or upgrading your compiler, etc.
If both cpp files had a namespace {} block containing everything except the stuff you are exporting (which can use fully qualified namespace names), this could not happen.
I've caught exactly this bug in production multiple times. Given how subtle it is, I do not know how many times it slipped through, waiting for its moment to pounce.
POINT OF CLARIFICATION:
Although the answer rooted in the C++ inline rule is correct, it only applies if both sources are compiled together. If they are compiled separately then, as one commenter noted, each resulting object file would contain its own foo(). HOWEVER: if these two object files are then linked together, then because both foo()s are non-static, the name foo appears in the exported symbol table of both object files; the linker has to coalesce the two table entries, so all internal calls are re-bound to one of the two routines (presumably the one in the first object file processed, since it is already bound; i.e. the linker would treat the second record as extern regardless of binding).

Why do functions need to be declared before they are used?

When reading through some answers to this question, I started wondering why the compiler actually does need to know about a function when it first encounters it. Wouldn't it be simple to just add an extra pass when parsing a compilation unit that collects all symbols declared within, so that the order in which they are declared and used does not matter anymore?
One could argue that declaring functions before they are used certainly is good style, but I am wondering: is there any other reason why this is mandatory in C++?
Edit - An example to illustrate: Suppose you have two functions that are defined inline in a header file. These two functions call each other (maybe a recursive tree traversal, where odd and even layers of the tree are handled differently). The only way to resolve this is to make a forward declaration of one of the functions before the other.
A more common example (though with classes, not functions) is the case of classes with private constructors and factories. The factory needs to know the class in order to create instances of it, and the class needs to know the factory for the friend declaration.
If this requirement is from the olden days, why was it not removed at some point? It would not break existing code, would it?
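A minimal sketch of the mutually recursive case described in the edit (the function names are made up); the answers below explain why the forward declaration is required.
// Forward declaration: without it, handleOdd could not call handleEven,
// because the compiler has not seen that name yet.
inline int handleEven(int depth);
inline int handleOdd(int depth)
{
    return depth <= 0 ? 0 : 1 + handleEven(depth - 1);
}
inline int handleEven(int depth)
{
    return depth <= 0 ? 0 : 1 + handleOdd(depth - 1);
}
int main()
{
    return handleOdd(5) == 5 ? 0 : 1;   // counts five alternating levels
}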
How do you propose to resolve undeclared identifiers that are defined in a different translation unit?
C++ has no module concept, but has separate translation as an inheritance from C. A C++ compiler will compile each translation unit by itself, not knowing anything about other translation units at all. (Except that export broke this, which is probably why it, sadly, never took off.)
Header files, which are where you usually put declarations of identifiers that are defined in other translation units, are really just a very clumsy way of slipping the same declarations into different translation units. They do not make the compiler aware that other translation units exist, let alone which identifiers are defined in them.
Edit re your additional examples:
With all the textual inclusion instead of a proper module concept, compilation already takes agonizingly long for C++, so requiring another compilation pass (where compilation already is split into several passes, not all of which can be optimized and merged, IIRC) would worsen an already bad problem. And changing this would probably alter overload resolution in some scenarios and thus break existing code.
Note that C++ does require an additional pass for parsing class definitions, since member functions defined inline in the class definition are parsed as if they were defined right behind the class definition. However, this was decided when C with Classes was thought up, so there was no existing code base to break.
Historically C89 let you do this. The first time the compiler saw a use of a function and it didn't have a predefined prototype, it "created" a prototype that matched the use of the function.
When C++ decided to add strict type checking to the compiler, it was decided that prototypes were now required. Also, C++ inherited single-pass compilation from C, so it couldn't add a second pass to resolve all symbols.
Because C and C++ are old languages. Early compilers didn't have a lot of memory, so these languages were designed so a compiler can just read the file from top to bottom, without having to consider the file as a whole.
I think of two reasons:
It makes the parsing easy. No extra pass needed.
It also defines scope: symbols/names are available only after their declaration. That means if I declare a global variable int g_count;, the code after that line can use it, but not the code before it. The same argument applies to global functions.
As an example, consider this code:
void g(double)
{
cout << "void g(double)" << endl;
}
void f()
{
g(int());//this calls g(double) - because that is what is visible here
}
void g(int)
{
cout << "void g(int)" << endl;
}
int main()
{
f();
g(int());//calls g(int) - because that is what is the best match!
}
Output:
void g(double)
void g(int)
See the output at ideone : http://www.ideone.com/EsK4A
The main reason is to make the compilation process as efficient as possible. If you add an extra pass you're adding both time and storage. Remember that C++ was developed back before the time of quad-core processors :)
The C programming language was designed so that the compiler could be implemented as a one-pass compiler. In such a compiler, each compilation phase is executed only once, and you cannot refer to an entity that is defined later in the source file.
Moreover, in C, the compiler only interprets a single compilation unit (generally a .c file and all the included .h files) at a time. So you needed a mechanism to refer to a function defined in another compilation unit.
The decision to allow a one-pass compiler and to be able to split a project into small compilation units was made because at the time memory and processing power were really tight, and allowing forward declarations could easily solve the issue with a single feature.
The C++ language was derived from C and inherited the feature from it (as it wanted to be as compatible with C as possible to ease the transition).
I guess because C is quite old and at the time C was designed efficient compilation was a problem because CPUs were much slower.
Since C++ is a statically typed language, the compiler needs to check whether a value's type is compatible with the type expected by the function's parameters. Of course, if you don't know the function signature, you can't do this kind of check, thus defeating the purpose of a static compiler. But, since you have a silver badge in C++, I think you already know this.
The C++ language spec was written this way because the designers didn't want to force a multi-pass compiler when hardware was not as fast as what is available today. In the end, I think that if C++ were designed today this requirement would go away, but then we would have another language :-).
One of the biggest reasons why this was made mandatory even in C99 (compared to C89, where you could have implicitly-declared functions) is that implicit declarations are very error-prone. Consider the following code:
First file:
#include <stdio.h>
void doSomething(double x, double y)
{
printf("%g %g\n",x,y);
}
Second file:
int main()
{
doSomething(12345,67890);
return 0;
}
This program is a syntactically valid* C89 program. You can compile it with GCC using this command (assuming the source files are named test.c and test0.c):
gcc -std=c89 -pedantic-errors test.c test0.c -o test
Why does it print something strange (at least on linux-x86 and linux-amd64)? Can you spot the problem in the code at a glance? Now try replacing c89 with c99 in the command line — and you'll be immediately notified about your mistake by the compiler.
Same with C++. But in C++ there are actually other important reasons why function declarations are needed; they are discussed in other answers.
* But has undefined behavior
Still, you can have a use of a function before it is declared sometimes (to be strict in the wording: "before" is about the order in which the program source is read) -- inside a class!:
class A {
public:
static void foo(void) {
bar();
}
private:
static void bar(void) {
return;
}
};
int main() {
A::foo();
return 0;
}
(Changing the class to a namespace doesn't work, per my tests.)
That's probably because the compiler actually treats the member-function definitions from inside the class as if they appeared right after the class definition, as someone has pointed out here in the answers.
The same approach could be applied to the whole source file: first collect the declarations, then handle the postponed definitions. (That requires either a two-pass compiler or enough memory to hold the postponed source code.)
Haha! So they thought a whole source file would be too large to hold in memory, but a single class with function definitions wouldn't: a whole class is allowed to sit in memory and wait until its declarations have been collected (or to get a second pass over the class's source code)!
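For contrast, a hedged sketch of the namespace version mentioned above, which does not compile (the error wording is paraphrased):
namespace A {
    void foo() {
        bar();   // error: 'bar' was not declared in this scope;
                 // unlike a class body, a namespace gets no deferred second look
    }
    void bar() {}
}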
I remember that with Unix and Linux you have global and local symbols. Within your own environment a local declaration works for functions, but it does not work globally (system-wide); you must declare the function as global.