I have the following code in a header file that is included in two different .cpp files:
#include <algorithm>
#include <iterator>
constexpr int array[] = { 11, 12, 13, 14, 15 };
inline const int* find(int id)
{
    auto it = std::find(std::begin(array), std::end(array), id);
    return it != std::end(array) ? &*it : nullptr;
}
I then call find(13) in each of the cpp files. Will both pointers returned by find() point to the same address in memory?
The reason I ask is because I have similar code in my project and sometimes it works and sometimes it doesn't. I assumed both pointers would point to the same location, but I don't really have a basis for that assumption :)
In C++11 and C++14:
In your example, array has internal linkage (see [basic.link]/3.2), which means each translation unit gets its own copy, with its own address.
And so it's an ODR violation to include and call find in different translation units: find consists of the same tokens everywhere, but the name array inside it refers to a different entity in each translation unit.
A simple solution is to give it external linkage: declare it extern in the header and define it in exactly one .cpp file.
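A minimal sketch of that C++11/14 fix, assuming the array keeps its five elements. The header and the single defining .cpp file are collapsed into one listing here, so the file split is only indicated by comments. The trade-off is that the values now live in one .cpp file, so other translation units no longer see them at compile time.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// ---- header (C++11/14) ----
// Declaration only: 'extern' gives the array external linkage, so every
// translation unit that includes the header names the same object.
extern const int array[5];

// Identical tokens in every TU, and now they also refer to one entity.
inline const int* find(int id)
{
    auto it = std::find(std::begin(array), std::end(array), id);
    return it != std::end(array) ? &*it : nullptr;
}

// ---- exactly one .cpp file (which also includes the header) ----
// The earlier extern declaration keeps this definition's linkage external.
const int array[5] = { 11, 12, 13, 14, 15 };
```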
In C++17:
[basic.link]/3.2 has changed so that the implicit internal linkage of const (and therefore constexpr) variables applies only to non-inline variables; declaring a constexpr variable inline leaves its linkage unaffected.
Which means that if you declare array inline, it'll have external linkage and the same address in every translation unit. Of course, as with anything inline, it must have an identical definition in all translation units.
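In C++17 the fix is a single keyword. A sketch of the header under that assumption (it needs -std=c++17 or later):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// C++17 header: 'inline' gives the constexpr array external linkage, so all
// translation units that include this header share a single object.
inline constexpr int array[] = { 11, 12, 13, 14, 15 };

inline const int* find(int id)
{
    auto it = std::find(std::begin(array), std::end(array), id);
    return it != std::end(array) ? &*it : nullptr;
}
```

With this header, find(13) called from either .cpp file returns the address of the same element.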
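I can't claim to be an expert in this, but according to this blog post, the code you have there can do what you want in C++17, provided array is inline. Note that constexpr implies inline only for static data members, not for a namespace-scope variable like array, so the inline has to be written explicitly. That page says (and I believe it):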
A variable declared inline has the same semantics as a function declared inline: it can be defined, identically, in multiple translation units, must be defined in every translation unit in which it is used, and the behavior of the program is as if there was exactly one variable.
So, two things to do:
make sure you are compiling as C++17
declare array as constexpr inline to force a compiler error on older compilers (and to ensure that you actually get the semantics you want)
I believe that will do it.
Let's say I have something like this in a library header:
// ExampleLib.h
#ifndef DEFAULT_OPTION_VALUE
# define DEFAULT_OPTION_VALUE 1
#endif
class Example {
public:
void doSomething ();
private:
static bool getDefaultOptionValue () {
return (0 != DEFAULT_OPTION_VALUE);
}
};
And something like this in a library source file:
#include <ExampleLib.h>
void Example::doSomething () {
...
if (getDefaultOptionValue()) {
...
} else {
...
}
...
}
Now, let's say when an application that links to this library is built, it #defines its own value for DEFAULT_OPTION_VALUE via the compiler command line or whatever (which may differ from the value set when compiling the library), with the intended effect being to allow the application to determine the behavior of doSomething() at compile time.
My main question is: Is this even valid (and would it behave as intended), or is it undefined behavior?
Then, if it is well-defined, is there any risk that a compiler, when compiling the library, would optimize away the effect of changing the #define, such as:
... seeing that Example::getDefaultOptionValue() returns a constant, and therefore optimizing away the call to getDefaultOptionValue() from the implementation of doSomething(), and/or
... inlining Example::getDefaultOptionValue() when compiling doSomething()
... etc.
And, as a consequence, the behavior wouldn't reflect the value of DEFAULT_OPTION_VALUE set when the application itself is built?
Not sure if it matters, but assume C++14 or later.
[basic.def.odr]/6 There can be more than one definition of a class type ... in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
(6.1) — each definition of D shall consist of the same sequence of tokens...
...
If the definitions of D do not satisfy these requirements, then the behavior is undefined.
If DEFAULT_OPTION_VALUE expands to a different sequence of tokens in two translation units that include the definition of Example, then the program that combines these translation units would exhibit undefined behavior.
Being a textual substitution, there is nothing maintaining the link to the name DEFAULT_OPTION_VALUE. The preprocessor changes the source file to read:
return (0 != 1);
and when the library is compiled, it sees that definition.
When the same header file is used in the main project with a different definition for DEFAULT_OPTION_VALUE, it compiles to a different function body in that translation unit.
If this is linked using plain libraries (a collection of object files), it is clearly undefined behavior. In practice I would expect compilers to inline the function at every call site, so there is no single out-of-line copy for the linker to pick for the whole program, and each translation unit silently uses its own value. I would also expect the statements to be optimized out, since the constant values are known at compile time.
Is this undefined behavior
As per the quoted rule in Igor's answer, this is an ODR violation. That violation makes the program ill-formed (no diagnostic required). This is effectively the same as having UB.
and, if so, can I fix it?
You could use a non-member function which allows you to declare the function with internal linkage. If you do that, then each translation unit will have an identical class declaration and each will declare a separate function, and thus the declarations being different would not be an issue. Confusingly, the keyword to declare a function with internal linkage is the same as the one used for static member functions. You just declare it outside the class:
namespace detail {
static bool getDefaultOptionValue () {
return (0 != DEFAULT_OPTION_VALUE);
}
}
class Example;
Since your member function was private, I've put the function in a namespace detail, which is a conventional way to express "This is private, here be dragons".
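A sketch of the whole arrangement, with the header and the library source collapsed into one listing. doSomething() returning int and its branch bodies are hypothetical stand-ins for the elided "..." parts.

```cpp
#include <cassert>

// ---- ExampleLib.h ----
#ifndef DEFAULT_OPTION_VALUE
# define DEFAULT_OPTION_VALUE 1
#endif

namespace detail {
// 'static' = internal linkage: every TU that includes the header gets its
// own private copy, so differing DEFAULT_OPTION_VALUE values no longer
// violate the ODR.
static bool getDefaultOptionValue () {
    return (0 != DEFAULT_OPTION_VALUE);
}
}

class Example {
public:
    int doSomething ();   // returns int only to keep the sketch checkable
};

// ---- library source file ----
// This body is compiled once, with the library's DEFAULT_OPTION_VALUE.
int Example::doSomething () {
    if (detail::getDefaultOptionValue()) {
        return 1;   // hypothetical "option enabled" branch
    } else {
        return 0;   // hypothetical "option disabled" branch
    }
}
```

Note that this removes the undefined behavior, but doSomething() still sees only the value that was in effect when the library was built, because its body is compiled once, in the library's translation unit.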
I would like to make an array of integers using malloc. I want this array to be global and usable anywhere in my program. I put code in a header file that looked like this:
static int *pieces;
Then I have a function that fills it with the numbers I want in there. The function is in a namespace, and the namespace is implemented in its own .cpp file. I include the header file in main.c and call the function from the namespace that creates the array like this:
pieces = malloc(sizeof(int) * 128);
But when I try to access numbers in the array in main (after calling the function that creates my array), it crashes and says that pieces wasn't initialized. Yet inside that function I can create the array and manipulate the numbers in it just fine. I was under the impression that making pieces static meant that whenever any function anywhere sets or changes it, the change is visible everywhere the variable is used. Basically, why does pieces appear unset in main, even though I set it in a function that I called?
static is a keyword with many meanings, and in this particular case (a variable at namespace scope) it means, roughly, not global: the variable has internal linkage.
It means that each .cpp file that includes the header has its own copy of the variable. Thus, when you initialize it in the namespace's .cpp file, only that file's copy is set; the copy that main sees is still a null pointer.
The first step to fix this is to remove the keyword static. On its own, that would cause a "multiple definition" linker error, because every .cpp file that includes the header would then define the same variable. To fix that, define the variable in exactly one .cpp file and only declare it extern in the header file.
Edit: You are only allocating memory for it, which doesn't count as initialization: malloc leaves the elements indeterminate. If you need them to be 0, initialize the memory after allocation.
You can use new int[128]() instead of the more verbose malloc syntax, and the () value-initializes (zeroes) the elements as well. Or you could take the easy road (that's what it's there for) and use std::vector.
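A minimal sketch of the extern-declaration approach combined with std::vector. The file names pieces.h and pieces.cpp and the function init_pieces() are hypothetical, and both files are shown in one listing.

```cpp
#include <cassert>
#include <vector>

// ---- pieces.h (hypothetical) ----
// Declaration only: every includer refers to the one object defined elsewhere.
extern std::vector<int> pieces;
void init_pieces();   // hypothetical initializer

// ---- pieces.cpp (hypothetical): the single definition in the program ----
std::vector<int> pieces;

void init_pieces()
{
    pieces.assign(128, 0);   // 128 elements, all set to zero
}
```

The vector owns its memory, so no separate release function is needed.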
The key is this:
static int *pieces;
You said you put that in your header. This is not the way to export a symbol. Any file that includes the header gets its own private (internal-linkage) pointer called pieces, which starts out null and is never touched by code in other files.
Instead, you put this in your header:
extern int *pieces;
extern int init_pieces();
And in the source file, you do this:
#include <stdlib.h>
static const size_t num_pieces = 128;
int *pieces = 0;
int init_pieces()
{
    pieces = malloc( num_pieces * sizeof(int) );
    return pieces != NULL;
}
Now when you include your header, your source file will know to get pieces from somewhere else, and will wait for the linker to work out where. I also suggested an 'init' function for the array. I did not put a 'release' function in, however.
Note this is all C, not C++. If you're using C++ you should really use new or better still, use a vector.
Also, when using statics in C++, be mindful of the static initialization order problem (the "static initialization order fiasco").
In C++17 you can use the inline specifier instead of static. For variables, this means every object file may contain a copy of the variable, but the linker keeps only one of them, so the whole program shares a single object.
Or, as stated on cppreference:
An inline function or inline variable (since C++17) has the following properties:
1) There may be more than one definition of an inline function or variable (since C++17) in the program as long as each definition appears in a different translation unit and (for non-static inline functions and variables (since C++17)) all definitions are identical. For example, an inline function or an inline variable (since C++17) may be defined in a header file that is #include'd in multiple source files.
2) The definition of an inline function or variable (since C++17) must be present in the translation unit where it is accessed (not necessarily before the point of access).
3) An inline function or variable (since C++17) with external linkage (e.g. not declared static) has the following additional properties:
   1) It must be declared inline in every translation unit.
   2) It has the same address in every translation unit.
Supported in:
MSVC since version 19.12 (VS 2017 15.5)
GCC 7
Clang 3.9
ICC 18.0
In this case, it means you can replace
static int *pieces;
with
inline int *pieces;
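Applied to the question, a sketch of such a header could look like this. num_pieces, init_pieces() and release_pieces() are hypothetical helpers, and the whole thing needs C++17.

```cpp
#include <cassert>
#include <cstddef>

// ---- header (C++17) ----
// One pointer shared by every translation unit that includes the header.
inline int* pieces = nullptr;
inline constexpr std::size_t num_pieces = 128;

// Hypothetical helpers; defining them in the header is fine because they are inline.
inline void init_pieces()
{
    if (pieces == nullptr)
        pieces = new int[num_pieces]();   // '()' value-initializes: all zeros
}

inline void release_pieces()
{
    delete[] pieces;
    pieces = nullptr;
}
```

Because there is exactly one pieces object, a call to init_pieces() from the namespace's .cpp file is visible in main as well.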
For high-performance code on various architectures, you may want a malloc-style allocation rather than generic new. That is because you would wrap it with something like mymalloc() and then use architecture-dependent functions, such as ones that implement the proper alignment to avoid cache misses, and do other nifty things provided by the hardware manufacturer, such as IBM (Blue Gene) or Intel (MIC). All of these optimized allocation routines follow the malloc-style interface.
I see a common pattern in many C++ codebases:
Header.h:
static const int myConstant = 1;
Source1.cpp:
#include "Header.h"
Source2.cpp:
#include "Header.h"
Based on:
3.5 Program and linkage
...
(2.1) — When a name has external linkage , the entity it denotes can be referred to by names from scopes of
other translation units or from other scopes of the same translation unit.
(2.2) — When a name has internal linkage , the entity it denotes can be referred to by names from other scopes
in the same translation unit.
...
3 A name having namespace scope (3.3.6) has internal linkage if it is the name of
(3.1) — a variable, function or function template that is explicitly declared static; or,
myConstant is accessible only from the same translation unit and the compiler will generate multiple instances of it, one for each translation unit that included Header.h.
Is my understanding correct, i.e. are multiple instances of myConstant created? If so, can you please point me to better alternatives for using constants in C++?
EDIT:
Some suggested to make myConstant extern in the header and define it in one cpp file. Is this a good practice? I guess this will make the value invisible to the compiler and prevent many optimizations, for example when the value appears in arithmetic operations.
What you're doing should be fine. The optimizer will probably avoid creating any storage for the constants, and will instead replace any uses of it with the value, as long as you never take the address of the variable (e.g. &myConstant).
A pattern like static const int myConstant = 1 appearing in header files is a little bit strange, because the keyword static gives the variable internal linkage, restricting it to the specific translation unit. Hence, this variable cannot be accessed from other translation units. So I don't see why someone would expose a variable in a header file even though this variable can never be addressed from "outside".
Note that if different translation units include the header, then each translation unit will define its own, somewhat "private" instance of this variable.
I think that the common pattern should be:
In the header file:
extern const int myConstant;
In exactly one implementation file of the whole program:
const int myConstant = 1;
The comments say, however, that this will prevent the compiler from optimising, as the value of the constant is not known at the time a translation unit is compiled (and this sounds reasonable).
So it seems that "global/shared" constants are not possible and that one might have to live with the - somewhat contradicting - keyword static in a header file.
Additionally, I'd use constexpr to indicate a compile-time constant (though the compiler might derive this anyway):
static constexpr int x = 1;
Because the static keyword still disturbs me somehow, I did some research and experiments on constexpr without a static keyword but with an extern keyword. Unfortunately, an extern constexpr still requires an initialisation (which then makes it a definition and leads to duplicate symbol errors). Interestingly, at least with my compiler, I can actually define constexpr int x = 1 in different translation units without introducing a compiler/linker error. The standard does support this: constexpr implies const, and a non-volatile const variable at namespace scope that is neither declared extern nor inline has internal linkage ([basic.link]/3.2), exactly as if it were declared static. Still, defining constexpr int x = 1 in a header file looks even more curious than static constexpr int x = 1.
So - many words, few findings. I think static constexpr int x = 1 is the best choice.
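To make the options concrete, a sketch of the candidate header declarations (the names are hypothetical; the inline constexpr form needs C++17):

```cpp
#include <cassert>

// ---- Header.h alternatives ----
// Namespace-scope constexpr variables are const and therefore already have
// internal linkage, so these two declarations mean the same thing.
static constexpr int kStaticConstant = 1;
constexpr int kPlainConstant = 1;

// C++17: external linkage and a single object for the whole program, while
// the value stays visible to the compiler in every translation unit.
inline constexpr int kInlineConstant = 1;

// All three are usable in constant expressions, so optimizations are unaffected.
static_assert(kStaticConstant + kPlainConstant + kInlineConstant == 3,
              "compile-time constants");

constexpr int scaled(int n) { return n * kInlineConstant; }
```

Where C++17 is available, inline constexpr avoids both the per-TU copies and the loss of compile-time visibility that extern const brings.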
Suppose I have two .cpp files file1.cpp and file2.cpp:
// file1.cpp
#include <iostream>
inline void foo()
{
std::cout << "f1\n";
}
void f1()
{
foo();
}
and
// file2.cpp
#include <iostream>
inline void foo()
{
std::cout << "f2\n";
}
void f2()
{
foo();
}
And in main.cpp I have forward declared the f1() and f2():
void f1();
void f2();
int main()
{
f1();
f2();
}
Result (doesn't depend on build, same result for debug/release builds):
f1
f1
Whoa: the compiler somehow picks only the definition from file1.cpp and uses it in f2() as well. What is the exact explanation of this behavior?
Note that changing inline to static solves the problem. Putting the inline definition inside an unnamed namespace also solves it, and the program prints:
f1
f2
This is undefined behavior, because the two definitions of the same inline function with external linkage break the C++ requirement for entities that can be defined in several places, known as the One Definition Rule:
3.2 One definition rule
...
There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14),[...] in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
6.1 each definition of D shall consist of the same sequence of tokens; [...]
This is not an issue with static functions, because one definition rule does not apply to them: C++ considers static functions defined in different translation units to be independent of each other.
The compiler may assume that all definitions of the same inline function are identical across all translation units because the standard says so. So it can choose any definition it wants. In your case, that happened to be the one with f1.
Note that you cannot rely on the compiler always picking the same definition: violating the aforementioned rule makes the program ill-formed, no diagnostic required. The compiler could also diagnose it and error out.
If the function is static or in an anonymous namespace, you have two distinct functions called foo and the compiler must pick the one from the right file.
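For illustration, file2.cpp with the static fix applied. Here foo() and f2() return a std::string instead of printing, an assumption made only so the result is easy to check.

```cpp
#include <cassert>
#include <string>

// Sketch of file2.cpp. 'static' gives foo internal linkage: it is a
// different function from file1.cpp's foo, so the linker never merges them.
static std::string foo()
{
    return "f2";
}

std::string f2()
{
    return foo();
}
```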
Relevant standardese for reference:
An inline function shall be defined in every translation unit in which it is odr-used and shall have exactly
the same definition in every case (3.2). [...]
7.1.2/4 in N4141, emphasis mine.
As others have noted, the compilers are in compliance with the C++ standard, because the One Definition Rule states that you shall have only one definition of a function, except that an inline function may be defined in several translation units as long as all the definitions are the same.
In practice, the function is flagged as inline, and at the linking stage, if the linker runs into multiple definitions of an inline-flagged symbol, it silently discards all but one. If it runs into multiple definitions of a symbol not flagged inline, it generates an error instead.
This property is called inline because, prior to LTO (link time optimization), taking the body of a function and "inlining" it at the call site required that the compiler have the body of the function. inline functions could be put in header files, and each cpp file could see the body and "inline" the code into the call site.
It doesn't mean that the code is actually going to be inlined; rather, it makes it easier for compilers to inline it.
However, I am unaware of a compiler that checks that the definitions are identical before discarding duplicates. This includes compilers that otherwise check function bodies for being identical, such as MSVC's COMDAT folding. This makes me sad, because it is a really subtle set of bugs.
The proper way around your problem is to place the function in an anonymous namespace. In general, you should consider putting everything in a source file in an anonymous namespace.
Another really nasty example of this:
// A.cpp
#include <vector>
struct Helper {
    std::vector<int> foo;
    Helper() {
        foo.reserve(100);
    }
};
// B.cpp
struct Helper {
    double x, y;
    Helper():x(0),y(0) {}
};
Member functions defined in the body of a class are implicitly inline, so the ODR applies. Here we have two different Helper::Helper() constructors, both inline, and they differ.
The sizes of the two classes differ too. In one case, we initialize two sizeof(double) worth of bytes with 0 (as a zero double is all zero bytes on IEEE 754 platforms).
In the other, we first initialize three sizeof(void*) with zero (the typical layout of an empty std::vector), then call .reserve(100) on those bytes, interpreting them as a vector.
At link time, one of these two implementations is discarded and the survivor is used by both files. What's more, which one is discarded is likely to be fairly deterministic in a full build, but in a partial build the order can change.
So now you have code that might build and work "fine" in a full build, but a partial build causes memory corruption. And changing the order of files in makefiles could cause memory corruption, or even changing the order lib files are linked, or upgrading your compiler, etc.
If both cpp files had a namespace {} block containing everything except the stuff you are exporting (which can use fully qualified namespace names), this could not happen.
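A sketch of B.cpp following that advice; b_sum() is a hypothetical exported function that uses the file-local Helper.

```cpp
#include <cassert>

// Sketch of B.cpp: everything that is not exported lives in an unnamed
// namespace, so this Helper and its implicitly inline constructor have
// internal linkage and can no longer collide with A.cpp's Helper.
namespace {
struct Helper {
    double x, y;
    Helper() : x(0), y(0) {}
};
}  // namespace

// Exported function (hypothetical) that uses the file-local Helper.
double b_sum()
{
    Helper h;
    return h.x + h.y;
}
```

A.cpp would wrap its own Helper the same way; the two types are then distinct even though they share a name.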
I've caught exactly this bug in production multiple times. Given how subtle it is, I do not know how many times it slipped through, waiting for its moment to pounce.
POINT OF CLARIFICATION:
Although the answer rooted in the C++ inline rule is correct, it is worth looking at what actually happens when the two sources are compiled separately. As one commenter noted, each resulting object file then contains its own 'foo()'. HOWEVER: if these two object files are linked together, then, because both 'foo()'s are non-static, the name 'foo()' appears in the exported symbol table of both object files; the linker has to coalesce the two table entries, so all internal calls are re-bound to one of the two routines (presumably the one in the first object file processed, since it is already bound [i.e. the linker would treat the second record as 'extern' regardless of binding]).
How does the following work?
#include <limits>
int main()
{
    const int* const foo = &std::numeric_limits<int>::digits;
}
I was under the impression that in order to take the address of a static constant member, we had to physically define it in some translation unit in order to please the linker. That said, after looking at the preprocessed code for this TU, I couldn't find an external definition for the digits member (or any other relevant members).
I tested this on two compilers (VC++ 10 and g++ 4.2.4) and got identical results from both (i.e., it works). Does the linker auto-magically link against an object file where this stuff is defined, or am I missing something obvious here?
Well, what makes you think that it is not defined? The very fact that your attempt to take the address succeeded indicates that it is defined somewhere. It is not required to reside in your translation unit, of course, so looking through the preprocessor output doesn't make much sense.
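For what it's worth, since C++17 a static constexpr data member is implicitly inline, so the in-class declaration is itself a definition and taking its address needs no out-of-class definition. Before C++17, one translation unit had to provide it (for the hypothetical class below, constexpr int Traits::digits;). A sketch with a hypothetical Traits class, compiled as C++17:

```cpp
#include <cassert>
#include <limits>

// Hypothetical class mirroring numeric_limits' style of static constants.
struct Traits {
    static constexpr int digits = 31;   // C++17: implicitly inline, so a definition
};

// Taking the address odr-uses the member; no out-of-class definition needed in C++17.
const int* digits_address() { return &Traits::digits; }
const int* limits_digits_address() { return &std::numeric_limits<int>::digits; }
```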