C++ modules: Multiply defined symbols no longer an error? - c++

If I have two classic translation units that both define the same symbol (say auto fun0() -> void), MSVC reports fatal error LNK1169: one or more multiply defined symbols found, since this violates the ODR.
One of the first steps I made with C++ modules was trying out the behavior with that basic principle. So we have two module files (module0.ixx and module1.ixx) with this almost identical content:
// module0.ixx
export module module0;
import <cstdio>;
export void f_test() { printf("f_test()\n"); }
// module1.ixx
export module module1;
import <cstdio>;
export void f_test() { printf("f_test()\n"); }
In my main.cpp, I do
import module0;
import module1;
auto main() -> int {
    f_test();
}
To my surprise, this compiles and links just fine. With that come the expected problems: if the two definitions differ, which one gets called depends on the import order, and so on. Is this expected? This was five minutes into playing around with modules and seems pretty baffling.

"Multiply defined symbols no longer an error?" - They were never required to be an error. Violation of the ODR is "ill-formed; no diagnostic required". Which basically means undefined behavior ensues, like you see here.
This issue is not specific to modules; "old C++" can exhibit the same behavior.
// oops_header.h
#ifndef OOPS
#error OOPS
#endif
#include <iostream>
inline void oops_func() { std::cout << OOPS; }
And we get the same shenanigans when we include it and mess with the macro:
// tu1.cpp
#define OOPS 123
#include <oops_header.h>
void a() { oops_func(); }
// tu2.cpp
#define OOPS 42
#include <oops_header.h>
void b() { oops_func(); }
Where a main function like this
extern void a();
extern void b();
int main() {
a();
b();
}
Will exhibit the exact same problems you encountered with the modules. The output depends on the stars, because I violated the ODR (the ODR requires an inline function's definition to be identical, token for token, in every translation unit).
You see it with modules due to an artefact of their implementation today (kinda similar to precompiled headers).

Related

why is moving the definition of the operator() from .h to .cpp file causing a data race?

in my code I have the following header files:
Global.h:
#ifndef GLOBAL_H_
#define GLOBAL_H_
#include <mutex>
namespace
{
std::mutex outputMutex;
}
#endif
Test.h:
#ifndef TEST_H_
#define TEST_H_
#include"Global.h"
#include<string>
#include<iostream>
class TestClass
{
std::string name;
public:
TestClass(std::string n):name{n}{}
void operator()()
{
for (int i=0;i<30;++i)
{
std::lock_guard<std::mutex> lock(outputMutex);
std::cout<<name<<name<<name<<name<<name<<name<<name<<std::endl;
}
}
};
#endif
Test2.h is identical to Test.h, except that the class is called "TestClass2" instead of "TestClass".
My main.cpp looks like this:
#include<iostream>
#include <thread>
#include "Global.h"
#include "Test.h"
#include "Test2.h"
using namespace std;
int main()
{
TestClass obj1("Hello");
TestClass2 obj2("GoodBye");
thread t1(obj1);
thread t2(obj2);
t1.join();
t2.join();
}
If I run the program like this I get the expected output:
HelloHelloHelloHelloHelloHelloHello
or
GoodByeGoodByeGoodByeGoodByeGoodByeGoodByeGoodBye
So far so good. But when I put the definition of the ()-operator of Test.h and Test2.h in source files Test.cpp and Test2.cpp:
(Test.cpp, same for Test2.cpp):
#include "Test.h"
#include"Global.h"
void TestClass::operator()()
{
for (int i=0;i<30;++i)
{
std::lock_guard<std::mutex> lock(outputMutex);
std::cout<<name<<name<<name<<name<<name<<name<<name<<std::endl;
}
}
and accordingly remove the definition from the header files, leaving only the declaration void operator()();, I suddenly start getting occasional outputs like this:
GoodByeHelloGoodByeHelloGoodByeHelloGoodByeHelloGoodByeHelloGoodByeHelloGoodByeHello
I don't know why the lock with the mutex variable outputMutex doesn't work any more, but I assume it has something to do with two versions of the variable being created, but I'd love to get a professional explanation. I'm using Eclipse with Cygwin.
This is a mixture of undefined behavior and anonymous namespaces.
First this:
namespace {
std::mutex outputMutex;
}
this is an anonymous namespace containing the mutex outputMutex. A different outputMutex exists in every source file that includes the header, as each copy has a different (internal) name.
That is what anonymous namespaces do. Think of them as "generate unique guid here for each cpp file that builds this". They are intended to prevent link-time symbol collisions.
class TestClass {
std::string name;
public:
// ...
void operator()() {
// ...
}
};
this is an (implicitly) inline TestClass::operator(). Its body is compiled in each compilation unit. By the ODR the body must be the same in every compilation unit, or your program is ill-formed, no diagnostic required. (methods defined inside a class definition are implicitly inline, with all that baggage).
It uses a name from an anonymous namespace. This name refers to a different entity in each compilation unit. If there is more than one compilation unit, the result is an ill-formed program with no diagnostic required; the C++ standard places no restrictions on its behavior1.
In this particular case, the same compilation unit was chosen for operator() from TestClass and TestClass2. So it used the same mutex. This is not reliable; a partial rebuild could cause it to change, or the phases of the moon.
When you put it into its own .cpp file, it was no longer implicitly inline. Only one definition of each operator() existed, but they were in separate compilation units.
These two different compilation units got a different outputMutex mutex.
1 The most common effect of violating that particular rule is that the linker picks one implementation based on arbitrary criteria (that can change from build to build!), and silently discards the rest. This is not good, as innocuous changes to the build process (adding more cores, partial builds, etc) can break your code. Don't violate the "inline functions must have the same definition everywhere" rule. This is just the most common symptom; you are not guaranteed to have anything this sensible happen.

C++ program using a C library headers is recognizing "this" as a keyword. Extern "C" error?

My C++ program needs to use an external C library.
Therefore, I'm using the
extern "C"
{
#include <library_header.h>
}
syntax for every module I need to use.
It worked fine until now.
A module is using the this name for some variables in one of its header files.
The C library itself is compiling fine because, from what I know, this has never been a keyword in C.
But despite my usage of the extern "C" syntax,
I'm getting errors from my C++ program when I include that header file.
If I rename every this in that C library header file with something like _this,
everything seems to work fine.
The question is:
Shouldn't the extern "C" syntax be enough for backward compatibility,
at least at the syntax level, for a header file?
Is this an issue with the compiler?
Shouldn't the extern "C" syntax be enough for backward compatibility, at least at the syntax level, for a header file? Is this an issue with the compiler?
No. extern "C" is about linking - specifically the policy for generated symbol names ("name mangling") and the calling convention (how a call is emitted and parameters are passed) - not about how the code is parsed and compiled.
The problem you have is not limited to the this keyword. In our current code base, we are porting some code to C++ and we have constructs like these:
struct Something {
char *value;
char class[20]; // <-- bad bad code!
};
This works fine in C code, but (like you) we are forced to rename to be able to compile as C++.
Strangely enough, many compilers don't forcibly disallow keyword redefinition through the preprocessor:
#include <iostream>
// temporary redefinition to compile code abusing the "this" keyword
#define cppThis this
#define this thisFunction
int this() {
return 1020;
}
int that() {
return this();
}
// put the C++ definition back so you can use it
#undef this
#define this cppThis
struct DumpThat {
void dump() {
std::cout << that();
}
DumpThat() {
this->dump();
}
};
int main ()
{
DumpThat dt;
}
So if you're up against a wall, that could let you compile a file written to C assumptions that you cannot change.
It will not--however--allow you to get a linker name of "this". There might be linkers that let you do some kind of remapping of names to help avoid collisions. A side-effect of that might be they allow you to say thisFunction -> this, and not have a problem with the right hand side of the mapping being a keyword.
In any case...the better answer if you can change it is...change it!
If extern "C" allowed you to use C++ keywords as symbols, the compiler would have to resolve them somehow outside of the extern "C" sections. For example:
extern "C" {
int * this; //global variable
typedef int class;
}
int MyClass::MyFunction() { return *this; } //what does this mean?
//MyClass could have a cast operator
class MyOtherClass; //forward declaration or a typedef'ed int?
Could you be more explicit about "using the this name for some variables in one of its header files"?
Is it really a variable or is it a parameter in a function prototype?
If it is the latter, you don't have a real problem because C (and C++) prototypes identify parameters by position (and type) and the names are optional. You could have a different version of the prototype, eg:
#ifdef __cplusplus
extern "C" {
void aFunc(int);
}
#else
void aFunc(int this);
#endif
Remember there is nothing magic about header files - they just provide code which is lexically included at the point of #include - as if you copied and pasted it in.
So you can have your own copy of a library header which does tricks like the above, just becoming a maintenance issue to ensure you track what happens in the original header. If this was likely to become an issue, add a script as a build step which runs a diff against the original and ensures the only point of difference is your workaround code.

How do I explain this LNK2005?

So, someone came to me with a project that failed linking with the error LNK2005: symbol already defined in object (using Visual Studio 2010). In this case, I know what is wrong (and hence could point them to the correct solution), but I don't know why this is wrong on a level to give a good explanation about it (to prevent it happening again).
// something.h
#ifndef _SOMETHING_H
#define _SOMETHING_H
int myCoolFunction();
int myAwesomeFunction() // Note implementing function in header
{
return 3;
}
#endif
-
// something.cpp
#include "something.h"
int myCoolFunction()
{
return 4;
}
-
// main.cpp
#include <iostream>
#include "something.h"
int main()
{
std::cout << myAwesomeFunction() << std::endl;
}
This fails linking, and is fixed by putting myAwesomeFunction() into the .cpp and leaving a declaration in the .h.
My understanding of how the linker works comes pretty much from here. To my understanding, we are providing a symbol that is required in one place.
I looked up the MSDN article on LNK2005, which matches how I expect linkers to behave (provide a symbol more than once -> linker is confused), but doesn't seem to cover this case (which means I'm not understanding something obvious about linking).
Google and StackOverflow mostly turn up issues where people are missing an #ifndef include guard or #pragma once (which leads to duplicate definitions within a single translation unit)
A related question I found on this site has the same problem, but the answer doesn't explain why we're getting this problem adequately to my level of understanding.
I have a problem, I know the solution, but I don't know why my solution works
In a typical C++ project, you compile each of the implementation (or .cpp) files separately - you generally never pass a header (or .h) file to the compiler directly. After all preprocessing and inclusions are performed, each of these files becomes a translation unit. So in the example you've given, there are two translation units that look like this:
main.cpp translation unit:
// Contents of <iostream> header here
int myCoolFunction();
int myAwesomeFunction() // Note implementing function in header
{
return 3;
}
int main()
{
std::cout << myAwesomeFunction() << std::endl;
}
something.cpp translation unit:
int myCoolFunction();
int myAwesomeFunction() // Note implementing function in header
{
return 3;
}
int myCoolFunction()
{
return 4;
}
Notice that both of these translation units contain duplicate content because they both included something.h. As you can see, only one of the above translation units contains a definition of myCoolFunction. That's good! However, they both contain a definition of myAwesomeFunction. That's bad!
After the translation units are compiled separately, they are then linked to form the final program. There are certain rules about multiple declarations across translation units. One of those rules is (§3.2/4):
Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required.
You have more than one definition of myAwesomeFunction across your program and so you are breaking the rules. That's why your code doesn't link correctly.
You can think of it from the linker's perspective. After these two translation units are compiled, you have two object files. The linker's job is to connect the object files together to form the final executable. So it sees the call to myAwesomeFunction in main and tries to find a corresponding function definition in one of the object files. However, there are two definitions. The linker doesn't know which one to use so it just gives up.
Now let's see what the translation units look like if you define myAwesomeFunction in something.cpp:
Fixed main.cpp translation unit:
// Contents of <iostream> header here
int myCoolFunction();
int myAwesomeFunction();
int main()
{
std::cout << myAwesomeFunction() << std::endl;
}
Fixed something.cpp translation unit:
int myCoolFunction();
int myAwesomeFunction();
int myCoolFunction()
{
return 4;
}
int myAwesomeFunction()
{
return 3;
}
Now it's perfect. There is only one definition of myAwesomeFunction across the whole program now. When the linker sees the call to myAwesomeFunction in main, it knows exactly which function definition it should link it to.
The linker is merely letting you know that you broke the one definition rule. This is a basic, well-documented rule of C++ - it isn't solved by using include guards or #pragma once directives, but, in the case of a free function, by marking it inline or moving the implementation to a source file.
When a non-inline method is implemented in a header, all translation units that include that header will define it. When the corresponding .obj files are linked together, the linker detects the same symbol is exported (and defined) multiple times, and complains.
Moving the implementation to a cpp file effectively transforms your initial definition into a declaration.
myAwesomeFunction ends up defined in two translation units, something.cpp and main.cpp, because both include the header. Move its implementation into one of the source files, or declare the function static or inline.

how to make ld treat Multiply defined structs/classes as an error?

EDIT-- clarifying the goal of my question:
I lose a lot of time diagnosing problems that I expect the linker to report, caused by an admittedly bad programming style that pops up when, e.g., copy-pasting a block of code from one compilation unit to another and altering it.
I'm looking for a way to detect this problem at compile/link time.
In this setup:
A.h
void foo();
A.cpp
#include "A.h"
#include <iostream>
struct A {
    int values[100];
    A() {
        std::cout << __FILE__ << ": A::A()\n";
    }
};
void foo() {
    A a;
}
main.cpp
#include <iostream>
#include "A.h"
struct A {
    double values[100];
    A() {
        std::cout << __FILE__ << ": A::A()\n";
    }
};
int main() { foo(); }
// void foo(){} ===> this would cause a linker error
I would love the linker to report that the structure A, or at least the constructor A::A(), is defined twice.
However, g++ 4.4 links just fine. Running the code shows that in this case, the linker chose to use the A from A.cpp.
$ g++ -Wall A.cpp main.cpp && ./a.out
A.cpp: A::A()
When a function foo() is present in two object files, the linker reports a multiple definition all right, but for the structures it doesn't.
EDIT: just found by using nm -C *.o that both A.o and main.o have A::A() defined as a weak symbol. This causes it to be 'selectable' from a pool of symbols with the same name. Maybe the question can be rephrased to "how can I cause the compiler to generate strong symbols?"...
00000000 W A::A()
How can I detect this problem?
Maybe the question can be rephrased to "how can I cause the compiler to generate strong symbols?"...
Try to restrict the use of inline functions:
struct A {
A();
};
// Inside A.cpp
A::A() {
std::cout << __FILE__ << ": A::A()\n";
}
An implementation is much more likely to report an ODR violation for a function that is not declared inline (including those that are implicitly declared inline, like members defined inside a class definition), although strictly speaking such a diagnostic is never required.
It's not a problem, and it's not a redefinition. It's how C++ works. Think about it — you put class definitions in headers (exposing just declaration is far less common). Headers are pretty much copy-pasted into every translation unit that uses them. It cannot be an error to have multiple definitions of the same class in multiple TUs. So, it's not something to solve.
Compiler/linker should complain if there are different classes defined under the same name, though.

unpredictable behavior of Inline functions with different definitions

I have the following source files:
//test1.cpp
#include <iostream>
using namespace std;
inline void foo()
{
cout << "test1's foo" << endl;
}
void bar();
int main(int argc, char *argv[])
{
foo();
bar();
}
and
//test2.cpp
#include <iostream>
using namespace std;
inline void foo()
{
cout << "test2's foo" << endl;
}
void bar()
{
foo();
}
The output:
test1's foo
test1's foo
Huh??? Ok, so I should have declared the foos static... but shouldn't this kind of thing generate a linker error, or at least a warning? And how does the compiler "see" the inline functions from across compilation units?
EDIT: This is using gcc 4.4.1.
You are running into the one-definition-rule. You are not seeing any error because:
[Some] violations, particularly those that span translation units, are not required to be diagnosed
What's going on under the covers is that the compiler is not inlining those functions (many compilers won't inline a function unless the code is compiled with optimizations enabled). Since the function is inline and may appear in multiple translation units, the compiler marks it as link-once, which tells the linker not to treat the multiple definitions as an error but simply to use one of them.
If you really want the two functions to be different, declare them static.
R Samuel Klatchko's answer is correct, but I'll answer part that he didn't.
"And how does the compiler "see" the inline functions from across compilation units?"
It doesn't. It sees the external definitions of functions that are not declared static. For such functions the compiler can inline calls if it wishes, but it must also generate code callable from outside.
The inline function is placed in a COMDAT section. That's a signal to the linker that it is free to pick any one of the multiple definitions it encounters. You'll get another output message when you reverse the link order.
Another way to place definitions in a COMDAT section (compiler allowing) is:
__declspec(selectany) int globalVariableInHeader = 42;
Which is handy to avoid the "extern" song and dance. Clearly, this mechanism was designed to allow multiple definitions introduced by one header file getting #included by multiple source files to be resolved by the linker. Fwiw, MSVC has the exact same behavior.