How does the compiler treat an extern variable? - C++

This may be minor, but I'm curious about the reason.
It comes from my friend's practice code:
#include <iostream>
using namespace std;
extern int* PPPP;
int main() {
    cout << "*PPPP" << *PPPP << endl;
}
By mistake, PPPP is not actually defined anywhere (it is only declared).
But curiously, we can compile this into a static lib.
However, we can't build it into a DLL: there are link errors (unresolved external symbol PPPP).
Our guess is that when building a static lib, the name PPPP (though extern) does get a space in memory anyhow, so no problem occurs.
We are not sure about this at all, and we would like to hear a more accurate explanation.
Thanks in advance.

A static library is intended to be linked to another set of files, so it can contain undefined symbols, because these are resolved at a later stage (or not, in which case you get a linker error).
However, a DLL, just like an executable, needs to be fully linked and so can't contain any undefined references.

I'm guessing that when it's made into a static library, the linker assumes that any unresolved symbols will be available when fully linked.
If you link that static library to some program without a symbol called PPPP defined, it'll fail with a linker error.

When you say:
extern int* PPPP;
you are promising the compiler that PPPP is defined somewhere else, typically in another translation unit.
The linker will try to find PPPP in the object files and libraries it's given and, if it can't, it'll issue an error.
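A minimal single-file sketch of that declaration/definition split. In a real project the definition of PPPP would sit in a different .cpp file; it is placed at the bottom here only so the example links on its own. The names read_pppp and pppp_target are hypothetical.

```cpp
#include <cassert>

// Declaration only: promises the compiler that PPPP is defined somewhere.
// By itself it leaves an undefined reference for the linker to resolve.
extern int* PPPP;

// Compiles against the declaration alone; the object file records that
// this function needs the symbol PPPP.
int read_pppp() { return *PPPP; }

// The definition. Normally this would live in another .cpp file; it is
// placed here only so that the sketch links as a single file.
static int pppp_target = 42;
int* PPPP = &pppp_target;
```

If you delete the last two lines, the file still compiles to an object file (and can be archived into a static library), but linking it into a program or DLL fails with an unresolved PPPP.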

Related

C++ compiles and links with pointer to undefined function

This code:
void undefined_fcn();
void defined_fcn() {}
struct api_t {
void (*first)();
void (*second)();
};
api_t api = {undefined_fcn, defined_fcn};
defines a global variable api containing a pointer to a non-existent function. However, it compiles and, to my surprise, links with absolutely no complaints from GCC, even with all of -Wall -Wextra -Werror -pedantic.
This code is part of a shared library. It only fails at run time, when I load the library. How can I check, at library link time, that I didn't forget to define any function?
Update: this question mentions the same problem, and the answer is the same: -Wl,--no-undefined. (By the way, I guess this could even be marked as a duplicate.) However, according to the accepted answer below, you should be careful when using -Wl,--no-undefined.
This code is part of a shared library.
That's the key. The whole purpose of having a shared library is to have an "incomplete" shared object, with undefined symbols that must be resolved when the main executable loads it and all other shared libraries it gets linked with. At that time, the runtime loader attempts to resolve all undefined symbols; and all undefined symbols must be resolved, otherwise the executable will not start.
You stated you're using gcc, so you are likely using GNU ld. For the reason stated above, ld will link a shared library with undefined symbols, but will fail to link an executable unless all undefined symbols are resolved against the shared libraries the executable gets linked with. So at run time the loader is expected to resolve all symbols successfully too; the only situation in which it fails to start the executable indicates a fatal problem with the runtime environment (such as a shared library getting replaced with an incompatible version).
There are some options that can be used to override this behavior. The --no-undefined option instructs ld to report a link failure for undefined symbols when linking a shared library, just as it does for executables. When invoking ld indirectly via gcc, this becomes -Wl,--no-undefined.
However, you are likely to discover that this is a losing proposition. You had better hope that none of the code in your shared library uses any class in the standard C++ or C library. Because, guess what? Those references will be undefined symbols unless they can be resolved against the libraries on the link line, and you will fail to link your shared library!
In other words, this is a necessary evil that you need to deal with.
You can't have the compiler tell you whether you forgot to define the function in that implementation file. The reason is that a function declared without static has external linkage by default in C++ (it is implicitly extern). And what a shared library will eventually be resolved against is not known until it is loaded, so even the linker cannot tell whether the reference will be defined.
If you are not familiar with what extern means: things marked extern have external linkage, so if a variable is declared extern, the compiler does not require its definition to be in the translation unit that uses it. The definition can be in another implementation file, and the reference is resolved at link time (when you link with a translation unit that defines the variable). The same applies to functions.
To get the behavior you want, make the function static. This tells the compiler that the function is not extern and belongs to the current translation unit, in which case it must be defined there. Clang reports a use of such an undefined function with -Wundefined-internal, which is enabled by default; compile with -Werror to turn that warning into an error.
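A sketch of the question's code rewritten along those lines, assuming you control the translation unit that fills in api. The function name first_fcn is hypothetical (it stands in for undefined_fcn, now actually defined).

```cpp
#include <cassert>

// 'static' gives internal linkage: the definition must appear in this
// translation unit. If the body were missing, Clang's -Wundefined-internal
// (on by default) would warn at compile time instead of the problem
// surfacing only when the shared library is loaded.
static void first_fcn() {}

// External linkage, as in the original code.
void defined_fcn() {}

struct api_t {
    void (*first)();
    void (*second)();
};

// Both pointers now refer to functions that are known to be defined.
api_t api = {first_fcn, defined_fcn};
```

The trade-off is that first_fcn is no longer visible to other translation units, which is usually fine when it is only reachable through api.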

Why is it that when you don't define a function you get a linker error and not a compiler error?

For example:
#include <iostream>
int add(int x, int y);
int main()
{
    std::cout << add(5, 5) << std::endl;
}
This would compile but not link. I understand the problem, I just don't understand why it compiles fine but doesn't link.
Because the compiler doesn't know whether that function is provided by a library (or other translation unit). There's nothing in your prototype that tells the compiler the function is defined locally (you could use static for that).
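A minimal sketch of the same situation with the missing piece supplied. The helper use_add is hypothetical; the point is that the call compiles against the declaration alone, and only the linker needs the definition.

```cpp
#include <cassert>

// Declaration only: gives the compiler add's signature, nothing more.
int add(int x, int y);

// Compiles against the declaration alone. If no definition existed in this
// file, 'add' would appear in the object file as an undefined (U) symbol.
int use_add() { return add(5, 5); }

// The definition that satisfies the linker. Without it, this file still
// compiles on its own (e.g. g++ -c), but linking it into a program fails
// with an undefined reference to add(int, int).
int add(int x, int y) { return x + y; }
```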
The input to a C or C++ compiler is one translation unit - more or less one source code file. Once the compiler is finished with that one source code, it has done its job.
If you call/use a symbol, such as a function, which is not part of that translation unit, the compiler assumes it's defined somewhere else.
Later on, you link together all the object files and possibly the libraries you want to use, and all references are tied together. Only at this point, when pulling together everything that's supposed to make up the executable, can anyone know that something is missing.
When a compiler compiles, it generates an object file containing a table of defined symbols (T) and undefined symbols (U) (see the man page of nm). Hence there is no requirement that all references be defined in every translation unit.
When all the object files are linked (together with any libraries), the final binary should have all its symbols defined, unless the target is itself a library. This is the job of the linker. So, depending on the requested target type (library or not), the linker may or may not report an error for undefined functions.
Even if the target is not a library, unless it is statically linked it may still refer to shared libraries (.so or .dll). So if, on the target machine, those shared libraries or any of their symbols are missing when the binary runs, you get an error from the loader at run time.
In short, the compiler, linker, and loader each do their best to find a definition for every symbol needed. By declaring add, you are pacifying the compiler, which hopes the linker or the loader will do the rest. Since you didn't pacify the linker (by, say, giving it a library that defines add), it stops and complains. Had you pacified the linker as well, you would have gotten the error from the loader instead.
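To make the T/U distinction concrete, here is a hypothetical helper that pulls the undefined symbols out of nm's default (BSD-style) output, where an undefined entry has no address column ("U name") and a defined one looks like "address type name". It is a sketch that assumes that output format, not a replacement for nm's own -u option.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Return the names of undefined ('U') symbols from nm-style text output.
std::vector<std::string> undefined_symbols(const std::string& nm_output) {
    std::vector<std::string> result;
    std::istringstream lines(nm_output);
    std::string line;
    while (std::getline(lines, line)) {
        // Split the line into whitespace-separated fields.
        std::istringstream fields(line);
        std::vector<std::string> tokens;
        std::string token;
        while (fields >> token) tokens.push_back(token);
        // Undefined symbols have only two fields: the type 'U' and the name.
        if (tokens.size() == 2 && tokens[0] == "U") result.push_back(tokens[1]);
    }
    return result;
}
```

Fed the output of nm on the object file compiled from the example above, this would list _Z3addii (the mangled add(int, int)) among the undefined symbols, while main shows up as T.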

How does the linker know which symbol to link to which?

Say I have two .cpp files, and in one of them I write
extern int i;
and in the other I define the variable i.
Now how does the linker know that the i in the first file should be linked to the address of i in the second file? I ask because, as far as I know, the object file does not contain any information about variable names (it only knows addresses) (see this link).
I am really confused by this.
Some light reading: Beginner's Guide to Linkers.
The object code has symbol definitions in it. The linker uses these to resolve references to symbols. The symbols are not part of the executable code and cannot be read by the code contained in the object file (which is what the answer to the question you linked to is saying).
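A toy model of that name-based resolution, assuming a drastically simplified object file that is just a table of defined names with addresses plus a list of referenced-but-undefined names. ToyObject and resolve are hypothetical; real object formats (ELF, COFF, Mach-O) also carry relocation entries that say where each resolved address gets patched in.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an object file's symbol table (not a real format).
struct ToyObject {
    std::map<std::string, std::uint32_t> defined;  // name -> address
    std::vector<std::string> undefined;            // names used but not defined
};

// Resolve each undefined reference purely by name against the definitions
// exported by all objects. Names nobody defines are appended to 'missing'.
// (If two objects define the same name, the first one wins here; a real
// linker would report a multiple-definition error instead.)
std::map<std::string, std::uint32_t> resolve(const std::vector<ToyObject>& objects,
                                             std::vector<std::string>& missing) {
    std::map<std::string, std::uint32_t> global;  // combined symbol table
    for (const ToyObject& obj : objects)
        for (const auto& [name, address] : obj.defined)
            global.emplace(name, address);

    std::map<std::string, std::uint32_t> resolved;
    for (const ToyObject& obj : objects)
        for (const std::string& name : obj.undefined) {
            auto it = global.find(name);
            if (it != global.end())
                resolved[name] = it->second;  // the address the reference becomes
            else
                missing.push_back(name);      // would be "undefined reference"
        }
    return resolved;
}
```

In the question's terms, the file with extern int i; contributes "i" to undefined, the file with int i; contributes "i" to defined, and the name is the only link between them.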
The linked executable may also have symbols in it (e.g. for use by a debugger), or may have symbols removed at link stage (or later) since they are of no use to the code contained within the executable.

Can the linker omit an object file when linking a static lib?

I have a static library (lib.a) and a program that links to it. The library doesn't have any entry point that would always be called before using it, but I need to execute a piece of code very early in the program (preferably before main() starts). So I thought I would use a static variable of my own class. I added a new source file that contains something like:
#include <MyClass.h>
static MyClass myVar;
The constructor of MyClass would then execute my code. When I build lib.a and run nm on it, I can see that myVar is there. However, when I link my program and run nm on it, I do not see myVar. When I put this piece of code into an existing file, the symbol is visible in the final executable. Why is that? Can the linker omit an object file from the lib.a library in this case? I know that the variable is not referenced from outside (it cannot be, as it is static), but it should execute code on its own, so I don't see why it should be removed.
In case it makes a difference I'm using some old SunPro compiler.
Technically speaking, the linker should be forced to include that object file while linking your program. However, support for this is buggy in many compilers, such as MSVC++. Adding an external reference to that object file somewhere in your main program should force it to be included.
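A single-file sketch of that "external reference" workaround. The comments mark which part would live in the library source file and which in the main program; g_early_init_ran, mylib_init_anchor and mylib_init_keep are hypothetical names, and MyClass is a stand-in for the class in the question.

```cpp
#include <cassert>

// Set by MyClass's constructor; constant-initialized to false before any
// dynamic initialization (such as myVar's constructor) runs.
bool g_early_init_ran = false;

struct MyClass {
    MyClass() { g_early_init_ran = true; }  // stands in for "code to run early"
};

// ---- would live in the new source file inside lib.a ----
static MyClass myVar;
// Anchor symbol: exists only so that the program has something to reference.
int mylib_init_anchor = 0;

// ---- would live in the main program ----
extern int mylib_init_anchor;
// A global with external linkage that takes the anchor's address. This puts
// an undefined reference to mylib_init_anchor into the program's object
// file, so the linker must pull in the archive member that defines it
// (and with it, myVar and its constructor).
int* mylib_init_keep = &mylib_init_anchor;
```

The reference is kept in a non-static global on purpose: an unused internal variable could be optimized away together with the reference it was supposed to create.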
Also note that in the case of nm, it's possible that your static initializer was inlined, and therefore the symbol need not exist in your final binary. Try something with side effects (such as a std::cout statement) in your static, and make sure it doesn't run before blaming the compiler :)
It turns out that what the linker does is pretty standard (I don't mean C++-standard, just commonly observed behaviour), and you can work around it. In GNU ld it is the --whole-archive option; with the Sun tools in my case it is -z allextract. That didn't actually work as expected for my project, so I used some magic with weak symbols and -z weakextract to achieve what I wanted.
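A simplified model of that common archive behaviour, assuming the classic rule that an archive member is pulled in only when it defines a symbol that is still undefined, with the archive rescanned until nothing new is added. Member and select_members are hypothetical, and the whole_archive flag only mimics the effect of --whole-archive / -z allextract; real linkers also track archive order across the command line.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Member {
    std::string name;
    std::set<std::string> defines;     // external symbols the member exports
    std::set<std::string> references;  // external symbols the member needs
};

// Return the names of the archive members the linker would include.
std::vector<std::string> select_members(std::set<std::string> undefined,
                                        const std::vector<Member>& archive,
                                        bool whole_archive) {
    std::vector<std::string> pulled;
    std::set<std::string> defined;
    std::vector<bool> taken(archive.size(), false);
    bool progress = true;
    while (progress) {  // rescan until a full pass adds nothing
        progress = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (taken[i]) continue;
            bool needed = whole_archive;
            for (const std::string& s : archive[i].defines)
                if (undefined.count(s)) needed = true;
            if (!needed) continue;
            taken[i] = true;
            progress = true;
            pulled.push_back(archive[i].name);
            for (const std::string& s : archive[i].defines) {
                defined.insert(s);
                undefined.erase(s);
            }
            // The new member may itself need symbols from other members.
            for (const std::string& s : archive[i].references)
                if (!defined.count(s)) undefined.insert(s);
        }
    }
    return pulled;
}
```

A member containing only static MyClass myVar; exports no external symbol, so nothing can ever make it "needed" under the default rule, which matches what the question observed with nm.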

How to handle linker errors in C++/GNU toolchain?

Given a C++/GNU toolchain, what's a good method or tool or strategy to puzzle out linker errors?
Not sure exactly what you mean, but if you are talking about cryptic linker symbols like:
mylib.so: undefined symbol: _ZN5CandyD2Ev
you can use c++filt to do the puzzling for you.
c++filt _ZN5CandyD2Ev
prints Candy::~Candy(), so somehow Candy's destructor didn't get linked.
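If you want the same demangling from inside a C++ program (for example, in a custom error report), GCC and Clang expose abi::__cxa_demangle from <cxxabi.h>. This sketch assumes one of those toolchains (the header is not available with MSVC); the wrapper name demangle is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>
#include <cxxabi.h>  // GCC/Clang (Itanium C++ ABI) only

// abi::__cxa_demangle allocates its result with malloc, so free() it.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// In-process equivalent of running c++filt on one symbol name.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

For example, demangle("_ZN5CandyD2Ev") yields "Candy::~Candy()", matching the c++filt output above.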
With gcc toolchain, I use:
nm: to find the symbols in object files
ld: to find how a library links
c++filt: to find the C++ name of a symbol from its mangled name
Check this for details.
Well, the first thing would be RTFM. No, seriously: read the documentation.
If you don't want to do that, try searching for the error message.
Here are a few other things to remember:
"Missing" symbols are often an indication that you haven't included the appropriate source file or library.
"Missing" symbols are sometimes an indication that you're attempting to link a library built with a different name-mangling convention (or a different compiler).
Make sure that you have extern "C" where appropriate.
Declaring and defining aren't the same thing.
If your compiler doesn't support "export", make sure your template code is available wherever you instantiate the templates.
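A sketch of the usual extern "C" pattern for a function shared between C and C++ code, assuming a hypothetical header "c_math.h" and function c_add. Without C linkage, g++ would emit the symbol under its mangled name _Z5c_addii, and a C caller looking for plain c_add would get an undefined symbol.

```cpp
#include <cassert>

// What the shared header (hypothetical "c_math.h") typically looks like:
// C++ compilers see the extern "C" block, C compilers skip it.
#ifdef __cplusplus
extern "C" {
#endif
int c_add(int x, int y);  // C linkage: symbol is emitted without C++ mangling
#ifdef __cplusplus
}
#endif

// The definition. Normally it would be compiled as C in its own .c file;
// it is written here in C++ with matching C linkage so the sketch is one file.
extern "C" int c_add(int x, int y) { return x + y; }
```

Running nm on the resulting object file should show c_add rather than a _Z-prefixed name (on some platforms, such as macOS, C symbols additionally get a leading underscore).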
Look at the names of the symbols that are reported to be problematic.
If missing symbols are reported, find out in which source files or libraries those functions/... should be defined. Inspect the compilation/linker settings to find out why these files aren't compiled or linked.
If there are multiply defined symbols the linker usually mentions which object files or libraries contain them. Look at those files/their sources to find out why the offending functions/... are included in both of them.
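When the linker's message is not enough, the same check can be done by hand from per-file symbol lists (e.g. the defined T/D/B entries that nm prints for each object file). This hypothetical helper assumes you have already collected those lists into a map from file name to defined symbols; it reports every symbol defined in more than one file, together with the files.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Map each externally defined symbol to the files that define it, and keep
// only the symbols with more than one defining file.
std::map<std::string, std::vector<std::string>>
find_multiply_defined(const std::map<std::string, std::set<std::string>>& defs_by_file) {
    std::map<std::string, std::vector<std::string>> where;
    for (const auto& [file, symbols] : defs_by_file)
        for (const std::string& s : symbols) where[s].push_back(file);

    std::map<std::string, std::vector<std::string>> duplicates;
    for (const auto& [symbol, files] : where)
        if (files.size() > 1) duplicates.emplace(symbol, files);
    return duplicates;
}
```

A typical culprit this turns up is a non-inline function or variable defined in a header that is included by several source files.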