I'm developing a shared library in C++ and want to provide a C header for users to include.
The library exports a thread_local global variable, so an extern declaration should be written in the header.
This variable is only modified in library code (C++) and is read-only in the user's code (C).
However, the C language doesn't have a thread_local keyword.
So, any ideas about how to handle this?
Is a simple extern type variable; declaration definitely correct?
And what about the case of modifying the variable in C? Does everything still work?
I imagine the variable is referenced relative to a pointer that is stored in the thread context, maybe even in a register. When a thread is created, the data is allocated and the pointer is initialised to point to it. (Space will always be required for the context.) When the process switches from one thread to another, the current context is stored and a different context is restored, including the indirect pointer. If you do not use the correct type, the patching at link time will not be correct and the correct area of memory will not be used. I do not know whether the patching will just be wrong, or whether the linker knows the symbol type and either gives an explicit error or just sees the global as undefined. That is, I don't know whether it sees a different symbol, the same symbol declared with conflicting types, or the same symbol where the mixed-up reference methods break the code.
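As a rough sketch of how a header could keep the declaration consistent on both sides, the snippet below picks a thread-local specifier per language and also offers an accessor function as an alternative. All names (MYLIB_TLS, mylib_last_error, mylib_get_last_error, mylib_internal_fail) are hypothetical, and it assumes a GCC/Clang-style toolchain and a variable with a constant initializer; header and source are shown in one file only for brevity.

```cpp
#include <cassert>

// --- mylib.h (hypothetical header, meant to be valid as both C and C++) ---
#if defined(__cplusplus)
#  define MYLIB_TLS thread_local        // C++11 keyword
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define MYLIB_TLS _Thread_local       // C11 keyword
#else
#  define MYLIB_TLS __thread            // GCC/Clang extension (assumption)
#endif

#ifdef __cplusplus
extern "C" {
#endif

// Option 1: declare the variable itself with a matching TLS specifier.
extern MYLIB_TLS int mylib_last_error;

// Option 2: hide the storage behind an accessor, so C callers never
// need to know that the variable is thread-local.
int mylib_get_last_error(void);

#ifdef __cplusplus
}
#endif

// --- mylib.cpp (library side, the only place that writes) ---
extern "C" {
thread_local int mylib_last_error = 0;
int mylib_get_last_error(void) { return mylib_last_error; }
}

// Hypothetical library-internal helper that records an error code
// for the calling thread.
void mylib_internal_fail(int code) { mylib_last_error = code; }
```

The accessor variant is probably the safer of the two when the user's compiler is unknown, since the C side then only sees an ordinary function and cannot write to the variable at all.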
I'm trying to understand what happens when modules with globals and static variables are dynamically linked to an application.
By modules, I mean each project in a solution (I work a lot with visual studio!). These modules are either built into *.lib or *.dll or the *.exe itself.
I understand that the binary of an application contains global and static data of all the individual translation units (object files) in the data segment (and read only data segment if const).
What happens when this application uses a module A with load-time dynamic linking? I assume the DLL has a section for its globals and statics. Does the operating system load them? If so, where do they get loaded to?
And what happens when the application uses a module B with run-time dynamic linking?
If I have two modules in my application that both use A and B, are copies of A and B's globals created as mentioned below (if they are different processes)?
Do DLLs A and B get access to the application's globals?
(Please state your reasons as well)
Quoting from MSDN:
Variables that are declared as global in a DLL source code file are treated as global variables by the compiler and linker, but each process that loads a given DLL gets its own copy of that DLL's global variables. The scope of static variables is limited to the block in which the static variables are declared. As a result, each process has its own instance of the DLL global and static variables by default.
and from here:
When dynamically linking modules, it can be unclear whether different libraries have their own instances of globals or whether the globals are shared.
Thanks.
This is a pretty famous difference between Windows and Unix-like systems.
No matter what:
Each process has its own address space, meaning that there is never any memory being shared between processes (unless you use some inter-process communication library or extensions).
The One Definition Rule (ODR) still applies, meaning that you can only have one definition of the global variable visible at link-time (static or dynamic linking).
So, the key issue here is really visibility.
In all cases, static global variables (or functions) are never visible from outside a module (dll/so or executable). The C++ standard requires that these have internal linkage, meaning that they are not visible outside the translation unit (which becomes an object file) in which they are defined. So, that settles that issue.
Where it gets complicated is when you have extern global variables. Here, Windows and Unix-like systems are completely different.
In the case of Windows (.exe and .dll), the extern global variables are not part of the exported symbols. In other words, different modules are in no way aware of global variables defined in other modules. This means that you will get linker errors if you try, for example, to create an executable that is supposed to use an extern variable defined in a DLL, because this is not allowed. You would need to provide an object file (or static library) with a definition of that extern variable and link it statically with both the executable and the DLL, resulting in two distinct global variables (one belonging to the executable and one belonging to the DLL).
To actually export a global variable in Windows, you have to use a syntax similar to the function export/import syntax, i.e.:
#ifdef COMPILING_THE_DLL
#define MY_DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define MY_DLL_EXPORT extern "C" __declspec(dllimport)
#endif
MY_DLL_EXPORT int my_global;
When you do that, the global variable is added to the list of exported symbols and can be linked like all the other functions.
In the case of Unix-like environments (like Linux), the dynamic libraries, called "shared objects" with the extension .so, export all extern global variables (or functions). In this case, if you do load-time linking from anywhere to a shared object file, then the global variables are shared, i.e., linked together as one. Basically, Unix-like systems are designed to make it so that there is virtually no difference between linking with a static or a dynamic library. Again, ODR applies across the board: an extern global variable will be shared across modules, meaning that it should have only one definition across all the modules loaded.
Finally, in both cases, for Windows or Unix-like systems, you can do run-time linking of the dynamic library, i.e., using either LoadLibrary() / GetProcAddress() / FreeLibrary() or dlopen() / dlsym() / dlclose(). In that case, you have to manually get a pointer to each of the symbols you wish to use, and that includes the global variables you wish to use. For global variables, you can use GetProcAddress() or dlsym() just the same as you do for functions, provided that the global variables are part of the exported symbol list (by the rules of the previous paragraphs).
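To illustrate the run-time lookup of a variable on the Unix side, here is a minimal sketch using dlopen() and dlsym(). The helper name find_exported_int is hypothetical, and it assumes a POSIX system and that the looked-up symbol really is an int (older glibc versions also need -ldl at link time).

```cpp
#include <cassert>
#include <dlfcn.h>   // POSIX: dlopen, dlsym, dlclose

// Looks up an exported `int` global by name in a shared object.
// Returns nullptr if the library or the symbol cannot be found.
// On success the handle is intentionally left open, because closing it
// could unmap the memory the returned pointer refers to.
int* find_exported_int(const char* library_path, const char* symbol_name) {
    void* handle = dlopen(library_path, RTLD_NOW);
    if (handle == nullptr) {
        return nullptr;
    }
    void* address = dlsym(handle, symbol_name);
    if (address == nullptr) {
        dlclose(handle);
        return nullptr;
    }
    return static_cast<int*>(address);
}
```

A caller might write int* g = find_exported_int("./libmylib.so", "my_global"); where both the library path and the symbol name are placeholders, and then check g for nullptr before dereferencing it.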
And of course, as a necessary final note: global variables should be avoided. And I believe that the text you quoted (about things being "unclear") is referring exactly to the platform-specific differences that I just explained (dynamic libraries are not really defined by the C++ standard, this is platform-specific territory, meaning it is much less reliable / portable).
The answer left by Mikael Persson, although very thorough, contains a severe error (or at least a misleading statement) regarding global variables that needs to be cleared up. The original question asked whether there were separate copies of the global variables or whether global variables were shared between processes.
The true answer is the following: there are separate (multiple) copies of the global variables for each process, and they are not shared between processes. Stating that the One Definition Rule (ODR) applies is therefore also very misleading: it does not apply in the sense that they are NOT the same globals used by each process, so in reality it is not "One Definition" between processes.
Also, even though global variables are not "visible" to the process, they are always easily "accessible" to the process, because any function could easily return the value of a global variable to the process, or for that matter, a process could set the value of a global variable through a function call. Thus this answer is also misleading.
In reality, "yes", the processes do have full "access" to the globals, at the very least through function calls to the library. But to reiterate, each process has its own copy of the globals, so they won't be the same globals that another process is using.
Thus the entire part of the answer relating to the external exporting of globals is really off topic, unnecessary, and not even related to the original question, because the globals do not need extern to be accessed: they can always be accessed indirectly through function calls to the library.
The only part that is shared between the processes, of course, is the actual "code". The code is only loaded in one place in physical memory (RAM), but that same physical memory location is of course mapped into the "local" virtual memory locations of each process.
In contrast, a static library has a copy of the code for each process already baked into the executable (ELF, PE, etc.), and of course, like dynamic libraries, has separate globals for each process.
On Unix systems:
It should be noted that the linker does not complain if two dynamic libraries export the same global variable, but a segfault might arise during execution, depending on access violations. This behavior usually shows up as a segmentation fault with error 15:
segfault at xxxxxx ip xxxxxx sp xxxxxxx error 15 in a.out
I'm currently updating a C++ library for Arduino (Specifically 8-bit AVR processors compiled using avr-gcc).
Typically the authors of the default Arduino libraries like to include an extern variable for the class inside the header, which is defined in the class .cpp file also. This I assume is basically to have everything provided ready to go for newbies as built-in objects.
The scenario I have is: The library I have updated no longer requires the .cpp file and I have removed it from the library. It wasn't until I went on a final pass checking for bugs that I realized, no linker error was produced despite the fact a definition wasn't provided for the extern variable in a .cpp file.
This is as simple as I can get it (header file):
struct Foo{
void method() {}
};
extern Foo foo;
Including this code and using it in one or many source files does not cause any linker error. I have tried it in both versions of GCC which Arduino uses (4.3.7, 4.8.1) and with C++11 enabled/disabled.
In my attempt to cause an error, I found it was only possible when doing something like taking the address of the object or modifying the contents of a dummy variable I added.
After discovering this, I find it's important to note:
The class functions only return other objects, as in, nothing like operators returning references to itself, or even a copy.
It only modifies external objects (registers which are effectively volatile uint8_t references in code), and returns temporaries of other classes.
All of the class functions in this header are so basic that they cost less than or equal to the cost of a function call, therefore they are (in my tests) completely in-lined into the caller. A typical statement may create many temporary objects in the call chain, however the compiler sees through these and outputs efficient code modifying registers directly, rather than a set of nested function calls.
I also recall reading in n3797 7.1.1 - 8 that extern can be used on incomplete types, however the class is fully defined whereas the declaration is not (this is probably irrelevant).
I'm led to believe that this may be a result of optimizations at play. I have seen the effect that taking the address has on objects which would otherwise be considered constant and compiled without RAM usage. By adding any layer of indirection to an object in which the compiler cannot guarantee state will cause this RAM consuming behavior.
So, maybe I've answered my question by simply asking it, however I'm still making assumptions and it bothers me. After quite some time hobby-coding C++, literally the only thing on my list of do-not's is making assumptions.
Really, what I want to know is:
With respect to the working solution I have, is it a simple case of documenting the inability to take the address (cause indirection) of the class?
Is it just an edge case behavior caused by optimizations eliminating the need for something to be linked?
Or is it plain and simple undefined behavior? As in, GCC may have a bug and is permitting code that might fail if optimizations were lowered or disabled?
Or one of you may be lucky enough to be in possession of a decoder ring that can find a suitable paragraph in the standard outlining the specifics.
This is my first question here, so let me know if you would like to know certain details, I can also provide GitHub links to the code if needed.
Edit: As the library needs to be compatible with existing code I need to maintain the ability to use the dot syntax, otherwise I'd simply have a class of static functions.
To remove assumptions for now, I see two options:
Add a .cpp just for the variable definition.
Use a define in the header like #define foo (Foo()) allowing dot syntax via a temporary.
I prefer the method using a define, what does the community think?
Cheers.
Declaring something extern just informs the assembler and the linker that whenever you use that label/symbol, it should refer to an entry in the symbol table, instead of a locally allocated symbol.
The role of the linker is to replace symbol table entries with an actual reference to the address space whenever possible.
If you don't use the symbol at all in your C file, it will not show up in the assembly code, and thus will not cause any linker error when your module is linked with others, since there is no undefined reference.
It is either an edge case behaviour caused by optimization, or you never use the foo variable in your code. I'm not 100% sure that it is formally not undefined behavior, but I'm quite sure it isn't undefined from a practical point of view.
extern variables are implemented in such a way that code compiled with them produces so-called relocations (empty places where the address of the variable should be placed), which are then filled in by the linker. Apparently foo is never used in your code in a way that would need its address, and therefore the linker doesn't even try to find that symbol. If you turn optimization off (-O0) you will probably get a linker error.
Update: If you want to keep the "dot notation" but remove the problem with the undefined extern, you may replace extern with static (in the header file), creating a separate "instance" of the variable for each TU. As this variable is going to be optimized out anyway, this will not change the real code at all, but it will also work for an unoptimized build.
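A minimal sketch of the static-in-header variant follows. The fake_register variable is a stand-in for a real memory-mapped register and is purely an assumption for illustration; Foo and foo are the names from the question.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a memory-mapped register (hypothetical; on AVR this would
// be a real I/O register rather than an ordinary variable).
static volatile std::uint8_t fake_register = 0;

struct Foo {
    // Touches only external state, never a data member of Foo.
    void method() { fake_register = fake_register | 0x01; }
};

// Instead of `extern Foo foo;` with no definition anywhere:
// each translation unit gets its own (empty) instance, so the program
// still links at -O0 or when the address of foo is taken.
static Foo foo;
```

Existing user code such as foo.method(); keeps working unchanged, and taking &foo no longer depends on the optimizer removing the reference.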
Ok, I am trying to make an OpenGL Application to use a Fragment Shader. I need to get some variables into the fragment shader using glUniform.
I've seen some examples that look like this:
static PFNGLSHADERSOURCEARBPROC glShaderSourceARB;
they have like 10 of those. When I put them in, it says, previous declaration of glShaderSourceARB, or glShaderSourceARB was declared 'extern' and then later 'static'.
When I DON'T put it in, every time I use glShaderSourceARB, I get an Undefined Reference to glShaderSourceARB.
HOW is this possible? It gets mad at me for declaring it twice, but if I take out one declaration, it says it's not declared at all. Can someone explain how this is supposed to work?
Okay, your problem stems from lack of knowledge about C storage and scope qualifiers.
static in the global scope means: this symbol is visible only to the compilation unit it's in. Other compilation units may have static symbols of the same name, but those are their own private symbols and they don't interfere.
extern means that the symbol it refers to is defined and exposed (i.e. not static) in some other compilation unit. You normally use it in headers. Combining the two as extern static is not valid, since a declaration may carry only one of these storage-class specifiers.
Now what you did was introduce a new global symbol with static scope, while the header already declared a non-static symbol of the same name to exist. And the compiler tells you "sorry, this name is already taken; but your symbol doesn't match the extern declaration, so get lost."
However, with just an extern declaration but no actual definition of the symbol, the linker will at the end tell you: "There are a few parts missing, where is xyz, did nobody actually define it; everybody is referring to it (extern), but nobody actually provides it."
Okay, so should you define glShaderSourceARB as non-static then? No!
Because having that symbol around does not suffice. You also have to initialize it to something. For that you use glXGetProcAddress or wglGetProcAddress, which you have to call with the function name for each and every symbol. And because on Windows the function addresses may depend on the active OpenGL context, you have to put those symbols into thread-local storage and reinitialize them every time wglMakeCurrent is called.
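The define-then-initialize step can be sketched without any OpenGL headers as follows. Everything here is a stand-in: ShaderSourceProc replaces the real PFNGLSHADERSOURCEARBPROC type, fake_get_proc_address plays the role of wglGetProcAddress / glXGetProcAddress, and my_shader_source is a hypothetical name chosen so that it does not clash with a declaration from the GL headers.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical function-pointer type, standing in for PFNGLSHADERSOURCEARBPROC.
using ShaderSourceProc = int (*)(int shader_id);

// Exactly one non-static definition of the pointer for the whole program.
ShaderSourceProc my_shader_source = nullptr;

// Stand-in for the implementation that the driver would provide.
static int fake_shader_source(int shader_id) { return shader_id * 2; }

// Stand-in for wglGetProcAddress / glXGetProcAddress: maps a name to an address.
using GenericProc = void (*)();
static GenericProc fake_get_proc_address(const char* name) {
    if (std::strcmp(name, "glShaderSourceARB") == 0) {
        return reinterpret_cast<GenericProc>(&fake_shader_source);
    }
    return nullptr;
}

// Call once a context is current; returns false if the entry point is missing.
bool load_extension_functions() {
    my_shader_source = reinterpret_cast<ShaderSourceProc>(
        fake_get_proc_address("glShaderSourceARB"));
    return my_shader_source != nullptr;
}
```

In a real program the pointer would be declared extern in one header, defined in one source file, and filled in by a function like load_extension_functions() after context creation.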
What you really should do is get yourself a library that does all this tedious work, drop it into your program and no longer think about it. One such library is GLEW, available at http://glew.sourceforge.net. Read the documentation carefully, follow each step, and things will work. A word of recommendation: embedding GLEW statically simplifies program distribution.
You evidently have a header where the function signatures are declared, but you don't have any library linked, which would actually contain the functions (that is the undefined reference error).
In Windows, the library does not contain shader functions (only functions up to OpenGL 1.1) and some third party library such as GLEW or GLEE is a must. You can also see how MiniShader is implemented (it does what GLEW / GLEE do, except it is optimized for minimal code size, such as in 4k intros).
In Linux, just linking with -lGL could help by linking with the OpenGL library and thus avoiding undefined references (if not, then use GLEW just as well).
* Question revised (see below) *
I have a cpp file that defines a static global variable e.g.
static Foo bar;
This cpp file is compiled into an executable and a shared library. The executable may load the shared library at run time.
If I am on Linux there seem to be two copies of this variable. I assume one comes from the executable and one from the shared library. On other platforms (HP, Windows) there seems to be only one copy.
What controls this behavior on Linux and can I change it? For example is there a compiler or linker flag that will force the version of this variable from the shared library to be the same as the one from the executable?
* Revision of question *
Thanks for the answers so far. On re-examining the issue it is not actually the problem stated above. The static global variable above does indeed have multiple copies on Windows, so no difference to what I see on Linux.
However, I have another global variable (not static this time) which is declared in a cpp file and as extern in a header file.
On Windows this variable has multiple copies, one in the executable and one in each dll loaded up, and on Linux it only has one. So the question is now about this difference. How can I make Linux have multiple copies?
(The logic of my program meant the value of the static global variable was dependent on the value of the non-static global variable, and I started accusing the wrong variable of being the problem.)
I strongly suggest you read the following. Afterwards, you will understand everything about shared libraries in Linux. As said by others, the quick answer is that the static keyword will limit the scope of the global variable to the translation unit (and thus, to the executable or shared library). Using the keyword extern in the header, and compiling a cpp containing the same global variable in only one of the modules (exe or dll/so) will make the global variable unique and shared amongst all the modules.
EDIT:
The behaviour on Windows is not the same as on Linux when you use the extern pattern because Windows' method to load dynamic link libraries (dlls) is not the same and is basically incapable of linking global variables dynamically (such that only one exists). If you can use a static loading of the DLL (not using LoadLibrary), then you can use the following:
//In one module which has the actual global variable:
__declspec(dllexport) myClass myGlobalObject;
//In all other modules:
__declspec(dllimport) myClass myGlobalObject;
This will make the myGlobalObject unique and shared amongst all modules that are using the DLL in which the first version of the above is used.
If you want each module to have its own instance of the global variable, then use the static keyword, the behaviour will be the same for Linux or Windows.
If you want one unique instance of the global variable AND require dynamic loading (LoadLibrary or dlopen), you will have to make an initialization function to provide every loaded DLL with a pointer to the global variable (before it is used). You will also have to keep a reference count (you can use a shared_ptr for that) such that you can create a new one when none exist, increment the count otherwise, and be able to delete it when the count goes to zero as DLLs are being unloaded.
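The initialization-function idea can be sketched roughly as below. SharedState, Module and acquire_state are hypothetical names, and a plain struct models what would really be a separately loaded DLL or shared object with an exported init function. Passing a std::shared_ptr across a real module boundary additionally assumes both sides use the same C++ runtime.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical shared state that must exist exactly once per process.
struct SharedState {
    int counter = 0;
};

// Models one dynamically loaded module that keeps a pointer to the state.
struct Module {
    std::shared_ptr<SharedState> state;

    // Called by the host right after LoadLibrary/dlopen, before any other use.
    void init(std::shared_ptr<SharedState> shared) { state = std::move(shared); }

    // Called before the module is unloaded; drops this module's reference.
    void shutdown() { state.reset(); }

    void bump() { ++state->counter; }
};

// Host side: creates the state if none exists yet, otherwise reuses it.
// The weak_ptr registry does not keep the state alive by itself.
std::shared_ptr<SharedState> acquire_state(std::weak_ptr<SharedState>& registry) {
    std::shared_ptr<SharedState> existing = registry.lock();
    if (!existing) {
        existing = std::make_shared<SharedState>();
        registry = existing;
    }
    return existing;
}
```

Here the shared_ptr use count plays the role of the reference count: the state is destroyed when the last module calls shutdown(), and a later acquire_state() call creates a fresh one.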
The static qualifier applied to a namespace variable means that the scope of the variable is the translation unit. That means that if that variable is defined in a header and you include it from multiple .cpp files you will get a copy for each. If you want a single copy, then mark it as extern (and not static) in the header, define it in a single translation unit and link that in either the executable or the library (but not both).
What compiler did you use on each of these platforms? The behavior you're describing for Linux would be what I'd expect, the static global is only local to that particular file at compile time.
You may be able to work around your issue using the GCC visibility attribute or the visibility pragma
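As a hedged sketch of the visibility approach, the attribute below keeps a non-static global out of the shared object's exported symbols, so each module that compiles this file would be expected to end up with its own copy. MODULE_LOCAL, module_counter and next_module_id are hypothetical names, and the attribute is GCC/Clang specific.

```cpp
#include <cassert>

// GCC/Clang only; expands to nothing on other compilers (assumption: they
// either need no equivalent or use a different mechanism).
#if defined(__GNUC__)
#  define MODULE_LOCAL __attribute__((visibility("hidden")))
#else
#  define MODULE_LOCAL
#endif

// Still has external linkage, so every translation unit inside this module
// shares it, but it is left out of the module's dynamic symbol table and
// therefore is not merged with a same-named variable in another module.
MODULE_LOCAL int module_counter = 0;

int next_module_id() { return ++module_counter; }
```

Compiling with -fvisibility=hidden applies the same default to every symbol, after which only the symbols meant to be shared need to be marked with default visibility.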
I don't know about HPUX, but on Windows, if you have an exe and a DLL, and they each declare global variables, then there will be two distinct variables. If you are only getting a single variable then one image must be importing the variable from the other.