Do dynamic libraries break the C++ standard?

Section 3.6.3 of the C++ standard states:
"Destructors for initialized objects of static storage duration are called as a result of returning from main and as a result of calling exit."
On Windows you have FreeLibrary, and on Linux you have dlclose, to unload a dynamically linked library, and you can call these functions before returning from main.
A side effect of unloading a shared library is that the destructors of all static objects defined in the library are run.
Does this mean it violates the C++ standard, since these destructors have been run prematurely?

It's a meaningless question. The C++ standard doesn't say what dlclose does or should do.
If the standard were to include a specification for dlclose, it would certainly point out that dlclose is an exception to 3.6.3. Then 3.6.3 wouldn't be violated, because it would be a documented exception. But we can't know that, since the standard doesn't cover it.
What effect dlclose has on the guarantees in the C++ standard is simply outside the scope of that standard. Nothing dlclose can do can violate the C++ standard because the standard says nothing about it.
(If this were to happen without the program doing anything specific to invoke it, then you would have a reasonable argument that the standard is being violated.)

Parapura, it may be helpful to keep in mind that the C++ standard is a language definition that imposes constraints on how the compiler converts source code into object code.
The standard does not impose constraints on the operating system, hardware, or anything else.
If a user powers off his machine, is that a violation of the C++ standard? Of course not. Does the standard need to say "unless the user powers off the device" as an "exception" to every rule? That would be silly.
Similarly, if an operating system kills a process or forces the freeing of some system resources, or even allows a third party program to clobber your data structures -- this is not a violation of the C++ standard. It may well be a bug in the OS, but the C++ language definition remains intact.
The standard is only binding on compilers, and forces the resulting executable code to have certain properties. Nevertheless, it does not bind runtime behavior, which is why we spend so much time on exception handling.

I'm taking this to be a bit of an open-ended question.
I'd say it's like this: The standard only defines what a program is. And a program (a "hosted" one, I should add) is a collection of compiled and linked translation units that has a unique main entry point.
A shared library has no such thing, so it doesn't even constitute a "program" in the sense of the standard. It's just a bunch of linked executable code without any sort of "flow". If you use load-time linking, the library becomes part of the program, and all is as expected. But if you use runtime linking, the situation is different.
Therefore, you may like to view it like this: global variables in the runtime-linked shared object are essentially dynamic objects which are constructed by the dynamic loader, and which are destroyed when the library is unloaded. The fact that those objects are declared like global objects doesn't change that, since the objects aren't part of a "program" at that point.

They are only run prematurely if you go to great effort to do so - the default behavior is standard conforming.

If it does violate the standard, who is the violator? The C++ compiler cannot be considered the violator (since things are being loaded dynamically via a library call); thus it must be the vendor of the dynamic loading functionality, a.k.a. the OS vendor. Are OS vendors bound by the C++ standard when designing their systems? That definitely seems to be outside of the scope of the standard.

Or for another perspective, consider the library itself to be a separate program providing some sort of service. When this program is terminated (by whatever means the library is unloaded) then all associated service objects should disappear as well, static or not.

This is just one of the tons and tons of platform-specific "extensions" (for a target compiler, architecture, OS, etc) that are available. All of which "violate" the standard in all sorts of ways. But there is only one expected consequence for deviating from standard C++: you aren't portable anymore. (Unless you do a lot of #ifdef or something, but still, that particular code is locked in to that platform).
Since there is currently no standard, cross-platform notion of libraries, if you want the feature you have to either do without it or re-implement it per platform. Since similar mechanisms exist on most platforms, maybe the standard will one day find a clean way to abstract them so that it covers them. The advantage would be a cross-platform solution that simplifies cross-platform code.
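To make the #ifdef point concrete, here is a minimal sketch of a per-platform wrapper around the unloading APIs from the question: dlopen/dlsym/dlclose on POSIX systems and LoadLibraryA/GetProcAddress/FreeLibrary on Windows. The class name DynamicLibrary and its methods are hypothetical, not part of any standard or library, and on glibc versions older than 2.34 you may also need to link with -ldl.

```cpp
// Per-platform wrapper; DynamicLibrary is a hypothetical name, not a standard API.
#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

#include <string>

class DynamicLibrary {
public:
    explicit DynamicLibrary(const std::string& path) {
#ifdef _WIN32
        handle_ = ::LoadLibraryA(path.c_str());
#else
        handle_ = ::dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
#endif
    }

    // Releases our reference; the library's static destructors typically
    // run once the last reference to it is gone.
    ~DynamicLibrary() { close(); }

    DynamicLibrary(const DynamicLibrary&) = delete;
    DynamicLibrary& operator=(const DynamicLibrary&) = delete;

    bool is_loaded() const { return handle_ != nullptr; }

    // Returns the address of an exported symbol, or nullptr if it is
    // missing or the library failed to load.
    void* symbol(const char* name) const {
        if (handle_ == nullptr) return nullptr;
#ifdef _WIN32
        return reinterpret_cast<void*>(::GetProcAddress(handle_, name));
#else
        return ::dlsym(handle_, name);
#endif
    }

    // Unloads explicitly, possibly long before main returns.
    void close() {
        if (handle_ == nullptr) return;
#ifdef _WIN32
        ::FreeLibrary(handle_);
#else
        ::dlclose(handle_);
#endif
        handle_ = nullptr;
    }

private:
#ifdef _WIN32
    HMODULE handle_ = nullptr;
#else
    void* handle_ = nullptr;
#endif
};
```

Note that dlclose only decrements a reference count: the library's static destructors typically run when the last reference goes away, and some libraries are never actually unloaded.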

Related

Can ODR violation be avoided by using hidden visibility?

So for example, I have a slightly complicated case of library dependency in one of my projects:

          /-------------------------------\
          |                               |
    /---> GRPC <----------------\         |
    |                           |         |
    |   (c++)                   |         |
    \---- A ------------------> B         |
          |                   (rust)      |
          |                               |
          \------------------> c++ <------/

Rust by default prefers static linkage, and executable A is also built to statically link lib(std)c++. So, to my understanding, there will be two copies of the STL implementation: one in A and one in B. This is exactly the pattern that https://developer.android.com/ndk/guides/cpp-support#sr suggests avoiding.
However, looking through B's dynamic symbol table (via nm -D, for example), I could see no exported lib(std)c++/grpc symbols. This is because Rust marks them hidden by default.
So, is it safe (or conforming to the ODR) if all common symbols in B are hidden?
conforming the ODR
The one-definition rule is part of the C++ programming language. It's relevant to C++ and irrelevant to anything else. There is no ODR "outside" of C++: the C++ standard does not apply outside of C++; it's only about the C++ programming language.
There is no widely adopted, portable super-standard concerning language interoperability. These are just tools; there are no definitions and rules. The ODR, or any other rule from C++, does not apply here.
Trying to apply C++ standard rules to unrelated contexts makes little sense. Rust is not part of C++. RPC is platform-specific and outside the scope of the C++ programming language.
Ergo, reasoning about whether some C++ rule will or will not be broken in a build chain that uses platform-specific tools (a shared library, a "dynamic linkage table") and multiple programming languages just doesn't apply here.
In the sense of C++, all of this is literally "undefined behavior": there are no rules from the C++ standard that could apply here.
Can ODR violation be avoided by using hidden visibility?
Sure.
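For illustration, here is a minimal sketch of what hiding symbols looks like on the C++ side, assuming GCC or Clang on an ELF platform such as Linux or Android. The macro and function names are hypothetical; in practice you would usually compile with -fvisibility=hidden and mark only the intended API with default visibility.

```cpp
#include <string>

#if defined(__GNUC__)
#define MYLIB_EXPORT __attribute__((visibility("default")))
#define MYLIB_HIDDEN __attribute__((visibility("hidden")))
#else
#define MYLIB_EXPORT
#define MYLIB_HIDDEN
#endif

// Internal helper: hidden, so it is left out of the dynamic symbol table
// and will not show up in `nm -D` output for the resulting shared object.
MYLIB_HIDDEN std::string greeting_for(const std::string& name) {
    return "hello, " + name;
}

// The only exported entry point; extern "C" also keeps its name unmangled.
extern "C" MYLIB_EXPORT int mylib_greeting_length(const char* name) {
    return static_cast<int>(greeting_for(name).size());
}
```

Hiding symbols keeps each module's copies out of the dynamic symbol table, so they cannot interpose on each other at load time. It does not merge the copies, though: each module still has its own instance of any global state, which is the concern raised in the next answer.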
The Android doc says:
In this situation, the STL, including and global data and static constructors, will be present in both libraries. The runtime behavior of this application is undefined, and in practice crashes are very common. Other possible issues include:
Memory allocated in one library, and freed in the other, causing memory leakage or heap corruption.
You mentioned that those symbols are hidden. However, does that mean the global data is not present twice?
If the global data can be present twice, then "Memory allocated in one library, and freed in the other, causing memory leakage or heap corruption" may happen. For example, if some memory is allocated in Rust and handed over to C++, and C++ later frees it, then we are facing exactly this situation. Of course, your code may not do this today, but if some future code does, or if one day you use a third-party library that does, then we are in trouble (and it will likely be hard to debug).
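One common way to avoid the allocate-here, free-there problem is to have the module that allocates also export the function that frees, and to route all destruction through it. Below is a minimal C++ sketch with hypothetical names (LibBuffer, lib_buffer_create, lib_buffer_destroy); both "sides" are in one file only to keep it self-contained, and the same idea applies when the allocating side is Rust exposing an extern "C" free function.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// ---- "library" side (hypothetical API; would live in the other module) ----
extern "C" {
struct LibBuffer {
    std::size_t size;
    unsigned char* data;
};

// Allocates with the library's own allocator; returns nullptr on failure.
LibBuffer* lib_buffer_create(std::size_t size) {
    unsigned char* data = new (std::nothrow) unsigned char[size]();  // zero-filled
    if (data == nullptr) return nullptr;
    LibBuffer* buf = new (std::nothrow) LibBuffer{size, data};
    if (buf == nullptr) delete[] data;
    return buf;
}

// Frees with the same allocator that lib_buffer_create used.
void lib_buffer_destroy(LibBuffer* buf) {
    if (buf == nullptr) return;
    delete[] buf->data;
    delete buf;
}
}

// ---- client side ----
// The deleter sends destruction back to the library, so the client's own
// operator delete is never applied to memory the library allocated.
struct LibBufferDeleter {
    void operator()(LibBuffer* buf) const noexcept { lib_buffer_destroy(buf); }
};

using LibBufferPtr = std::unique_ptr<LibBuffer, LibBufferDeleter>;

LibBufferPtr make_buffer(std::size_t size) {
    return LibBufferPtr(lib_buffer_create(size));
}
```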
Alternatively, what about letting the C++ side statically link the Rust code? Then you get one giant .so file containing your C++, your Rust, your dependencies, etc. However, even if you can do that, we may still need to be careful: is the giant .so file the only one in your whole Android app? In other words, are you sure you do not have, and will never have, any other native libraries? If not, IMHO we may still face the same problem.
Anyway, I am not an expert in Android/C++. I had a similar problem a few months ago (replace "C++ code" with "the Flutter engine, which is written in C++", and so on) and made a workaround. So this post is not really an answer, but rather some (possibly wrong) thoughts and suggestions. I hope someone can correct me!

Using shared libraries and mismatched compilers

What are the chances that using shared libraries compiled with a different compiler version than your program will introduce problems?
What if the language standard your program uses is different from theirs?
For instance, could it be a problem if I link Boost libraries compiled with gcc-4.8 and C++11 while compiling my code with gcc-6 and C++14?
If the ABI (and API) is the same, it will work fine, according to gcc.gnu.org's ABI Policy and Guidelines, "given an application compiled with a given compiler ABI and library API, it will work correctly with a Standard C++ Library created with the same constraints."
A shared library compiled with a different ABI could work, but there are some cases to be aware of, because they could cause major bugs that would be incredibly difficult to detect.
gcc-4.8 and gcc-6 have different ABIs (Application Binary Interfaces), and so could output different compiled code in very specific cases, and cause an application to break.
However, "the GNU C++ compiler, g++, has a compiler command-line option to switch between various different C++ ABIs." (According to ABI Policy and Guidelines.)
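As a concrete case of this: GCC 5 changed the libstdc++ implementation of std::string and std::list, and libstdc++ now ships both the old and the new versions, selected by the _GLIBCXX_USE_CXX11_ABI macro; code compiled with gcc-4.8 only knows the old one. A minimal sketch, assuming libstdc++, that reports which one a translation unit is using (the function name is hypothetical):

```cpp
#include <string>  // any libstdc++ header defines the configuration macros

// Reports which std::string ABI this translation unit was compiled against.
const char* string_abi_in_use() {
#if defined(__GLIBCXX__) && defined(_GLIBCXX_USE_CXX11_ABI)
#if _GLIBCXX_USE_CXX11_ABI
    return "libstdc++ new (C++11) string ABI";
#else
    return "libstdc++ old (pre-GCC 5) string ABI";
#endif
#else
    return "not libstdc++, or ABI macro unavailable";
#endif
}
```

If you must link against libraries built with an old compiler, compiling your own code with -D_GLIBCXX_USE_CXX11_ABI=0 selects the old std::string ABI; a mismatch typically shows up as undefined references to symbols containing std::__cxx11. This only addresses the library ABI, not every compatibility issue.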
You can read more details about specific ABI issues from gcc.gnu.org:
Short answer: it's close to 100% certain that you'll have some issues unless the library was designed with this in mind. Boost was not designed with this in mind at all; it will not work.
Long answer: it might work under certain very specific circumstances (not for Boost, though). There are two main things that come into play:
ABI compatibility
Sub-library compatibility / inlined code
The ABI is the easy part. If, e.g., compiler A mangles names differently than compiler B, it won't even link. Or if they have different calling conventions (e.g. how arguments are passed in registers or on the stack), then it might link, but it will not work at all or will crash in fairly obvious ways. Also, most compilers on the same platform have the same calling conventions (or can be configured appropriately), so this shouldn't be a huge problem.
Sub-library compatibility and inlined code are trickier. For example, let's say that you have a library that passes out an allocated object, and it's the client's job to deallocate it. If the library's allocator works differently from the one in the client, then this will cause problems (e.g. the library uses compiler A's new, and the main program uses compiler B's delete).
Or there might be code in the headers (e.g. inline methods). The two compilers might compile them differently, which will cause issues.
Or the library might return a std::vector. Compiler A's vector implementation might differ from compiler B's, so that won't work either.
Or there might be a struct or class passed around. The two compilers might not pack/pad them the same way, so they won't be laid out the same way in memory, and things will break.
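If a struct really must cross the boundary, one defensive option is to use fixed-width member types and compile-time layout checks on both sides, so a packing or padding mismatch fails the build instead of corrupting memory. A minimal sketch; WireHeader and its fields are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical struct shared across a module boundary.
struct WireHeader {
    std::uint32_t version;  // expected at offset 0
    std::uint32_t flags;    // expected at offset 4
    std::uint64_t length;   // expected at offset 8
};

// If a compiler (or a set of flags) lays the struct out differently,
// these fail at build time rather than at run time.
static_assert(std::is_standard_layout<WireHeader>::value, "must be standard-layout");
static_assert(offsetof(WireHeader, version) == 0, "unexpected offset of version");
static_assert(offsetof(WireHeader, flags) == 4, "unexpected offset of flags");
static_assert(offsetof(WireHeader, length) == 8, "unexpected offset of length");
static_assert(sizeof(WireHeader) == 16, "unexpected size of WireHeader");
```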
So if the library is designed with this in mind, then it might work. Generally speaking that means that:
All calls have to be extern "C" to avoid name mangling.
No passing around of any externally defined structures (e.g. the STL)
Even structs might cause issues unless the packing/padding is the same in both compilers
Everything allocated by the library must also be deallocated by the library
No throwing of exceptions (which is sort of implied by extern "C")
Probably a few more that I'm forgetting right now
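Putting those rules together, a library interface might look like the following minimal sketch: an opaque handle, C linkage, fixed-width integer types, error codes instead of exceptions, and a destroy function so the library frees what it allocated. All counter_* names are hypothetical, and the implementation is in the same file only to keep the example self-contained; internally it is free to use the STL, since no STL type crosses the boundary.

```cpp
#include <cstdint>
#include <new>
#include <vector>

// ---- public interface (what a C-compatible header would expose) ----
extern "C" {
typedef struct counter counter;  // opaque: callers never see the layout

enum { COUNTER_OK = 0, COUNTER_ERR_NOMEM = 1, COUNTER_ERR_ARG = 2 };

int counter_create(counter** out);
int counter_add(counter* c, std::int64_t value);
int counter_sum(const counter* c, std::int64_t* out);
void counter_destroy(counter* c);  // the library frees what it allocated
}

// ---- implementation (would live inside the library) ----
struct counter {
    std::vector<std::int64_t> values;  // STL stays on this side of the boundary
};

extern "C" int counter_create(counter** out) {
    if (out == nullptr) return COUNTER_ERR_ARG;
    *out = new (std::nothrow) counter();
    return *out != nullptr ? COUNTER_OK : COUNTER_ERR_NOMEM;
}

extern "C" int counter_add(counter* c, std::int64_t value) {
    if (c == nullptr) return COUNTER_ERR_ARG;
    try {
        c->values.push_back(value);
    } catch (...) {  // never let an exception escape through extern "C"
        return COUNTER_ERR_NOMEM;
    }
    return COUNTER_OK;
}

extern "C" int counter_sum(const counter* c, std::int64_t* out) {
    if (c == nullptr || out == nullptr) return COUNTER_ERR_ARG;
    std::int64_t total = 0;
    for (std::int64_t v : c->values) total += v;
    *out = total;
    return COUNTER_OK;
}

extern "C" void counter_destroy(counter* c) { delete c; }
```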

What can be changed in .so library without breaking compatibility

For example, can I add a new function to the header file without having to recompile all programs that use the library?
You can add functions and objects to a shared library without breaking existing programs that rely on that library. Under some circumstances you could increase the size of objects (in particular, arrays) in the library.
You can also replace the implementation of a function, provided that the function signature does not change. This will not cause any problems with dynamic linking, but if the new implementation's behavior does not meet existing programs' expectations then you'll see program misbehavior.
You can remove functions and objects that no program links to. If you're concerned only with existing programs then you may be able to catalog what functions and objects those are, but otherwise you can only base such an evaluation on the visibility of the functions / objects within the shared library -- externally-visible functions and objects cannot safely be removed.
There may be other implementation-specific details of a shared library that can be changed without breaking compatibility as well.
Note, however, that none of that has anything directly to do with header files. Compatibility of shared libraries is primarily a run time consideration. Header files are relevant only at compile time.
Another point is that you have to be very careful of any shared structures. If a function in your library accepts or returns a structure or a pointer to a structure, and if you make any changes to that structure (adding, removing, or rearranging members), you're likely to introduce incompatibility.
(Strictly speaking, changes like this do count as changes to the function's signature, as mentioned by others.)
If you're very, very careful, you can arrange to add new members at the end of a structure, but it generally requires explicit cooperation by callers, using mechanisms defined in advance (that is, adhered to since version 0, by all calling code, before any changes get made).
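One such mechanism, used for example by many Win32 structures (the cbSize convention), is a size field that every caller fills in from version 0 onward. A minimal sketch with hypothetical names: the library reads a newer field only if the caller's declared size is large enough to contain it.

```cpp
#include <cstddef>
#include <cstdint>

extern "C" {
// Hypothetical public struct. struct_size has been the first member since
// version 0; new members are only ever appended at the end.
struct lib_options {
    std::uint32_t struct_size;  // caller sets this to sizeof(lib_options) as it was compiled
    std::uint32_t timeout_ms;   // present since version 0
    std::uint32_t retries;      // appended in version 1
};

// Library side: reads `retries` only if the caller's struct is big enough
// to contain it; callers built against version 0 get a default instead.
std::uint32_t lib_effective_retries(const lib_options* opts) {
    const std::uint32_t kDefaultRetries = 3;
    const std::size_t needed = offsetof(lib_options, retries) + sizeof(opts->retries);
    if (opts->struct_size < needed) return kDefaultRetries;
    return opts->retries;
}
}
```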

Is dynamic loading strictly compatible with the C++ Standard?

Does the use of dynamic loading require any special precautions for code to be strictly legal C++?
The C++11 Standard refers to the order of certain events such as what goes on before the first call to main(). However, dynamic loading seems to pull the rug out from under typical assumptions regarding the ordering of events in a program.
As an example, here is a quote from §3.6.2:
Static initialization shall be performed before any dynamic initialization takes place.
In the case of dynamic loading, this seems a nearly impossible obligation if taken literally. A program may incur dynamic initialization and then dynamically load code. If that code contains variables that would normally have been statically initialized, the C++ Standard has been violated. It seems possible that the order of events mandated by the Standard could still appear to be satisfied and be legal by the "as-if" rule but elsewhere on SO others have warned about interpreting that rule too broadly.
The C++ standard doesn't have any provision for dynamic modules, so a certain amount of interpretation is necessary.
Yes, static-initialized variables in dynamically loaded modules will be initialized after dynamically initialized variables in the main module. You can observe this, and construct programs where it has an effect on the program's behavior. But if you think of the DLL as a separate program, one which shares the main program's memory space but has its own timeline, you can pretty much apply the same rules at the module level, and use them to predict application-wide behavior. The compiler doesn't want to surprise you... it just has to sometimes.
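To see why the per-module view mostly works: a constant-initialized variable's value is typically stored directly in the module's data image, so when a library is loaded late, those values are already in place before any of its dynamic initializers run. A minimal sketch of which globals fall into which category (names are illustrative); C++20's constinit can additionally make the compiler reject a variable that is not constant-initialized.

```cpp
// Which globals get constant vs dynamic initialization (illustrative names).

constexpr int kTableSize = 64;        // constant initialization
const int kDoubled = kTableSize * 2;  // constant initialization: a constant expression

int compute_default() { return 7; }   // not constexpr

// Dynamic initialization: the initializer is not a constant expression
// (an implementation is still allowed to perform it statically).
int g_dynamic = compute_default();

// Also dynamic, but it only reads a constant-initialized variable, so it
// sees 128 for kDoubled no matter when its own initializer runs.
int g_uses_constant = compute_default() + kDoubled;
```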
Incidentally, initialization order is really the least of your concerns when it comes to the collision between C++ and DLLs. Dynamic modules break far more rules than that, particularly when it comes to RTTI.

Can a std::string be passed by value across DLL boundaries?

Can a std::string be passed by value across DLL boundaries between DLLs built with different versions of Visual Studio?
No, because templated code is generated separately per module.
So, when your EXE instantiates a std::string and passes it to the DLL, the DLL will begin using a completely different implementation on it. The result is a total mess, but it often sorta almost works because implementations are very similar, or the mess is hard to detect because it's some kind of subtle heap corruption.
Even if they're both built with the same version of VS, it's very precarious / fragile, and I would not recommend it. Either use a C-style interface between modules (for example, COM), or just don't use a DLL.
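A C-style boundary for strings can be as simple as passing a pointer and a length in, and copying results out into a buffer the caller owns, so that each module only ever constructs its own std::string. A minimal sketch with hypothetical function names, using the snprintf-style convention of returning the required length:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

extern "C" {
// Library side: takes raw bytes, never a std::string object.
std::size_t lib_count_char(const char* data, std::size_t len, char ch) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (data[i] == ch) ++n;
    return n;
}

// Library side: writes a NUL-terminated, possibly truncated result into the
// caller's buffer and returns the full length needed (excluding the NUL).
std::size_t lib_get_name(char* buf, std::size_t buf_size) {
    static const char kName[] = "example-library";
    const std::size_t needed = sizeof(kName) - 1;
    if (buf != nullptr && buf_size > 0) {
        const std::size_t n = needed < buf_size - 1 ? needed : buf_size - 1;
        std::memcpy(buf, kName, n);
        buf[n] = '\0';
    }
    return needed;
}
}

// Client side: asks for the length first, then builds its own std::string.
std::string client_get_name() {
    const std::size_t needed = lib_get_name(nullptr, 0);
    std::string result(needed + 1, '\0');  // room for the terminating NUL
    lib_get_name(&result[0], result.size());
    result.resize(needed);                 // drop the NUL written by the library
    return result;
}
```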
More detailed explanation here: Creating c++ DLL without static methods
and here: How can I call a function of a C++ DLL that accepts a parameter of type stringstream from C#?
In general, you can not mix binary code built with different compilers, which includes different versions of the same compiler (and can even include the same compiler invoked with different commandline options), so the answer to what you are trying to do is a clear "No".
The reason is that different compilers might provide different implementations of std::string. For example, one implementation could have a fixed internal buffer for short strings, while another doesn't, which already leads to different object sizes. There are a bunch of other things that can make the interfaces incompatible, like the underlying allocator or the internal representation. Some things will already fail to link, due to name mangling or different private APIs, both of which protect you from doing something wrong.
Some notes:
Even if you didn't pass the object by value but by reference, the called code could have a different idea of what this object looks like.
It also doesn't matter that the type is supplied by the compiler, even if you defined the class yourself and compiled two DLLs with different versions of the class definition you would have problems. Also, if you change your standard library implementation, you make your binaries incompatible, too.
It doesn't even matter that the other code is in a DLL, it also applies to code in the same executable or DLL, although headers and automatic recompilation on change make this case very unlikely.
Specifically for MS Windows, you also have a debug heap and a release heap, and memory allocated in one must not be returned to the other. For that reason, you often have two DLLs, one with a 'd' suffix (the debug version) and one without. This is a case where the compiler settings already affect compatibility, but you can get around this by taking a parallel approach and providing two versions of your DLL, too.
To some degree, similar problems occur for C code, too, where compilers have to agree on e.g. struct layout and calling conventions. Due to greater age and lower complexity, different C compilers are effectively compatible though. This is also accepted as a necessary feature in C as opposed to C++.