Mixing C++ flavours in the same project - c++

Is it safe to mix C++98 and C++11 in the same project? By "mixing" I mean not only linking object files but also common header files included in the source code compiled with C++98 and C++11.
The background for the question is the desire to transition at least a part of a large code base to C++11. A part of the code is in C++ CUDA, compiled to be executed on either GPU or CPU, and the corresponding compiler doesn't support C++11 at this time. However, much of the code is intended for CPU only and can be compiled with either C++ flavour. Some header files are included in both CPU+GPU and CPU-only source files.
If we now compile CPU-only source files with C++11 compiler, can we be confident against undesirable side effects?

In practice, maybe.
It is relatively common for the C++03 and C++11 standard libraries to disagree about the layout of objects in the std namespace. As an example, sizeof(std::vector<int>) changed noticeably across compiler versions in MSVC land (it got smaller as the implementation was optimized).
Another example: each side of the compiler fence may use a different heap, so memory allocated by one runtime cannot safely be freed by the other.
So you have to carefully "firewall" between the two source trees.
Now, some compilers seek to minimize such binary compatibility changes, even at the cost of violating the standard. std::list without a size counter might be an example of that: it violates C++11's requirement that size() be constant-time, but I recall that at least one vendor shipped a standards-non-compliant std::list to maintain binary compatibility -- I don't remember which one.
For the two compilers (and a compiler in C++03 mode and the same compiler in C++11 mode are effectively different compilers) you are going to have some ABI guarantees. There is probably a large chunk of the language on which the two ABIs will agree, and on that subset you are relatively safe.
To be reasonably safe, you'll want to treat the other compiler version files as if they are third party DLLs (delay loaded libraries) that do not link to the same C++ standard library. That means any resources passed from one to the other have to be packaged with destruction code (ie, returned to the DLL from whence it came to be destroyed). You'll either have to investigate the ABI of the two standard libraries, or avoid using it in the common header files, so you can pass things like smart pointers between the DLLs.
An even safer approach is to strip yourself down to a C style interface with the other code base, and only pass handles (opaque types) between the two code bases. To make this sane, whip up some header-file only mojo that wraps the C style interface in pretty C++ code, just don't pass those C++ objects between the code bases.
All of this is a pain.
For example, suppose you have a std::string get_some_string(HANDLE) function, and you don't trust ABI stability.
So you have 3 layers.
namespace internal {
    // NOT exported from the DLL
    std::string get_some_string(HANDLE) { /* implementation in DLL */ }
}
namespace marshal {
    // exported from the DLL
    // visible in external headers, not intended to be called directly
    void get_some_string(HANDLE h, void* pdata,
                         void (*callback)(void*, char const* data, std::size_t length)) {
        // implementation in DLL
        auto r = ::internal::get_some_string(h);
        callback(pdata, r.data(), r.size());
    }
}
namespace interface {
    // exists only in the public header file, not within the DLL
    inline std::string get_some_string(HANDLE h) {
        std::string r;
        ::marshal::get_some_string(h, &r,
            [](void* pr, const char* str, std::size_t length) {
                std::string& r = *static_cast<std::string*>(pr);
                r.append(str, length);
            });
        return r;
    }
}
So the code outside the DLL does an auto s = ::interface::get_some_string(handle);, and it looks like a C++ interface.
The code inside the DLL implements std::string ::internal::get_some_string(HANDLE);.
The marshal's get_some_string provides a C-style interface between the two, which provides better binary compatibility than relying on the layout and implementation of std::string to remain stable between the DLL and the code using the DLL.
The interface's std::string exists completely within the non-DLL code. The internal std::string exists completely within the DLL-code. The marshal code moves the data from one side to the other.

Related

Problem passing std::string though DLL boundaries with visual studio

I have a DLL that contains some code like this:
class Info {
    int a;
    int b;
    int c;
    std::string str1;
    std::string str2;
};
class __declspec(dllexport) C {
    Info getInfo();
};
I compile this code with Visual Studio 2015, and calling C::getInfo() works perfectly fine on my system.
Now somebody else is trying to use the DLL, and he's also using Visual Studio 2015. The call goes well, but the strings don't contain any accessible data. Even the debugger reports "Error reading characters of string", and the program crashes when trying to read them.
There was also a similar problem with std::vector which I could solve by inlining the code that created the vector.
To me this looks like either the other person is using a different STL version, or the compiler somehow produces a different memory layout for std::string (though he claims to use default settings).
Is this possible? I saw there are different updates for VS 2015 and we might not have the same updates installed.
Any recommendations for fixing this? I can't move to char* without significantly breaking the API. Most users don't even use Visual Studio (or Windows), so this hasn't been a problem until now.
That's life, I'm afraid. Unless you have exactly the same compiler (including the C++ Standard Library) and compiler settings, this is not guaranteed to work. I'm not convinced, from a C++ Standards perspective, that even that is guaranteed to work. It's well-nigh impossible to build a stable ABI in C++. In your specific example, there's no guarantee that the struct will be built with the same alignment characteristics. sizeof(std::string) varies considerably, due in part to short string optimisations, which can even be compiler-configurable.
Alternatives:
Use C linkage for your exported functions and char*-style interfaces that are allocated and deallocated by the caller (cf. the Windows API). (Don't ever allocate memory in the client and deallocate in the dll, and vice-versa.)
Make sure the user of your library can build it from the source files. Consider distributing as a static library rather than a dynamic one, as the former are a little easier to work with.

How symbols are resolved when linking in dynamic libraries

Something I still haven't quite absorbed. Nearly all of my development has involved generating and compiling my code together statically, without the use of linking dynamic libraries (dll or .so).
I understand in this case how the compiler and linker can resolve the symbols as it is stepping through the code.
However, when linking in a dynamic library for example, and the dynamic library has been compiled with a different compiler than the main code, will the symbol names not be different?
For example, I define a structure called RequiredData in my main code as shown below. Imagine that the FuncFromDynamicLib() function is part of a separate dynamic library and takes this same structure RequiredData as an argument. That means that somewhere in the dynamic library, this structure must be defined again. How is it established at the time of dynamic linking that these structures are the same? What if the data members are different? Thanks in advance.
#include <iostream>
using namespace std;

struct RequiredData
{
    string name;
    int value1;
    int value2;
};

int FuncFromDynamicLib(RequiredData);

int main()
{
    RequiredData data;
    data.name = "test";
    data.value1 = 120;
    data.value2 = 200;
    FuncFromDynamicLib(data);
    return 0;
}
//------------Imagine that this func is Part of dynamic library in another file--------------------
int FuncFromDynamicLib(RequiredData rd)
{
    cout << rd.name << endl;
    cout << rd.value1 << endl;
    cout << rd.value2 << endl;
    return 0;
}
//------------------------------------------------------
You are completely right that you cannot, in general, mix your tools. The process of "linking" is not standardized or specified, so in general you have to use the same toolchain for the entire process, or one part won't know what names the other part gave the symbols in the object code.
The problem is in fact worse: it's not just symbol names that must match, but also calling conventions: how to propagate function arguments, return values, exceptions, and how to handle RTTI and dynamic casts.
All these factors are summarized under the term "application binary interface", ABI for short. And the problem you describe can be summarized as "there is no standard ABI".
Several factors make the situation less dire:
For C code, every major platform has a de facto standard ABI which is followed by most tools. Therefore, C code is in practice highly portable and interoperable.
For C++ code, the ABI is much more complex than for C. However, for x86 and x86-64, two major platforms, the Itanium ABI is used on a wide variety of platforms, which creates at least a somewhat interoperable environment. In particular, that ABI's rules for name mangling (i.e. how to represent namespace-qualified names and function overload sets as flat strings) are well-known and have widespread tooling support.
For C++, there are further peculiar concerns. Shipping compiled code to customers is one thing, but another issue is that much of C++ library code lives in headers and gets compiled each time. So the problem is not just that you may be using different compilers, but also different library implementations. Internal details of class layout become relevant, because you can't just access vendor A's vector as if it were vendor B's vector. GCC has dubbed this "library ABI".

Function with bool return value, only set 1 byte of the entire register

I have the following piece of code, which is part of an API (cdecl). In MSVC++ the sizeof bool is 1 byte, but since the representation of bool is implementation-defined, programs compiled with another compiler, or whose authors declared the function signature incorrectly, may treat bool as larger than 1 byte, and calling the check below may then appear to return true on their side.
virtual bool isValid()
{
return false;
// ^ code above in asm: xor al, al
}
To avoid this, I put inline asm, xor eax, eax, before the return - but it feels a bit hacky, and of course it will not work on x64 due to the lack of inline assembler support.
Using #define bool int would work, but it is not what I want, since I have structs with bool members inside them, and redefining bool would corrupt their layout.
Is there anything, like an intrinsic, that can zero the eax/rax register, or anything else that can solve this problem?
There's nothing that will do what you're asking for. Your problem needs a rather different solution.
First, any code that incorrectly defines a function signature is broken and needs to be fixed. Working around it in other code is never the solution.
Next, your problem is likely about more than just bool being implementation-defined; the C++ standard leaves a whole host of things implementation-defined. So much so that two different C++ compilers rarely have compatible ABIs. If your code provides C++ interfaces for use by code compiled by other people, you'll probably need to produce separately compiled binaries, whether in the form of object files, static libraries, DLLs or executables, for each different compiler you want to support. In fact, you may need to provide separate binaries for each version of each compiler.
There are two C++ compilers that try to be compatible with the Microsoft C++ ABI. The first is Intel's C++ compiler and the second is the Windows port of Clang. The Clang implementation is notably still a work in progress. You may still need to create separate versions for each version of the Microsoft C/C++ runtime libraries your code is compiled with.
You can potentially reduce the number of different binaries you need to distribute by providing a pure C interface to your code. A pure C interface means using only C data types and only functions declared as extern "C". While things like classes, member functions, templates, RTTI and exceptions can be used in your implementation, they can't be part of your public interface. One exception is COM-like interfaces: classes with nothing but public pure virtual functions. Since C compilers for Windows all use essentially the same C ABI and support COM interfaces, compatibility problems are much less likely there. However, the bool type (actually the _Bool type in C) is probably not safe to use, since it's a relatively recent addition to the C language. Use int in your C interfaces instead.
Note that because of C/C++ runtime differences, even if all you want to do is distribute compiled binaries for use with Microsoft's Visual C++ compiler, you may still need to distribute versions for each version of the compiler. That's because each version comes with a different runtime implementation, and those runtimes have data structures with incompatible internal layouts. You can't pass an STL container created in a function compiled by one version of Visual C++ to a function compiled with a different version. You can't allocate memory with malloc in an executable and free it in a DLL if the executable and DLL use different versions of the C runtime.
Unfortunately, unless you're willing to restrict your users to one particular compiler, the easy solution you're looking for may not exist. Note that restricting users to one compiler is in fact a common approach among programs that provide plugin support: plugins need to be compiled with the same version of the same compiler that built the executable.

Array of char or std::string for a public library?

my question is simple:
Should I use an array of char, e.g.:
char *buf, buf2[MAX_STRING_LENGTH]
etc., or should I use std::string in a library that will be used by other programmers, who may use it on any OS and compiler of their choice?
Considering performance and portability...
From my point of view, std strings are easier, and the performance difference is either zero or way too small to justify avoiding std::string. About portability I don't know; I guess, as it is standard, there shouldn't be any compiler that compiles C++ without it, at least no important compiler.
EDIT:
The library will be compiled on the 3 major OSes and, theoretically, distributed as a lib.
Your thoughts?
ty,
Joe
Depends on how this library will be used in conjunction with client code. If it will be linked in dynamically and you have a set of APIs exposed for the client -- you are better off using null terminated byte strings (i.e. char *) and their wide-character counterparts. If you are talking about using them within your code, you certainly are free to use std::string. If it is going to be included in source form -- std::string works fine.
But if your library is shipped as DLL your users will have to use the same implementation of std::string. It won't be possible for them to use STLPort (or any other implementation) if your library was built using Microsoft STL.
As long as you are targetting pure C++ for your library, using std::string is fine and even desirable. However, doing that ties you to a particular implementation of C++ (the one used to build your library), and it can't be linked with other C++ implementations or other languages.
Often, it is highly desirable to give a library a C interface rather than a C++ one. That way it's usable by any other language that provides a C foreign function interface (which is most of them). For a C interface, you need to use char *.
I would recommend just using std::string. Besides, if you need compatibility with code requiring C-style strings (for example, a C-compatible API), you can always use the c_str() method of std::string.
In general, you will be better off using std::string, certainly for calls internal to your library.
For your API, it depends on its purpose. For internal use within your organization, an API that uses std::string will probably be fine. For external use you may wish to provide a C API, one which uses char*.

How to limit the impact of implementation-dependent language features in C++?

The following is an excerpt from Bjarne Stroustrup's book, The C++ Programming Language:
Section 4.6:
Some of the aspects of C++’s fundamental types, such as the size of an int, are implementation-defined (§C.2). I point out these dependencies and often recommend avoiding them or taking steps to minimize their impact. Why should you bother? People who program on a variety of systems or use a variety of compilers care a lot because if they don’t, they are forced to waste time finding and fixing obscure bugs. People who claim they don’t care about portability usually do so because they use only a single system and feel they can afford the attitude that ‘‘the language is what my compiler implements.’’ This is a narrow and shortsighted view. If your program is a success, it is likely to be ported, so someone will have to find and fix problems related to implementation-dependent features. In addition, programs often need to be compiled with other compilers for the same system, and even a future release of your favorite compiler may do some things differently from the current one. It is far easier to know and limit the impact of implementation dependencies when a program is written than to try to untangle the mess afterwards.
It is relatively easy to limit the impact of implementation-dependent language features.
My question is: How to limit the impact of implementation-dependent language features? Please mention implementation-dependent language features then show how to limit their impact.
Few ideas:
Unfortunately you will have to use macros to avoid some platform specific or compiler specific issues. You can look at the headers of Boost libraries to see that it can quite easily get cumbersome, for example look at the files:
boost/config/compiler/gcc.hpp
boost/config/compiler/intel.hpp
boost/config/platform/linux.hpp
and so on
The integer types tend to be messy among different platforms, you will have to define your own typedefs or use something like Boost cstdint.hpp
If you decide to use any library, then do a check that the library is supported on the given platform
Use the libraries with good support and clearly documented platform support (for example Boost)
You can abstract yourself from some C++ implementation specific issues by relying heavily on libraries like Qt, which provide an "alternative" in sense of types and algorithms. They also attempt to make the coding in C++ more portable. Does it work? I'm not sure.
Not everything can be done with macros. Your build system will have to be able to detect the platform and the presence of certain libraries. Many would suggest autotools for project configuration, I on the other hand recommend CMake (rather nice language, no more M4)
endianness and alignment might be an issue if you do some low-level meddling (i.e. reinterpret_cast and its relatives -- "friends" would be a poorly chosen word in a C++ context).
throw in a lot of warning flags for the compiler, for gcc I would recommend at least -Wall -Wextra. But there is much more, see the documentation of the compiler or this question.
you have to watch out for everything that is implementation-defined and implementation-dependend. If you want the truth, only the truth, nothing but the truth, then go to ISO standard.
Well, the variable-size issue mentioned above is a fairly well-known one, with the common workaround of providing typedef'd versions of the basic types that have well-defined sizes (normally advertised in the typedef name). This is done using preprocessor macros to give different code visibility on different platforms. E.g.:
#ifdef __WIN32__
typedef int int32;
typedef char char8;
//etc
#endif
#ifdef __MACOSX__
//different typedefs to produce same results
#endif
Other issues are normally solved in the same way too (i.e. using preprocessor tokens to perform conditional compilation)
The most obvious implementation dependency is size of integer types. There are many ways to handle this. The most obvious way is to use typedefs to create ints of the various sizes:
typedef signed short int16_t;
typedef unsigned short uint16_t;
The trick here is to pick a convention and stick to it. Which convention is the hard part: INT16, int16, int16_t, t_int16, Int16, etc. C99 has the stdint.h file which uses the int16_t style. If your compiler has this file, use it.
Similarly, you should be pedantic about using other standard defines such as size_t, time_t, etc.
The other trick is knowing when not to use these typedefs. A loop control variable used to index an array should just use a raw int type, so the compiler will generate the best code for your processor. for (int32_t i = 0; i < x; ++i) could generate a lot of needless code on a 64-bit processor, just like using int16_t would on a 32-bit processor.
A good solution is to use common headers that define typedef'd types as necessary.
For example, including sys/types.h is an excellent way to deal with this, as is using portable libraries.
There are two approaches to this:
define your own types with a known size and use them instead of built-in types (like typedef int int32 #if-ed for various platforms)
use techniques which are not dependent on the type size
The first is very popular, however the second, when possible, usually results in a cleaner code. This includes:
do not assume a pointer can be cast to int
do not assume you know the byte size of individual types, always use sizeof to check it
when saving data to files or transferring them across network, use techniques which are portable across changing data sizes (like saving/loading text files)
One recent example of this is writing code which can be compiled for both x86 and x64 platforms. The dangerous part here is the size of pointers and size_t - be prepared for it to be 4 or 8 bytes depending on the platform. When casting or differencing pointers, never cast to int; use intptr_t and similar typedef'd types instead.
One of the key ways of avoiding dependency on particular data sizes is to read and write persistent data as text, not binary. If binary data must be used, then all read/write operations must be centralised in a few methods, using approaches like the typedefs already described here.
A second thing you can do is enable all your compiler's warnings. For example, using the -pedantic flag with g++ will warn you of lots of potential portability problems.
If you're concerned about portability, things like the size of an int can be determined and dealt with without much difficulty. A lot of C++ compilers also support C99 features like the fixed-width integer types: int8_t, uint8_t, int16_t, uint32_t, etc. If yours doesn't support them natively, you can always include <cstdint> or <sys/types.h>, which, more often than not, has those typedefs. <limits.h> has the minimum and maximum values for all the basic types.
The standard only guarantees the minimum size of a type, which you can always rely on: sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long). char must be at least 8 bits. short and int must be at least 16 bits. long must be at least 32 bits.
Other things that might be implementation-defined include the ABI and name-mangling schemes (the behavior of extern "C++" specifically), but unless you're working with more than one compiler, that's usually a non-issue.
The following is also an excerpt from Bjarne Stroustrup's book, The C++ Programming Language:
Section 10.4.9:
No implementation-independent guarantees are made about the order of construction of nonlocal objects in different compilation units. For example:
// file1.c:
Table tbl1;
// file2.c:
Table tbl2;
Whether tbl1 is constructed before tbl2 or vice versa is implementation-dependent. The order isn’t even guaranteed to be fixed in every particular implementation. Dynamic linking, or even a small change in the compilation process, can alter the sequence. The order of destruction is similarly implementation-dependent.
A programmer may ensure proper initialization by implementing the strategy that the implementations usually employ for local static objects: a first-time switch. For example:
class Zlib {
    static bool initialized;
    static void initialize() { /* initialize */ initialized = true; }
public:
    // no constructor
    void f()
    {
        if (initialized == false) initialize();
        // ...
    }
    // ...
};
If there are many functions that need to test the first-time switch, this can be tedious, but it is often manageable. This technique relies on the fact that statically allocated objects without constructors are initialized to 0. The really difficult case is the one in which the first operation may be time-critical so that the overhead of testing and possible initialization can be serious. In that case, further trickery is required (§21.5.2).
An alternative approach for a simple object is to present it as a function (§9.4.1):
int& obj() { static int x = 0; return x; } // initialized upon first use
First-time switches do not handle every conceivable situation. For example, it is possible to create objects that refer to each other during construction. Such examples are best avoided. If such objects are necessary, they must be constructed carefully in stages.