Problem passing std::string through DLL boundaries with Visual Studio - C++

I have a DLL that contains some code like this:
class Info {
public:
    int a;
    int b;
    int c;
    std::string str1;
    std::string str2;
};

class __declspec(dllexport) C {
public:
    Info getInfo();
};
I compile this code with Visual Studio 2015, and calling C::getInfo() works perfectly fine on my system.
Now somebody else is trying to use the DLL, and he's also using Visual Studio 2015. The call goes well, but the strings don't contain any accessible data. Even the debugger reports "Error reading characters of string", and the program crashes when trying to read it.
There was also a similar problem with std::vector, which I could solve by inlining the code that created the vector.
To me this looks like either the other person is using a different STL version, or the compiler somehow produces a different memory layout for std::string (though he claims to use default settings).
Is this possible? I saw there are different updates for VS 2015 and we might not have the same updates installed.
Any recommendations for fixing this? I can't move to char* without significantly breaking the API. Most users don't even use Visual Studio (or Windows), so this hasn't been a problem until now.

That's life, I'm afraid. Unless you have exactly the same compiler (including the C++ standard library) and compiler settings, this is not guaranteed to work, and I'm not convinced the C++ standard guarantees it even then. It's well-nigh impossible to build a stable ABI in C++. In your specific example, there's no guarantee that the struct will be built with the same alignment characteristics, and sizeof(std::string) varies considerably, due in part to short string optimisations, which can even be compiler-configurable.
Alternatives:
Use C linkage for your exported functions and char*-style interfaces whose buffers are allocated and deallocated by the caller (cf. the Windows API); a sketch follows this list. (Don't ever allocate memory in the client and deallocate it in the DLL, or vice versa.)
Make sure the user of your library can build it from the source files. Consider distributing it as a static library rather than a dynamic one, as static libraries are a little easier to work with.
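As a sketch of the first alternative, assuming the Info and C types from the question are accessible (the exported name GetInfoStr1 and the handle convention are hypothetical, purely for illustration):
#include <cstring>   // std::memcpy
#include <string>

// Hypothetical C-linkage export: only plain C types cross the DLL
// boundary, and the caller owns the buffer. std::string stays inside.
extern "C" __declspec(dllexport)
int GetInfoStr1(void* handle, char* buffer, std::size_t buffer_size)
{
    Info info = static_cast<C*>(handle)->getInfo();
    const std::string& s = info.str1;
    if (buffer_size <= s.size())
        return -1;                            // caller's buffer is too small
    std::memcpy(buffer, s.c_str(), s.size() + 1);
    return static_cast<int>(s.size());        // length written, excluding the NUL
}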

Related

Function with bool return value only sets 1 byte of the entire register

I have the following piece of code, which is part of an API (cdecl). In MSVC++ the size of bool is 1 byte, but since bool is implementation defined, programs built by another compiler (or whose author declared the function signature incorrectly) may treat bool as more than 1 byte, so calling the check below may return true on their side of the program.
virtual bool isValid()
{
    return false;
    // ^ code above in asm: xor al, al
}
To avoid this, I put an inline asm xor eax, eax before the return, but that feels a bit hacky, and of course it won't work on x64 due to the lack of inline assembler support.
Using #define bool int would work, but it is not what I want, as I have structs with bool members inside them, and redefining bool would corrupt their layout.
Is there anything, like an intrinsic, that can zero the eax/rax register, or anything else that can solve this problem?
There's nothing that will do what you're asking for. Your problem needs a much different solution.
First, any code that "incorrectly defines the function signature" is broken and needs to be fixed. Working around it in other code is never the solution.
Next, your problem is likely more than just bool being implementation defined; the C++ standard leaves a whole host of things implementation defined. So much so that two different C++ compilers rarely have compatible ABIs. If your code provides C++ interfaces for use by code compiled by other people, you'll probably need to produce separately compiled binaries (whether object files, static libraries, DLLs or executables) for each compiler you want to support. In fact, you may need to provide separate binaries for each version of each compiler.
There are two C++ compilers that try to be compatible with the Microsoft C++ ABI: the first is Intel's C++ compiler, and the second is the Windows port of Clang. The Clang implementation is notably still a work in progress. Even then, you may need to create separate versions for each version of the Microsoft C/C++ runtime libraries your code is compiled with.
You can potentially reduce the number of different binaries that you need to distribute by providing a pure C interface to your code. A pure C interface means using only C data types and only functions declared extern "C". While things like classes, member functions, templates, RTTI and exceptions can be used in your implementation, they can't be used as part of your public interface. One exception is COM-like interfaces: classes with nothing but public pure virtual functions. Since C compilers for Windows all use essentially the same C ABI and support COM interfaces, compatibility issues are less likely to arise. However, the bool type (actually the _Bool type in C) is probably not safe to use, since it's a relatively recent addition to the C language. Use int in your C interfaces instead.
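For example, a minimal sketch of such a COM-like interface (IWidget, widget_create and the release() convention are hypothetical names, not from the question):
// Nothing but public pure virtual functions; creation and destruction
// both happen inside the DLL, through an extern "C" factory function.
class IWidget {
public:
    virtual int  isValid() = 0;   // int rather than bool, for ABI safety
    virtual void release() = 0;   // clients call this instead of delete
protected:
    ~IWidget() {}                 // non-public: the DLL owns deletion
};

extern "C" __declspec(dllexport) IWidget* widget_create(void);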
Note that because of C/C++ runtime differences, even if all you want to do is distribute compiled binaries for use with Microsoft's Visual C++ compiler, you may still need to distribute versions for each version of the compiler. That's because each version comes with a different runtime implementation, and those implementations have data structures with incompatible internal layouts. You can't pass an STL container created in a function compiled by one version of Visual C++ to a function compiled with a different version. You can't allocate memory with malloc in an executable and free it in a DLL if the executable and the DLL use different versions of the C runtime.
Unfortunately, unless you're willing to restrict your users to one particular compiler, the easy solution you're looking for may not exist. Note that restricting users to a single compiler is a common solution for programs that provide plugin support: plugins need to be compiled with the same version of the same compiler that compiled the executable.

Mixing C++ flavours in the same project

Is it safe to mix C++98 and C++11 in the same project? By "mixing" I mean not only linking object files but also common header files included in the source code compiled with C++98 and C++11.
The background for the question is the desire to transition at least a part of a large code base to C++11. A part of the code is in C++ CUDA, compiled to be executed on either GPU or CPU, and the corresponding compiler doesn't support C++11 at this time. However, much of the code is intended for CPU only and can be compiled with either C++ flavour. Some header files are included in both CPU+GPU and CPU-only source files.
If we now compile CPU-only source files with C++11 compiler, can we be confident against undesirable side effects?
In practice, maybe.
It is relatively common for the C++11 and C++03 standard libraries to disagree about the layout of objects in namespace std. As an example, sizeof(std::vector<int>) changed noticeably over various compiler versions in MSVC land (it got smaller as they optimized it).
Other examples could be a different heap on each side of the compiler fence.
So you have to carefully "firewall" between the two source trees.
Now, some compilers seek to minimize such binary compatibility changes, even at the cost of violating the standard. I believe std::list without a size counter might be an example of that (it violates C++11, but I recall that at least one vendor shipped a standards-non-compliant std::list to maintain binary compatibility; I don't remember which one).
For the two compilers (and a compiler in C++03 mode and one in C++11 mode are different compilers) you are going to have some ABI guarantees. There is probably a large chunk of the language on which the ABIs agree, and within that set you are relatively safe.
To be reasonably safe, you'll want to treat the other compiler version's files as if they were third-party DLLs (dynamic-link libraries) that do not link against the same C++ standard library. That means any resource passed from one to the other has to be packaged with destruction code (i.e., returned to the DLL from whence it came to be destroyed). You'll either have to investigate the ABI of the two standard libraries, or avoid using it in the common header files, if you want to pass things like smart pointers between the DLLs.
An even safer approach is to strip yourself down to a C-style interface with the other code base, and only pass handles (opaque types) between the two code bases. To make this sane, whip up some header-file-only mojo that wraps the C-style interface in pretty C++ code; just don't pass those C++ objects between the code bases.
All of this is a pain.
For example, suppose you have a std::string get_some_string(HANDLE) function, and you don't trust ABI stability.
So you have 3 layers.
namespace internal {
    // NOT exported from the DLL
    std::string get_some_string(HANDLE h) { /* implementation in DLL */ }
}

namespace marshal {
    // exported from the DLL
    // visible in external headers, not intended to be called directly
    void get_some_string(HANDLE h, void* pdata,
                         void (*callback)(void*, char const* data, std::size_t length)) {
        // implementation in DLL
        auto r = ::internal::get_some_string(h);
        callback(pdata, r.data(), r.size());
    }
}

namespace interface {
    // exists only in the public header file, not within the DLL
    inline std::string get_some_string(HANDLE h) {
        std::string r;
        ::marshal::get_some_string(h, &r,
            [](void* pr, const char* str, std::size_t length) {
                std::string& r = *static_cast<std::string*>(pr);
                r.append(str, length);
            }
        );
        return r;
    }
}
So the code outside the DLL does an auto s = ::interface::get_some_string(handle);, and it looks like a C++ interface.
The code inside the DLL implements std::string ::internal::get_some_string(HANDLE);.
The marshal's get_some_string provides a C-style interface between the two, which provides better binary compatibility than relying on the layout and implementation of std::string to remain stable between the DLL and the code using the DLL.
The interface's std::string exists completely within the non-DLL code. The internal std::string exists completely within the DLL-code. The marshal code moves the data from one side to the other.

Is there a portable wrapper for C++ type_info that standardizes type name string format?

The format of the output of type_info::name() is implementation specific.
namespace N { struct A; }
const N::A *a;
typeid(a).name(); // returns e.g. "const struct N::A" but compiler-specific
Has anyone written a wrapper that returns dependable, predictable type information that is the same across compilers? Multiple templated functions would allow the user to get specific information about a type. So I might be able to use:
MyTypeInfo::name(a); // returns "const struct N::A *"
MyTypeInfo::base(a); // returns "A"
MyTypeInfo::pointer(a); // returns "*"
MyTypeInfo::nameSpace(a); // returns "N"
MyTypeInfo::cv(a); // returns "const"
These functions are just examples; someone with better knowledge of the C++ type system could probably design a better API. The one I'm interested in is base(). All functions would raise an exception if RTTI was disabled or an unsupported compiler was detected.
This seems like the sort of thing that Boost might implement, but I can't find it in there anywhere. Is there a portable library that does this?
There are limitations to doing such things in C++, so you probably won't find exactly what you want in the near future. The meta-information about the types that the compiler inserts in the compiled code is implementation-specific to the runtime library used by the compiler, so it would be difficult for a third-party library to do a good job without relying on undocumented features of each specific compiler, features that might break in later versions.
The Qt framework has, to my knowledge, the nearest thing to what you intend. But they do it completely independently of RTTI. Instead, they have their own "compiler" that parses the source code and generates additional source modules with the meta-information. You then compile and link these modules along with your program and use their API to get the information. Take a look at http://doc.qt.nokia.com/latest/metaobjects.html
Jeremy Pack (from Boost Extension plugin framework) appears to have written such a thing:
http://blog.redshoelace.com/2009/06/resource-management-across-dll.html
3. RTTI does not always function as expected across DLL boundaries. Check out the type_info classes to see how I deal with that.
So you could have a look there.
PS. I remembered because I once fixed a bug in that area; this might still add information so here's the link: https://stackoverflow.com/a/5838527/85371
GCC has __cxa_demangle: https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html
If there are such extensions for all the compilers you target, you could use them to write a portable function, with macros to detect the compiler.
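A minimal sketch of that idea, assuming only GCC and Clang provide a demangler (type_name is a hypothetical helper, not an existing library function):
#include <string>
#include <typeinfo>
#if defined(__GNUC__)           // GCC and Clang
#include <cxxabi.h>
#include <cstdlib>
#endif

template <typename T>
std::string type_name(const T& obj) {
    const char* raw = typeid(obj).name();
#if defined(__GNUC__)
    int status = 0;
    char* demangled = abi::__cxa_demangle(raw, nullptr, nullptr, &status);
    if (status == 0 && demangled != nullptr) {
        std::string result(demangled);
        std::free(demangled);   // __cxa_demangle returns a malloc'd buffer
        return result;
    }
#endif
    return raw;                 // e.g. MSVC's name() is already readable
}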

What is the cost of compiling a C program with a C++ compiler?

I want to use C with templates in an embedded environment, and I want to know what the cost of compiling a C program with a C++ compiler is.
I'm interested in knowing whether there will be more code than what the C compiler would generate.
Note that as the program is a C program, I expect to call the C++ compiler without exception and RTTI support.
Thanks,
Vicente
The C++ compiler may take longer to compile the code (since it has to build data structures for overload resolution; it can't know ahead of time that the program doesn't use overloads), but the resulting binary should be quite similar.
Actually, one important optimization difference is that C99 provides the restrict keyword to declare that pointers don't alias, while standard C++ has no equivalent (most compilers offer __restrict as an extension). This isn't likely to affect code size much, but it can affect performance significantly.
There's probably no 'cost', assuming that the two compilers are of equivalent quality. The traditional objection is that C++ is much more complex, so it's more likely that a C++ compiler will have bugs in it.
Realistically, this is much less of a problem than it used to be, and I tend to do most of my embedded stuff now as a sort of horrible C/C++ hybrid: taking advantage of stronger typing and easier variable declaration rules, without incurring RTTI or exception handling overheads. If you're taking a given compiler (GCC, etc.) and switching it from C to C++ mode, then much of what you have to worry about is common to the two languages anyway.
The only way to really know is for you to try it with the compilers you care about. A quick experiment here on a trivial program shows that the output is the same.
Your program will be linked against the C++ runtime library, not the C one, and the C++ runtime is larger as well.
Also, there are a couple of differences between C and C++ (aliasing was already pointed out), so it may happen that your C code simply does not compile as C++.
If it's C, then you can expect it will be exactly the same.
To elaborate: both C and C++ front ends feed their parse trees into the same backend that generates code (possibly via another intermediate representation), which means that if the code is functionally identical, the output will look the same (or nearly so).
Templates do "inflate" code, but you would otherwise have to write the same code by hand or use macros to the same effect, so this is no "extra cost". On the contrary, the compiler may be able to optimize templates better in some cases.
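To illustrate the point, a sketch comparing a template with the macro it replaces (max_of and MAX_OF are hypothetical examples):
// Each distinct instantiation adds one function to the binary, just as
// each macro expansion adds one inlined body; neither is free, but the
// template is type-checked and easier for the optimizer to reason about.
template <typename T>
T max_of(T a, T b) { return a > b ? a : b; }

#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

int    f(int x, int y)       { return max_of(x, y); }   // instantiates max_of<int>
double g(double x, double y) { return MAX_OF(x, y); }   // expands in place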
A C++ compiler cannot compile C code. It can only compile C++, including a very ugly dialect that is the intersection of C and C++ and the worst of both worlds. Some C code will fail to compile at all with a C++ compiler, for example:
char *s = malloc(len+1);
(malloc returns void *, which C++ will not implicitly convert to char *). Other C code will compile but mean the wrong thing, for example:
sizeof 'a'
(in C, 'a' has type int, so this is sizeof(int), typically 4; in C++, 'a' has type char, so it is 1).
I have found this extraordinary document: Technical Report on C++ Performance. I found there all the answers I was looking for.
Thanks to all that have answered this question.
There will be more code because that is what templates do. They are a stencil for generating (more) code.
Otherwise, you should see no differences between compiling a C program with a C compiler versus compiling with a C++ compiler.
If you don't use any of the extra "features" there should be no difference in size or behavior of the end result.
Although the C code will likely compile to something very similar (assuming there's no exception support enabled), using templates can very rapidly result in large binaries; you have to be careful, because every template instantiation can recursively cause other templates to be implicitly instantiated as well.
There was a time when the C++ compiler linked in a bunch of C++ stuff even if the program didn't use it, and you would see binaries that were 10 to 100 times larger than what the C compiler would produce. I think a lot of that has gone away.
Since this is tagged "embedded", I assume it's for embedded systems?
In that case, the major difference between C and C++ is the way C++ treats structs. All structs are treated like classes, meaning they can have constructors.
All instances of structs/classes with non-trivial constructors declared at file scope or as static will then have their constructors called before main() is executed, in a manner similar to static initialization, which you already have there no matter whether it's C or C++.
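For instance, a sketch of the kind of object that triggers such a call (UartConfig and uart0 are hypothetical names):
// A file-scope object with a non-trivial constructor: the constructor
// must run during start-up, before main() is entered.
struct UartConfig {
    UartConfig() { /* write peripheral registers here */ }
};

static UartConfig uart0;   // constructed during C++ run-time start-up

int main(void) {
    // by the time we get here, uart0's constructor has already run
}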
All these constructor calls at boot-up are a major disadvantage in efficiency for embedded systems, where the code resides in NVM and not in RAM. Just like static initialization, they create an ugly, undesired workload peak at the start of the program, when values from NVM are copied into RAM.
There are ways around static initialization in C/C++: most embedded compilers have an option to disable it. But since that is a non-standard setup, all code using statics would then have to be written so that it never relies on initialization values, but instead sets all static variables at runtime.
But as far as I know, there is no way around calling constructors without violating the standard.
EDIT:
Here is source code executed in one such C++ system, Freescale HCS08 CodeWarrior 6.3. This code is injected into the user program after static initialization but before main() is executed:
static void Call_Constructors(void) {
    int i;
    ...
    i = (int)(_startupData.nofInitBodies - 1);
    while (i >= 0) {
        (&_startupData.initBodies->initFunc)[i](); /* call C++ constructors */
        i--;
    }
    ...
}
At the very least, this overhead code must be executed at program startup, no matter how efficient the compiler is at converting constructors into static initialization.
C++ runtime start-up differs slightly from C start-up because it must invoke the constructors for global static objects before main() is called. This call loop is trivial and should not add much.
In the case of C++ code that is also entirely compilable as C, no static constructors will be present, so the loop will not iterate.
In most cases apart from that, you will normally see no significant difference; in C++ you only pay for what you use.

Why doesn't anyone upgrade their C compiler with advanced features?

struct elem
{
    int i;
    char k;
};

elem user;        // compile error!
struct elem user; // this is correct
In the above piece of code we get an error for the first declaration, but this error doesn't occur with a C++ compiler. In C++ we don't need to use the keyword struct again and again.
So why doesn't anyone update their C compiler, so that we can use structures without the keyword, as in C++?
Why don't the C compiler developers remove some of the glitches of C, like the one above, and add some advanced features, without damaging the original concept of C?
Why is it the same old compiler, not updated since the 1970s?
Look at Visual Studio etc.: it is frequently updated with new releases, and for every new release we have to learn some new function usage (even though it is a problem, we can cope with it). We would likewise get used to an updated compiler if there were one.
Don't take this as a silly question. Why is it not possible? It could be developed without any incompatibility issues (without affecting code that was developed on the present/old compiler).
OK, let's develop a new C language, C+, which sits between C and C++, removes all the glitches of C, and adds some advanced features from C++, while keeping it useful for specific applications like system-level applications, embedded systems, etc.
Because it takes years for a new Standard to evolve.
They are working on a new C++ standard (C++0x), and also on a new C standard (C1x), but if you remember that it usually takes between 5 and 10 years for each iteration, I don't expect to see them before 2010 or so.
Also, just like in any democracy, there are compromises in a standard. You've got the hardliners who say, "If you want all that fancy syntactic sugar, go for a toy language like Java or C# that takes you by the hand and even buys you a lollipop", whereas others say, "The language needs to be easier and less error-prone to survive in these days of rapidly shrinking development cycles".
Both sides are partially right, so standardization is a very long battle that takes years and leads to many compromises. That applies to everything where multiple big parties are involved; it's not just limited to C/C++.
typedef struct
{
    int i;
    char k;
} elem;

elem user;
will work nicely. As others said, it's about the standard: if you implement this in VS2008, you can't use it in GCC, and even if you implement it in GCC, it certainly won't compile in something else. The method above will work everywhere.
On the other side: we have the C99 standard with the bool type, declarations in a for() cycle, and declarations in the middle of blocks. Why not this feature as well?
First and foremost, compilers need to support the standard. That's true even if the standard seems awkward in hindsight. Second, compiler vendors do add extensions. For example, many compilers support this:
(char *) p += 100;
to move a pointer by 100 bytes instead of 100 of whatever type p points to. Strictly speaking, that's non-standard, because the cast removes the lvalue-ness of p.
The problem with non-standard extensions is that you can't count on them. That's a big problem if you ever want to switch compilers, make your code portable, or use third-party tools.
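For comparison, a standard-conforming way to get the same effect as the extension above, assuming p is an int *, keeps the cast on the right-hand side:
p = (int *)((char *)p + 100);   /* byte arithmetic via char *, then assign back */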
C is largely a victim of its own success. One of the main reasons to use C is portability. There are C compilers for virtually every hardware platform and OS in existence. If you want to be able to run your code anywhere, you write it in C. This creates enormous inertia. It's almost impossible to change anything without sacrificing one of the best things about using the language in the first place.
The result for software developers is that you may need to write to the lowest common denominator, typically ANSI C (C89). For example: Parrot, the virtual machine that will run the next version of Perl, is being written in ANSI C. Perl 6 will have an enormously powerful and expressive syntax with some mind-bending concepts baked right into the language. The implementation, though, is being built using a language that is almost the complete opposite. The reason is that this will make it possible for Perl to run anywhere: PCs, Macs, Windows, Linux, Unix, VAX, BSD...
This "feature" will never be adopted by future C standards for one reason only: it would badly break backward compatibility. In C, struct tags have a separate namespace from ordinary identifiers, and this may or may not be considered a feature. Thus, this fragment:
struct elem
{
    int foo;
};

int elem;
is perfectly fine in C, because the two elems are in separate namespaces. If a future standard allowed you to declare a struct elem without the struct qualifier or an appropriate typedef, the above program would fail, because elem is already being used as an identifier for an int.
An example where a future C standard did in fact break backward compatibility is C99's disallowing of functions without an explicit return type, i.e.:
foo(void); /* declare a function foo that takes no parameters and returns an int */
This is illegal in C99. However, it is trivial to make it C99-compliant just by adding an int return type. It is not so trivial to "fix" C programs if struct tags suddenly stopped having a separate namespace.
I've found that when I've implemented non-standard extensions to C and C++, even when people request them, they do not get used. The C and C++ world definitely revolves around strict standard compliance. Many of these extensions and improvements have found fertile ground in the D programming language.
Walter Bright, Digital Mars
Most people still using C use it for one of these reasons:
They are targeting a very specific platform (i.e., embedded) and therefore must use the compiler provided by that platform's vendor.
They are concerned about portability, in which case a non-standard compiler would defeat the purpose.
They are very comfortable with plain C and see no reason to change, in which case they just don't want to.
As already mentioned, C has a standard that needs to be adhered to. But can't you just write your code in slightly modified C syntax and use a C++ compiler, so that things like
struct elem
{
    int i;
    char k;
};

elem user;
will compile?
Actually, many C compilers do add features; doesn't pretty much every C compiler support C++-style // comments?
Most of the features added in updates of the C standard (C99 being the most recent) come from extensions that 'caught on'.
For example, even though the compiler I'm using right now on an embedded platform does not claim to conform to the C99 standard (and it is missing quite a bit of it), it does add the following extensions (all of which are borrowed from C++ or C99) to its 'C90' support:
declarations mixed with statements
anonymous structs and unions
inline
declaration in the for loop initialization expression
and, of course, C++ style // comments
The problem I run into with this is that when I try to compile those files using MSVC (either for testing or because the code is useful on more than just the embedded platform), it'll choke on most of them (I'm honestly not sure about anonymous structs/unions).
So, extensions do get added to C compilers; it's just that they're added at different rates and in different ways (so code using them becomes more difficult to port), and the process of moving them into a standard occurs at a near-glacial pace.
We have a typedef for exactly this purpose.
And please do not change the standard; we have enough compatibility problems already...
Re: Manoj Doubts' comment
I have no problem with you or somebody else defining C+ or C- or C-whatever, as long as you don't touch C :)
I still need a language capable of completing my task: having the same piece of code (not a small one) run on tens of operating systems, compiled by a significant number of different compilers, on tens of different hardware platforms. At the moment there is only one language that lets me complete that task, and I prefer not to experiment with that ability :) Especially for the reason you provided. Do you really think that the ability to write
foo test;
instead of
struct foo test;
will make your code better from any point of view?
The following program outputs "1" when compiled as standard C, and something else (probably "2") when compiled as C++ or with your suggested syntax. That's why the C language can't make this change: it would give new meaning to existing code, and that's bad!
#include <stdio.h>

typedef struct
{
    int a;
    int b;
} X;

int main(void)
{
    union X
    {
        int a;
        int b;
    };
    X x;
    x.a = 1;
    x.b = 2;
    printf("%d\n", x.a);
    return 0;
}
Because C is standardized. Compilers could offer that feature, and some do, but using it means that the source code doesn't follow the standard and can only be compiled with that vendor's compiler.
Well,
1 - None of the compilers in use today are from the '70s...
2 - There are standards for both the C and C++ languages, and compilers are developed according to those standards. They can't just change some behaviour!
3 - What happens if you develop on VS2008 and then try to compile that code with another compiler whose last version was released 10 years ago?
4 - What happens when you play with the options on the C/C++ / Language tab?
5 - Why don't Microsoft compilers target all possible processors? They only target x86, x86_64 and Itanium; that's all...
6 - Believe me, this is not even considered a problem!!!
You don't need to develop a new language if you want to use C with C++ typedefs and the like (but without classes, templates, etc.).
Just write your C-like code and use the C++ compiler.
As far as new functionality in new releases goes, Visual C++ is not completely standard-conforming (see http://msdn.microsoft.com/en-us/library/x84h5b78.aspx). By the time Visual Studio 2010 is out, the next C++ standard will likely have been approved, giving the VC++ team more functionality to implement.
There are also changes to the Microsoft libraries (which have little or nothing to do with the standard) and to what the compiler emits (C++/CLI). There's plenty of room for change without trying to deviate from the standard.
Nor do you need anything like C+. Just write in C, use whatever C++ features you like, and compile as C++. One of Bjarne Stroustrup's original design goals for C++ was to make it unnecessary to write anything in C. It should compile perfectly efficiently provided you limit the C++ features you use (and even then it will compile very efficiently; modern C++ compilers do a very good job).
And the unanswered question: Why would you want to use non-standard C, when you could write standard C or standard C++ with almost equal facility?
This sounds like the embrace-and-extend concept.
Life under your scenario:
I develop code using a C compiler that has the C "glitches" removed.
I move to a different platform with another C compiler that also has the C "glitches" removed, but in a slightly different way.
My code doesn't compile or runs differently on the new platform, and I waste time "porting" my code to it.
Some vendors actually like to fix "glitches" because this tends to lock people into a single platform.
If you want to write in standard C, follow the standards. That's it.
If you want more freedom use C# or C++.NET or anything else your hardware supports.