Safely use containers in C++ library interface

Safely use containers in C++ library interface - c++

When designing a C++ library, I read it is bad practice to include standard library containers like std::vector in the public interface (see e.g. Implications of using std::vector in a dll exported function).
What if I want to expose a function that takes or returns a list of objects? I could use a simple array, but then I would have to add a count parameter, which makes the interface more cumbersome and less safe. Also it wouldn't help much if I wanted to use a map, for example. I guess libraries like Qt define their own containers which are safe to export, but I'd rather not add Qt as a dependency, and I don't want to roll my own containers.
What's the best practice to deal with containers in the library interface? Is there maybe a tiny container implementation (preferably just one or two files I can drop in, with a permissive license) that I can use as "glue"? Or is there even a way to make std::vector etc. safe across .DLL/.so boundaries and with different compilers?

You can implement a template function. This has two advantages:
It lets your users decide what sorts of containers they want to use with your interface.
It frees you from having to worry about ABI compatibility, because there is no code in your library, it will be instantiated when the user invokes the function.
For example, put this in your header file:
template <typename Iterator>
void foo(Iterator begin, Iterator end)
{
for (Iterator it = begin; it != end; ++it)
bar(*it); // a function in your library, whose ABI doesn't depend on any container
}
Then your users can invoke foo with any container type, even ones they invented that you don't know about.
One downside is that you'll need to expose the implementation code, at least for foo.
Edit: you also said you might want to return a container. Consider alternatives like a callback function, as in the gold old days in C:
typedef bool(*Callback)(int value, void* userData);
void getElements(Callback cb, void* userData) // implementation in .cpp file, not header
{
for (int value : internalContainer)
if (!cb(value, userData))
break;
}
That's a pretty old school "C" way, but it gives you a stable interface and is pretty usable by basically any caller (even actual C code with minor changes). The two quirks are the void* userData to let the user jam some context in there (say if they want to invoke a member function) and the bool return type to let the callback tell you to stop. You can make the callback a lot fancier with std::function or whatever, but that might defeat some of your other goals.

Actually this is not only true for STL containers but applies to pretty much any C++ type (in particular also all other standard library types).
Since the ABI is not standardized you can run into all kinds of trouble. Usually you have to provide separate binaries for each supported compiler version to make it work. The only way to get a truly portable DLL is to stick with a plain C interface. This usually leads to something like COM, since you have to ensure that all allocations and matching deallocations happen in the same module and that no details of the actual object layout are exposed to the user.

TL;DR There is no issue if you distribute either the source code or compiled binaries for the various supported sets of (ABI + Standard Library implementation).
In general, the latter is seen as cumbersome (with reasons), thus the guideline.
I trust hand-waving guidelines about as far as I can throw them... and I encourage you to do the same.
This guidelines originates from an issue with ABI compatibility: the ABI is a complex set of specifications that defines the exact interface of a compiled library. It is includes notably:
the memory layout of structures
the name mangling of functions
the calling conventions of functions
the handling of exception, runtime type information, ...
...
For more details, check for example the Itanium ABI. Contrary to C which has a very simple ABI, C++ has a much more complicated surface area... and therefore many different ABIs were created for it.
On top of ABI compatibility, there is also an issue with Standard Library Implementation. Most compilers come with their own implementation of the Standard Library, and these implementations are incompatible with each others (they do not, for example, represent a std::vector the same way, even though all implement the same interface and guarantees).
As a result, a compiled binary (executable or library) may only be mixed and matched with another compiled binary if both were compiled against the same ABI and with compatible versions of a Standard Library implementation.
Cheers: no issue if you distribute source code and let the client compile.

If you are using C++11, you can use cppcomponents. https://github.com/jbandela/cppcomponents
This will allow you to use among other things std::vector as a parameter or return value across Dll/or .so files created using different compilers or standard libraries. Take a look at my answer to a similar question for an example Passing reference to STL vector over dll boundary
Note for the example, you need to add a CPPCOMPONENTS_REGISTER(ImplementFiles) after the CPPCOMPONENTS_DEFINE_FACTORY() statement

Related

Can I use Eigen3 from Rust? [duplicate]

I want to call a C++ dynamic library (*.so) from Rust, but I don't want to build it from Rust. Like this,
cc::Build::new()
    .file("src/foo.cc")
.shared_flag(true)
.compile("libfoo.so");
In some cases, I only need to call several functions, not all the functions. How can I use it?

Before you go further, make sure you have a basic idea of Rust FFI (foreign function interface).
In Rust, it's easy to call C, but hard to call C++.
To call C functions in Rust, you just have to wrap them with extern, do some basic type casting and sometimes unsafe.
To call C++ functions, since Rust does not have built-in knowledge of C++ features, you may have to do a lot of manual translation. For example, here is part of the documentation from Rust-Qt:
Many things are directly translated from C++ to Rust:
Primitive types are mapped to Rust's primitive types (like bool) and types provided by libc crate (like libc::c_int).
Fixed-size numeric types (e.g int8_t or qint8) are mapped to Rust's fixed size types (e.g. i8).
Pointers, references and values are mapped to Rust's respective types.
C++ namespaces are mapped to Rust submodules.
C++ classes and structs are mapped to Rust structs. This also applies to all instantiations of template classes encountered in the
library's API, including template classes of dependencies.
Free functions are mapped to free functions.
Class methods are mapped to structs' implementations.
Destructors are mapped to Drop and CppDeletable implementations.
Function pointer types are mapped to Rust's equivalent representation. Function pointers with references or class values are
not supported.
static_cast and dynamic_cast are available in Rust through corresponding traits.
Names of Rust identifiers are modified according to Rust's naming
conventions.
When direct translation is not possible:
Contents of each include file of the C++ library are placed into a separate submodule.
Method overloading is emulated with wrapping arguments in a tuple and creating a trait describing tuples acceptable by each method.
Methods with default arguments are treated in the same way.
Single inheritance is translated to Deref and DerefMut implementation, allowing to call base class methods on derived
objects. When deref coercions are not enough, static_cast should be
used to convert from derived to base class.
Getter and setter methods are created for each public class field.
Not implemented yet but planned:
Translate C++ typedefs to Rust type aliases.
Implement operator traits for structs based on C++ operator methods (issue). Operators
are currently exposed as regular functions with op_ prefix.
Implement Debug and Display traits for structs if applicable methods exist on C++ side.
Implement iterator traits for collections.
Subclassing API (issue).
Provide access to a class's public variables (issue).
Provide conversion from enums to int and back (used in Qt API).
Support C++ types nested into template types, like Class1<T>::Class2.
Not planned to support:
Advanced template usage, like types with integer template arguments.
Template partial specializations.
Template methods and functions.
My suggestion is to wrap your C++ library as a C library, then call it the official FFI way, or use rust-bindgen to automatically do the wrapping.
If you still want to call C++ in Rust, rustcxx seems like a handy tool.
As to the library linking, it's pretty simple:
Put the library into your system library searching paths like /usr/lib or /usr/local/lib/, make sure it can be found by ldconfig -p.
Or use the environment variable LD_LIBRARY_PATH to specify the path where your library lays when you run cargo from the CLI.

According to Rust's official website, there is no official support for linkage with C++. Instead you can try and use C libraries.
There is also a thread on this issue in their users forum, where users suggest some 3rd party projects aiming to tackle this issue:
bindgen - Auto generation of FFI for Rust
cpp-to-rust - Allows to use C++ libraries from Rust. The main target of this project is Qt.
I didn't use them, so I can't recommend anything particular or share my experience, but good luck :)

I feel it's worth mentioning as of now there is CXX library in crate
https://docs.rs/cxx/latest/cxx/
https://cxx.rs/
This is what I will be using going forward in order to interoperate with a C++ Library from a vendor.
The assigned answer has a link but it's publicly archived and all active work is likely here :
https://github.com/dtolnay/cxx

How to call a C++ dynamic library from Rust?

I want to call a C++ dynamic library (*.so) from Rust, but I don't want to build it from Rust. Like this,
cc::Build::new()
    .file("src/foo.cc")
.shared_flag(true)
.compile("libfoo.so");
In some cases, I only need to call several functions, not all the functions. How can I use it?

Before you go further, make sure you have a basic idea of Rust FFI (foreign function interface).
In Rust, it's easy to call C, but hard to call C++.
To call C functions in Rust, you just have to wrap them with extern, do some basic type casting and sometimes unsafe.
To call C++ functions, since Rust does not have built-in knowledge of C++ features, you may have to do a lot of manual translation. For example, here is part of the documentation from Rust-Qt:
Many things are directly translated from C++ to Rust:
Primitive types are mapped to Rust's primitive types (like bool) and types provided by libc crate (like libc::c_int).
Fixed-size numeric types (e.g int8_t or qint8) are mapped to Rust's fixed size types (e.g. i8).
Pointers, references and values are mapped to Rust's respective types.
C++ namespaces are mapped to Rust submodules.
C++ classes and structs are mapped to Rust structs. This also applies to all instantiations of template classes encountered in the
library's API, including template classes of dependencies.
Free functions are mapped to free functions.
Class methods are mapped to structs' implementations.
Destructors are mapped to Drop and CppDeletable implementations.
Function pointer types are mapped to Rust's equivalent representation. Function pointers with references or class values are
not supported.
static_cast and dynamic_cast are available in Rust through corresponding traits.
Names of Rust identifiers are modified according to Rust's naming
conventions.
When direct translation is not possible:
Contents of each include file of the C++ library are placed into a separate submodule.
Method overloading is emulated with wrapping arguments in a tuple and creating a trait describing tuples acceptable by each method.
Methods with default arguments are treated in the same way.
Single inheritance is translated to Deref and DerefMut implementation, allowing to call base class methods on derived
objects. When deref coercions are not enough, static_cast should be
used to convert from derived to base class.
Getter and setter methods are created for each public class field.
Not implemented yet but planned:
Translate C++ typedefs to Rust type aliases.
Implement operator traits for structs based on C++ operator methods (issue). Operators
are currently exposed as regular functions with op_ prefix.
Implement Debug and Display traits for structs if applicable methods exist on C++ side.
Implement iterator traits for collections.
Subclassing API (issue).
Provide access to a class's public variables (issue).
Provide conversion from enums to int and back (used in Qt API).
Support C++ types nested into template types, like Class1<T>::Class2.
Not planned to support:
Advanced template usage, like types with integer template arguments.
Template partial specializations.
Template methods and functions.
My suggestion is to wrap your C++ library as a C library, then call it the official FFI way, or use rust-bindgen to automatically do the wrapping.
If you still want to call C++ in Rust, rustcxx seems like a handy tool.
As to the library linking, it's pretty simple:
Put the library into your system library searching paths like /usr/lib or /usr/local/lib/, make sure it can be found by ldconfig -p.
Or use the environment variable LD_LIBRARY_PATH to specify the path where your library lays when you run cargo from the CLI.

According to Rust's official website, there is no official support for linkage with C++. Instead you can try and use C libraries.
There is also a thread on this issue in their users forum, where users suggest some 3rd party projects aiming to tackle this issue:
bindgen - Auto generation of FFI for Rust
cpp-to-rust - Allows to use C++ libraries from Rust. The main target of this project is Qt.
I didn't use them, so I can't recommend anything particular or share my experience, but good luck :)

I feel it's worth mentioning as of now there is CXX library in crate
https://docs.rs/cxx/latest/cxx/
https://cxx.rs/
This is what I will be using going forward in order to interoperate with a C++ Library from a vendor.
The assigned answer has a link but it's publicly archived and all active work is likely here :
https://github.com/dtolnay/cxx

How does the compiler define the classes in type_traits?

In C++11 and later, the <type_traits> header contains many classes for type checking, such as std::is_empty, std::is_polymorphic, std::is_trivially_constructible and many others.
While we use these classes just like normal classes, I cannot figure out any way to possibly write the definition of these classes. No amount of SFINAE (even with C++14/17 rules) or other method seems to be able to tell if a class is polymorphic, empty, or satisfy other properties. An class that is empty still occupies a positive amount of space as the class must have a unique address.
How then, might compilers define such classes in C++? Or perhaps it is necessary for the compiler to be intrinsically aware of these class names and parse them specially?

Back in the olden days, when people were first fooling around with type traits, they wrote some really nasty template code in attempts to write portable code to detect certain properties. My take on this was that you had to put a drip-pan under your computer to catch the molten metal as the compiler overheated trying to compile this stuff. Steve Adamczyk, of Edison Design Group (provider of industrial-strength compiler frontends), had a more constructive take on the problem: instead of writing all this template code that takes enormous amounts of compiler time and often breaks them, ask me to provide a helper function.
When type traits were first formally introduced (in TR1, 2006), there were several traits that nobody knew how to implement portably. Since TR1 was supposed to be exclusively library additions, these couldn't count on compiler help, so their specifications allowed them to get an answer that was occasionally wrong, but they could be implemented in portable code.
Nowadays, those allowances have been removed; the library has to get the right answer. The compiler help for doing this isn't special knowledge of particular templates; it's a function call that tells you whether a particular class has a particular property. The compiler can recognize the name of the function, and provide an appropriate answer. This provides a lower-level toolkit that the traits templates can use, individually or in combination, to decide whether the class has the trait in question.

Will C++ compiler generate code for each template type?

I have two questions about templates in C++. Let's imagine I have written a simple List and now I want to use it in my program to store pointers to different object types (A*, B* ... ALot*). My colleague says that for each type there will be generated a dedicated piece of code, even though all pointers in fact have the same size.
If this is true, can somebody explain me why? For example in Java generics have the same purpose as templates for pointers in C++. Generics are only used for pre-compile type checking and are stripped down before compilation. And of course the same byte code is used for everything.
Second question is, will dedicated code be also generated for char and short (considering that they both have the same size and there are no specialization).
If this makes any difference, we are talking about embedded applications.
I have found a similar question, but it did not completely answer my question: Do C++ template classes duplicate code for each pointer type used?
Thanks a lot!

I have two questions about templates in C++. Let's imagine I have written a simple List and now I want to use it in my program to store pointers to different object types (A*, B* ... ALot*). My colleague says that for each type there will be generated a dedicated piece of code, even though all pointers in fact have the same size.
Yes, this is equivalent to having both functions written.
Some linkers will detect the identical functions, and eliminate them. Some libraries are aware that their linker doesn't have this feature, and factor out common code into a single implementation, leaving only a casting wrapper around the common code. Ie, a std::vector<T*> specialization may forward all work to a std::vector<void*> then do casting on the way out.
Now, comdat folding is delicate: it is relatively easy to make functions you think are identical, but end up not being the same, so two functions are generated. As a toy example, you could go off and print the typename via typeid(x).name(). Now each version of the function is distinct, and they cannot be eliminated.
In some cases, you might do something like this thinking that it is a run time property that differs, and hence identical code will be created, and the identical functions eliminated -- but a smart C++ compiler might figure out what you did, use the as-if rule and turn it into a compile-time check, and block not-really-identical functions from being treated as identical.
If this is true, can somebody explain me why? For example in Java generics have the same purpose as templates for pointers in C++. Generics are only used for per-compile type checking and are stripped down before compilation. And of course the same byte code is used for everything.
No, they aren't. Generics are roughly equivalent to the C++ technique of type erasure, such as what std::function<void()> does to store any callable object. In C++, type erasure is often done via templates, but not all uses of templates are type erasure!
The things that C++ does with templates that are not in essence type erasure are generally impossible to do with Java generics.
In C++, you can create a type erased container of pointers using templates, but std::vector doesn't do that -- it creates an actual container of pointers. The advantage to this is that all type checking on the std::vector is done at compile time, so there doesn't have to be any run time checks: a safe type-erased std::vector may require run time type checking and the associated overhead involved.
Second question is, will dedicated code be also generated for char and short (considering that they both have the same size and there are no specialization).
They are distinct types. I can write code that will behave differently with a char or short value. As an example:
std::cout << x << "\n";
with x being a short, this print an integer whose value is x -- with x being a char, this prints the character corresponding to x.
Now, almost all template code exists in header files, and is implicitly inline. While inline doesn't mean what most folk think it means, it does mean that the compiler can hoist the code into the calling context easily.
If this makes any difference, we are talking about embedded applications.
What really makes a difference is what your particular compiler and linker is, and what settings and flags they have active.

The answer is maybe. In general, each instantiation of a
template is a unique type, with a unique implementation, and
will result in a totally independent instance of the code.
Merging the instances is possible, but would be considered
"optimization" (under the "as if" rule), and this optimization
isn't wide spread.
With regards to comparisons with Java, there are several points
to keep in mind:
C++ uses value semantics by default. An std::vector, for
example, will actually insert copies. And whether you're
copying a short or a double does make a difference in the
generated code. In Java, short and double will be boxed,
and the generated code will clone a boxed instance in some way;
cloning doesn't require different code, since it calls a virtual
function of Object, but physically copying does.
C++ is far more powerful than Java. In particular, it allows
comparing things like the address of functions, and it requires
that the functions in different instantiations of templates have
different addresses. Usually, this is not an important point,
and I can easily imagine a compiler with an option which tells
it to ignore this point, and to merge instances which are
identical at the binary level. (I think VC++ has something like
this.)
Another issue is that the implementation of a template in C++
must be present in the header file. In Java, of course,
everything must be present, always, so this issue affects all
classes, not just template. This is, of course, one of the
reasons why Java is not appropriate for large applications. But
it means that you don't want any complicated functionality in a
template; doing so loses one of the major advantages of C++,
compared to Java (and many other languages). In fact, it's not
rare, when implementing complicated functionality in templates,
to have the template inherit from a non-template class which
does most of the implementation in terms of void*. While
implementing large blocks of code in terms of void* is never
fun, it does have the advantage of offering the best of both
worlds to the client: the implementation is hidden in compiled
files, invisible in any way, shape or manner to the client.

Creating serializeable unique compile-time identifiers for arbitrary UDT's

I would like a generic way to create unique compile-time identifiers for any C++ user defined types.
for example:
unique_id<my_type>::value == 0 // true
unique_id<other_type>::value == 1 // true
I've managed to implement something like this using preprocessor meta programming, the problem is, serialization is not consistent. For instance if the class template unique_id is instantiated with other_type first, then any serialization in previous revisions of my program will be invalidated.
I've searched for solutions to this problem, and found several ways to implement this with non-consistent serialization if the unique values are compile-time constants. If RTTI or similar methods, like boost::sp_typeinfo are used, then the unique values are obviously not compile-time constants and extra overhead is present. An ad-hoc solution to this problem would be, instantiating all of the unique_id's in a separate header in the correct order, but this causes additional maintenance and boilerplate code, which is not different than using an enum unique_id{my_type, other_type};.
A good solution to this problem would be using user-defined literals, unfortunately, as far as I know, no compiler supports them at this moment. The syntax would be 'my_type'_id; 'other_type'_id; with udl's.
I'm hoping somebody knows a trick that allows implementing serialize-able unique identifiers in C++ with the current standard (C++03/C++0x), I would be happy if it works with the latest stable MSVC and GNU-G++ compilers, although I expect if there is a solution, it's not portable.
I would like to make clear, that using mpl::set or similar constructs like mpl::vector and filtering, does not solve this problem, because the scope of the meta-set/vector is limited and actually causes more problems than just preprocessor meta programming.

A while back I added a build step to one project of mine, which allowed me to write #script_name(args) in a C++ source file and have it automatically replaced with the output of the associated script, for instance ./script_name.pl args or ./script_name.py args.
You may balk at the idea of polluting the language into nonstandard C++, but all you'd have to do is write #sha1(my_type) to get the unique integer hash of the class name, regardless of build order and without the need for explicit instantiation.
This is just one of many possible nonstandard solutions, and I think a fairly clean one at that. There's currently no great way to impose an arbitrary, consistent ordering on your classes without just specifying it explicitly, so I recommend you simply give in and go the explicit instantiation route; there's nothing really wrong with centralising the information, but as you said it's not all that different from an enumeration, which is what I'd actually use in this situation.

Persistence of data is a very interesting problem.
My first question would be: do you really want serialization ? If you are willing to investigate an alternative, then jump to the next section.
If you're still there, I think you have not given the typeid solution all its due.
// static detection
template <typename T>
size_t unique_id()
{
static size_t const id = some_hash(typeid(T)); // or boost::sp_typeinfo
return id;
}
// dynamic detection
template <typename T>
size_t unique_id(T const& t)
{
return some_hash(typeid(t)); // no memoization possible
}
Note: I am using a local static to avoid the order of initialization issue, in case this value is required before main is entered
It's pretty similar to your unique_id<some_type>::value, and even though it's computed at runtime, it's only computed once, and the result (for the static detection) is then memoized for future calls.
Also note that it's fully generic: no need to explicitly write the function for each type.
It may seem silly, but the issue of serialization is that you have a one-to-one mapping between the type and its representation:
you need to version the representation, so as to be able to decode "older" versions
dealing with forward compatibility is pretty hard
dealing with cyclic reference is pretty hard (some framework handle it)
and then there is the issue of moving information from one to another --> deserializing older versions becomes messy and frustrating
For persistent saves, I usually recommend using a dedicated BOM. Think of the saved data as a message to your future self. And I usually go the extra mile and proposes the awesome Google Proto Buffer library:
Backward and Forward compatibility baked-in
Several format outputs -> human readable (for debug) or binary
Several languages can read/write the same messages (C++, Java, Python)

Pretty sure that you will have to implement your own extension to make this happen, I've not seen nor heard of any such construct for compile-time. MSVC offers __COUNTER__ for the preprocessor but I know of no template equivalent.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js