I'm using C++ in an embedded Linux environment which has GCC version 2.95.
I can't just extract the boost::shared_ptr files with bcp; the result is far too heavy.
What I'd like is a simple smart pointer implementation like boost::shared_ptr, but without all the Boost overhead (if that is possible...).
I could come up with my own version by reading the Boost source, but I fear missing one or more points; it seems easy to write a faulty smart pointer, and I can't afford a buggy implementation.
So, does a "simple" implementation or example implementation of boost::shared_ptr (or any equivalent reference-counting smart pointer) exist that I could use, or that I could take as inspiration?
If you don't need to mix shared and weak pointers, and don't need custom deleters, you can just use the quick and dirty my_shared_ptr:
template<class T>
class my_shared_ptr
{
    template<class U>
    friend class my_shared_ptr;
public:
    my_shared_ptr() : p(), c() {}
    explicit my_shared_ptr(T* s) : p(s), c(new unsigned(1)) {}

    my_shared_ptr(const my_shared_ptr& s) : p(s.p), c(s.c) { if(c) ++*c; }

    my_shared_ptr& operator=(const my_shared_ptr& s)
    { if(this != &s) { clear(); p = s.p; c = s.c; if(c) ++*c; } return *this; }

    // Converting copy constructor, e.g. my_shared_ptr<Derived> -> my_shared_ptr<Base>
    template<class U>
    my_shared_ptr(const my_shared_ptr<U>& s) : p(s.p), c(s.c) { if(c) ++*c; }

    ~my_shared_ptr() { clear(); }

    void clear()
    {
        if(c)
        {
            // Delete the object when the last owner releases it,
            // then delete the counter itself once it reaches zero.
            if(*c == 1) delete p;
            if(!--*c) delete c;
        }
        c = 0; p = 0;
    }

    T* get() const { return (c) ? p : 0; }
    T* operator->() const { return get(); }
    T& operator*() const { return *get(); }

private:
    T* p;
    unsigned* c;
};
For anyone interested in a make_my_shared<X>, it can be trivially implemented (in C++14 and later) as
template<class T, class... U>
auto make_my_shared(U&&... u)
{
    return my_shared_ptr<T>(new T{std::forward<U>(u)...});
}
to be called as
auto pt = make_my_shared<T>( ... );
There is also std::tr1::shared_ptr, which is the Boost implementation lifted into TR1 (and later the C++11 standard). You could pick that if it is available to you, or write your own with reference counting.
I'd suggest you can use shared_ptr in isolation. However, if you are looking for a simple implementation:
- with some unit tests
- not thread-safe
- basic support for polymorphic assignment <-- let me know if you're interested
- custom deleters (i.e. array deleters, or custom functors for special resources)
Have a look here: Creating a non-thread safe shared_ptr
Which "boost overheads" are you concerned about, and in what way is shared_ptr "too heavy" for your application? There are no overheads simply from using Boost (at least if you only use header-only libraries, such as the smart pointer library); the only overheads I can think of concerning shared_ptr are:
Thread-safe reference counting; since you're using an ancient compiler, I assume you also have an ancient runtime library and kernel, so this may be very inefficient. If your application is single threaded, then you can disable this by #defining BOOST_DISABLE_THREADS.
Allocation of the reference-counting structure alongside the managed object. You can eliminate the extra allocation by creating the object using make_shared() or allocate_shared().
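As a rough sketch of both points above, assuming a single-threaded program and a reasonably recent Boost (Widget here is just an illustrative type):

#define BOOST_DISABLE_THREADS  // must come before any Boost header is included

#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>

struct Widget { int x; };

int main() {
    // One allocation for the object and its reference count, instead of two.
    boost::shared_ptr<Widget> w = boost::make_shared<Widget>();
    return 0;
}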
In many cases, you can eliminate the overhead from speed-critical code by not creating objects or copying shared pointers - pass pointers by reference and only copy them when you actually need to.
If you need thread safety for some pointers, but not others, and (after profiling, and removing all unnecessary allocations and pointer copying) you find that the use of shared pointers is still causing significant overhead, then you could consider using intrusive_ptr, and manage your own reference counting within the object.
There may also be benefit to updating to a modern GNU/Linux version, if that's feasible. Thread synchronisation is much more efficient since the introduction of futexes in Linux 2.6. You may find this helps in other ways too; there have been many improvements over the last decade. A more modern compiler would also provide the standard (TR1 or C++11) shared pointer, so you wouldn't need Boost for that.
You can use shared_ptr without all the Boost overhead: it's a header-only implementation. If you don't use any other classes, only shared_ptr will be compiled in.
The implementation of shared_ptr is already pretty lean, but if you want to avoid the intermediate reference count block and a (potential) virtual call to deleter function, you can use boost::intrusive_ptr instead, which is more suited for embedded environments: it operates on a reference counter embedded in the object itself, you just have to provide a couple of functions to increment/decrement it. The downside is that you won't be able to use weak_ptr.
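For illustration, a minimal sketch of what intrusive_ptr asks of you (the Widget type and its counter field are placeholder names, not part of the Boost API):

#include <boost/intrusive_ptr.hpp>

struct Widget {
    unsigned refcount;
    Widget() : refcount(0) {}
};

// Boost finds these two free functions via argument-dependent lookup.
inline void intrusive_ptr_add_ref(Widget* w) { ++w->refcount; }
inline void intrusive_ptr_release(Widget* w) {
    if (--w->refcount == 0) delete w;
}

int main() {
    // A single allocation and no separate count block.
    boost::intrusive_ptr<Widget> p(new Widget);
    return 0;
}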
I can't comment on how well gcc 2.95 would inline/fold template instantiations (it's a very old compiler); more recent versions of gcc handle it rather well, so you are on your own here.
I need to implement a quick solution for optional values. I don't want to drag in any third party libraries.
How are the optional classes implemented in general? Does an optional object still default-construct the underlying object when it's in the 'null-state'?
How are the optional classes implemented in general?
Typically, a boolean flag to indicate whether or not it's empty, and a suitably sized and aligned byte array to store the value.
Does an optional object still default-construct the underlying object when it's in the 'null-state'?
No; that would impose an unnecessary requirement on the stored type, as well as causing potential unwanted side-effects. The stored object would be created with placement-new when the optional becomes non-empty, and destroyed with a destructor call when it becomes empty.
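A bare-bones sketch of that layout (a hedged illustration; real implementations also handle copy/move semantics, exception safety, and much more):

#include <new>  // placement new

template <typename T>
class naive_optional {
    bool engaged;
    alignas(T) unsigned char storage[sizeof(T)];  // raw, suitably aligned bytes

    naive_optional(const naive_optional&);            // copying omitted
    naive_optional& operator=(const naive_optional&); // in this sketch
public:
    naive_optional() : engaged(false) {}
    ~naive_optional() { reset(); }

    void emplace(const T& value) {
        reset();
        new (storage) T(value);  // construct in place when becoming non-empty
        engaged = true;
    }
    void reset() {
        if (engaged) {
            reinterpret_cast<T*>(storage)->~T();  // destroy when becoming empty
            engaged = false;
        }
    }
    bool has_value() const { return engaged; }
    T& operator*() { return *reinterpret_cast<T*>(storage); }
};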
For a quick-and-dirty implementation, if you don't need all the flexibility of the Boost or proposed standard versions, you could simply store a default-constructed object.
I don't want to drag in any third party libraries.
I would reconsider why you don't feel you want that. The Boost implementation is header-only, well tested, and should be directly replaceable by the standard version if and when that arrives. I'd certainly trust it more than something I cobbled together myself.
First of all, I highly recommend taking a look at Boost (specifically Boost.Optional) - it is almost standard practice to use Boost, and it would save you reinventing the wheel.
If for some reason you are reluctant to use Boost.Optional, there are a bunch of similar header-only libraries, for example https://github.com/akrzemi1/Optional
In C++14 or earlier you can use a null-checked T* or just a std::pair<T, bool>. The problem with the latter is that if default construction of T is expensive, it may be wasteful.
In C++17 or later you can still use T*, but you can also use std::optional<T>. The latter only constructs a T if it is valid.
It's noteworthy that std::optional is a good option in only a few cases: https://topanswers.xyz/cplusplus?q=923#a1085
Here is a start to mimicking std::optional. It's not the trickiest implementation, and there is a lot more that could be added.
#include <optional>  // for std::bad_optional_access (C++17)

template <typename T>
struct optional {
private:
    bool _has_value;
    T _value;
public:
    optional() : _has_value{false}, _value{} {}
    optional(T v) : _has_value{true}, _value{v} {}

    bool has_value() const { return _has_value; }

    T value() const {
        if (_has_value) return _value;
        throw std::bad_optional_access();
    }
    T value_or(T def) const {
        return _has_value ? _value : def;
    }

    optional<T>& operator=(T v) {
        _has_value = true;
        _value = v;
        return *this;
    }

    void reset() { _has_value = false; }
};
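Usage then mirrors the standard class:

int main() {
    optional<int> o;             // empty
    o = 42;
    int a = o.value();           // 42
    o.reset();
    int b = o.value_or(-1);      // -1, since o is empty again
    return (a == 42 && b == -1) ? 0 : 1;
}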
"std::optional from Scratch" introduces how to implement an optional class.
Recently I started working on a legacy project, trying to fix segfaults (double deletes). Many of them happen in the boost::shared_ptr destructor or operator= (on objects that contain a shared_ptr). The code makes massive use of shared_ptrs, including copying, reset()-ing, assigning, etc. According to the Boost docs, our usage is not valid - it's not safe to destruct/copy/reset the same shared_ptr in many threads.
Locking each time seems impossible, so I'm searching for a drop-in replacement for boost::shared_ptr. So the question is: if I replace all boost::shared_ptr with std::shared_ptr or std::tr1::shared_ptr, will that solve this issue? tr1 seems to be the safer version, but it's not clear to me. Second question - is the C++0x version any better than tr1? (Note: we have gcc 4.4.6 and cannot upgrade it.)
According to the gcc docs, the C++11 std::shared_ptr should fix that, but I'm not sure about the gcc 4.4 version...
UPD: I just ran an experiment and now I know that all 3 implementations segfault on this code (gcc 4.4)... It seems I should write a custom class or find some other workaround...
#include <iostream>
#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>

typedef boost::shared_ptr<int> ptrtype;
ptrtype p(new int);

void test() {
    for(long i=0; i<1000000; ++i) {
        ptrtype p1 = p;
        p = ptrtype();
        p.reset( new int );
    }
}

int main() {
    boost::thread_group tg;
    for(int i=0; i<100; ++i) tg.add_thread( new boost::thread(test) );
    tg.join_all();
    std::cout << "Normal exit\n";
    return 0;
}
Step 1: Build a class like this, and replace usage of boost::shared_ptr<T> with it.
template<typename T>
struct trivial_ptr {
    T* t;

    template<typename U>
    void reset( U* p ) { t = p; }
    void reset( T* p = NULL ) { t = p; }

    template<typename U>
    trivial_ptr<T>& operator=( trivial_ptr<U> const& o ) {
        t = o.t;
        return *this;
    }

    explicit trivial_ptr( T* p ) : t(p) {}
    ...
};
This class is not intended to run, just to compile with the correct interface. Once it compiles, you can be sure you know which parts of the boost::shared_ptr interface you are using. (Are you using custom deleters? etc. - the problem could be harder or easier, and the above can help you find out.)
Once you are there, you can work out how hard it will be to write up a shared_ptr<T> that handles multiple threads accessing the same variable at the same time.
Now, this is extremely messy. If one thread resets a given shared_ptr while another thread reads from it, the pointer read from may be completely invalid by the time the reading thread has access to it. In effect, you need to guard all access to the underlying pointer in a mutex, which is utterly impractical.
On the other hand, if what you have is multiple readers, and never a reader and a writer, you are in better shape -- in theory, you could fix the problem with the use of appropriate locks on the reference count code.
However, what you actually describe seems to involve multiple threads both reading and writing to the same variable. And that is fundamentally broken in a way that mere thread safety on the shared_ptr variable is not going to fix.
The problem you appear to have is attempting to modify the same instance of a variable in two separate threads (AKA a data race). shared_ptr is no more protected against this than ints are. Your reference to the gcc docs says the same thing ("same level of thread safety as built-in types"). Attempting to modify the same instance of a shared_ptr in two different threads requires some sort of synchronization to prevent the data race. Attempting to modify two different instances of a shared_ptr that point to the same object is OK (there is no data race, or rather, shared_ptr must internally implement whatever is necessary to prevent one). Attempting to modify the object that they point to is, again, a data race.
std::shared_ptr may use atomic ints if the compiler/architecture/implementation supports it. But I wouldn't bet on it, and doing so will make your code less portable. It may work as a temporary workaround (e.g. to get a running program so you can understand what the code is supposed to do).
Writing your own shared_ptr wrapper might be an option, but your code will still need to be audited for deadlocks.
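For illustration: C++11 added atomic free-function overloads for shared_ptr in <memory>, which make patterns like the test program above well-defined. Whether gcc 4.4's library ships them is doubtful, so treat this as a sketch of the direction, not a drop-in fix:

#include <memory>

std::shared_ptr<int> p(new int);

void test() {
    for (long i = 0; i < 1000000; ++i) {
        // Atomically read the shared global...
        std::shared_ptr<int> p1 = std::atomic_load(&p);
        // ...and atomically replace its contents.
        std::atomic_store(&p, std::make_shared<int>());
    }
}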
Currently I am solving my problem with boost::shared_ptr but the semantics is not quite right, since I am "transplanting" members from one object to another. I was looking through this list but it didn't yield too much. Same goes for my brief google searches.
Essentially I am looking for a unique_ptr implementation that works with my gcc4.2 (Hence the restriction to not use C++11)
You could stick with std::auto_ptr until Boost implements unique_ptr on top of the new Boost Move library (C++03 compatible).
See this mailing list traffic dated November 10th, 2011: http://boost.2283326.n4.nabble.com/smart-ptr-Inclusion-of-unique-ptr-td4021667.html
Edit: And Boost Interprocess has a unique_ptr<> class template floating around:
http://www.boost.org/doc/libs/1_48_0/doc/html/boost/interprocess/unique_ptr.html
Depending on what exactly you want, you might consider using boost::scoped_ptr. It is very similar to std::unique_ptr, but it cannot be moved (because C++03 doesn't know about move-semantics). It can, however be swapped. So, when you want to transfer ownership, just do this:
boost::scoped_ptr<T> dummy;
dummy.swap(my_ptr);
// now you have basically transferred ownership from my_ptr to dummy
Use std::auto_ptr<..>; it has exactly what you need.
llvm::OwningPtr - it has a take() method to take ownership.
boost::shared_ptr with custom deleter.
This is something I used in unit test to simulate Corba _var class behaviour.
struct CondDel {
    bool doDel;
    CondDel() : doDel(true) {
    }
    template<typename T>
    void operator()(T* p) {
        // Only delete if ownership has not been released via _retn().
        if (doDel)
            delete p;
    }
};

class MyCorbaObj_var : public boost::shared_ptr<_objref_MyCorbaObj> {
public:
    MyCorbaObj_var(_objref_MyCorbaObj* p) : boost::shared_ptr<_objref_MyCorbaObj>(p, CondDel()) {
    }
    // Corba-style ownership release: disable the deleter and hand out the raw pointer.
    _objref_MyCorbaObj* _retn() {
        CondDel* cd = (CondDel*)_internal_get_deleter(typeid(CondDel));
        cd->doDel = false;
        return get();
    }
};
If you want copying objects of your class to make new copies of the pointed-to items, then no kind of pointer (auto_ptr, shared_ptr or naked pointer) will do what you want without some extra work on your part. You will need to manage the copying yourself. Be sure you implement a copy constructor and assignment operator so that they clone the pointed-to items and store the new pointers in the destination object. If you do this, auto_ptr and shared_ptr will both work just fine for you.
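A sketch of that pattern, with Item standing in for the pointed-to type (a name made up for this example):

struct Item { int data; };

class Holder {
    Item* item;
public:
    explicit Holder(Item* i) : item(i) {}

    // Copying clones the pointed-to object instead of sharing it.
    Holder(const Holder& other) : item(new Item(*other.item)) {}

    Holder& operator=(const Holder& other) {
        if (this != &other) {
            Item* copy = new Item(*other.item);  // clone first, for exception safety
            delete item;
            item = copy;
        }
        return *this;
    }

    ~Holder() { delete item; }
};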
Short question.
I just got a dll I'm supposed to interface with.
The DLL uses the CRT from msvcr90D.dll (note the D), and returns std::strings, std::lists, and boost::shared_ptrs. Operator new/delete is not overloaded anywhere.
I assume a CRT mixup (msvcr90.dll in a release build, or one of the components being rebuilt with a newer CRT, etc.) is bound to cause problems eventually, and that the DLL should be rewritten to avoid returning anything that could possibly call new/delete (i.e. anything that could cause my code to call delete on a block of memory that was allocated, possibly with a different CRT, inside the DLL).
Am I right or not?
The main thing to keep in mind is that dlls contain code and not memory. Memory allocated belongs to the process(1). When you instantiate an object in your process, you invoke the constructor code. During that object's lifetime you will invoke other pieces of code(methods) to work on that object's memory. Then when the object is going away the destructor code is invoked.
STL templates are not explicitly exported from the DLL; the code is statically linked into each DLL. So when a std::string s is created in a.dll and passed to b.dll, the two DLLs each have their own instance of the string::copy method. copy called in a.dll invokes a.dll's copy method; if we are working with s in b.dll and call copy, the copy method in b.dll is invoked.
This is why, in Simon's answer, he says: "Bad things will happen unless you can always guarantee that your entire set of binaries is all built with the same toolchain."
because if, for some reason, string s's copy method differs between a.dll and b.dll, weird things will happen. Even worse, if string itself differs between a.dll and b.dll, and the destructor in one knows to clean up extra memory that the other ignores, you can have difficult-to-track-down memory leaks. Worse still, a.dll might have been built against a completely different version of the STL (i.e. STLPort) while b.dll is built using Microsoft's STL implementation.
So what should you do? Where we work, we have strict control over the toolchain and build settings for each DLL. So when we develop internal DLLs, we freely transfer STL templates around. We still have problems that crop up on rare occasions because someone didn't correctly set up their project. However, we find the convenience of the STL worth the occasional problem that crops up.
For exposing dlls to 3rd parties, that's another story entirely. Unless you want to strictly require specific build settings from clients, you'll want to avoid exporting STL templates. I don't recommend strictly enforcing your clients to have specific build settings... they may have another 3rd party tool that expects you to use completely opposite build settings.
(1) Yes I know static and locals are instantiated/deleted on dll load/unload.
I have this exact problem in a project I'm working on - STL classes are transmitted to and from DLLs a lot. The problem isn't just the different memory heaps - it's actually that the STL classes have no binary standard (ABI). For example, in debug builds, some STL implementations add extra debugging information to the STL classes, such that sizeof(std::vector<T>) (release build) != sizeof(std::vector<T>) (debug build). Ouch! There's no hope you can rely on binary compatibility of these classes. Besides, if your DLL was compiled in a different compiler with some other STL implementation that used other algorithms, you might have different binary format in release builds, too.
The way I've solved this problem is by using a template class called pod<T> (POD stands for Plain Old Data, like chars and ints, which usually transfer fine between DLLs). The job of this class is to package its template parameter in to a consistent binary format, and then unpackage it at the other end. For example, instead of a function in a DLL returning a std::vector<int>, you return a pod<std::vector<int>>. There's a template specialization for pod<std::vector<T>>, which mallocs a memory buffer and copies the elements. It also provides operator std::vector<T>(), so that the return value can transparently be stored back in to a std::vector, by constructing a new vector, copying its stored elements in to it, and returning it. Because it always uses the same binary format, it can be safely compiled in to separate binaries and remain binary compatible. An alternative name for pod could be make_binary_compatible.
Here's the pod class definition:
// All members are protected, because the class *must* be
// specialized for each type
template<typename T>
class pod {
protected:
    pod();
    pod(const T& value);
    pod(const pod& copy); // no copy ctor in any pod
    pod& operator=(const pod& assign);
    T get() const;
    operator T() const;
    ~pod();
};
Here's the partial specialization for pod<vector<T>> - note, partial specialization is used so this class works for any type of T. Also note, it actually is storing a memory buffer of pod<T> rather than just T - if the vector contained another STL type like std::string, we'd want that to be binary compatible too!
// Transmit vector as POD buffer
template<typename T>
class pod<std::vector<T> > {
protected:
    pod(const pod<std::vector<T> >& copy); // no copy ctor

    // For storing vector as plain old data buffer
    typename std::vector<T>::size_type size;
    pod<T>* elements;

    void release()
    {
        if (elements) {
            // Destruct every element, in case they contained other pod<T>s
            pod<T>* ptr = elements;
            pod<T>* end = elements + size;
            for ( ; ptr != end; ++ptr)
                ptr->~pod<T>();
            // Deallocate memory
            pod_free(elements);
            elements = NULL;
        }
    }

    void set_from(const std::vector<T>& value)
    {
        // Allocate buffer with room for pods of T
        size = value.size();
        if (size > 0) {
            elements = reinterpret_cast<pod<T>*>(pod_malloc(sizeof(pod<T>) * size));
            if (elements == NULL)
                throw std::bad_alloc(); // std::bad_alloc has no string constructor
        }
        else
            elements = NULL;
        // Placement-new pods into the buffer
        pod<T>* ptr = elements;
        pod<T>* end = elements + size;
        typename std::vector<T>::const_iterator iter = value.begin();
        for ( ; ptr != end; )
            new (ptr++) pod<T>(*iter++);
    }
public:
    pod() : size(0), elements(NULL) {}

    // Construct from vector<T>
    pod(const std::vector<T>& value)
    {
        set_from(value);
    }

    pod<std::vector<T> >& operator=(const std::vector<T>& value)
    {
        release();
        set_from(value);
        return *this;
    }

    std::vector<T> get() const
    {
        std::vector<T> result;
        result.reserve(size);
        // Copy out the pods, using their operator T() to call get()
        std::copy(elements, elements + size, std::back_inserter(result));
        return result;
    }

    operator std::vector<T>() const
    {
        return get();
    }

    ~pod()
    {
        release();
    }
};
Note the memory allocation functions used are pod_malloc and pod_free - these are simply malloc and free, but using the same function between all DLLs. In my case, all DLLs use the malloc and free from the host EXE, so they are all using the same heap, which solves the heap memory issue. (Exactly how you figure this out is down to you.)
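One way to arrange that (a sketch - the exact export mechanics depend on your build system) is to route both functions through a single module, say the host EXE:

// pod_alloc.h - declarations shared by the EXE and all DLLs
#include <cstddef>
void* pod_malloc(std::size_t size);
void  pod_free(void* p);

// pod_alloc.cpp - compiled into exactly one module (here the host EXE),
// so that every binary allocates and frees on the same heap
#include <cstdlib>

void* pod_malloc(std::size_t size) { return std::malloc(size); }
void  pod_free(void* p)            { std::free(p); }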
Also note you need specializations for pod<T*>, pod<const T*>, and pod for all the basic types (pod<int>, pod<short> etc), so that they can be stored in a "pod vector" and other pod containers. These should be straightforward enough to write if you understand the above example.
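For example, a specialization for a basic type can simply store the value directly (a sketch following the pattern above):

// Basic types already transfer fine between DLLs,
// so their pod<> specialization just holds the value.
template<>
class pod<int> {
protected:
    int value;
public:
    pod() : value(0) {}
    pod(int v) : value(v) {}
    pod<int>& operator=(int v) { value = v; return *this; }
    int get() const { return value; }
    operator int() const { return value; }
};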
This method does mean copying the entire object. You can, however, pass references to pod types, since there is an operator= which is safe between binaries. There's no real pass-by-reference, though, since the only way to change a pod type is to copy it out back to its original type, change it, then repackage as a pod. Also, the copies it creates mean it's not necessarily the fastest way, but it works.
However, you can also pod-specialize your own types, which means you can effectively return complex types like std::map<MyClass, std::vector<std::string>>, provided there's a specialization for pod<MyClass> and partial specializations for std::map<K, V>, std::vector<T> and std::basic_string<T> (which you only need to write once).
The end result usage looks like this. A common interface is defined:
class ICommonInterface {
public:
    virtual pod<std::vector<std::string>> GetListOfStrings() const = 0;
};
A DLL might implement it as such:
pod<std::vector<std::string>> MyDllImplementation::GetListOfStrings() const
{
    std::vector<std::string> ret;
    // ...
    // pod can construct itself from its template parameter
    // so this works without any mention of pod
    return ret;
}
And the caller, a separate binary, can call it as such:
ICommonInterface* pCommonInterface = ...
// pod has an operator T(), so this works again without any mention of pod
std::vector<std::string> list_of_strings = pCommonInterface->GetListOfStrings();
So once it's set up, you can use it almost as if the pod class wasn't there.
I'm not sure about "anything that could call new/delete" - this can be managed by careful use of shared pointer equivalents with appropriate allocators/deleter functions.
However in general, I wouldn't pass templates across DLL boundaries - the implementation of the template class ends up in both sides of the interface which means you can both be using a different implementation. Bad things will happen unless you can always guarantee that your entire set of binaries is all built with the same toolchain.
When I need this sort of functionality I often use a virtual interface class across the boundary. You can then provide wrappers for std::string, list etc. that allow you to safely use them via the interface. You can then control allocation etc. using your implementation, or using a shared_ptr.
Having said all this, the one thing I do use in my DLL interfaces is shared_ptr, as it's too useful not to. I haven't yet had any problems, but everything is built with the same toolchain. I'm waiting for this to bite me, as no doubt it will. See this previous question: Using shared_ptr in dll-interfaces
For std::string you can return the result of c_str(). In the case of more complicated stuff, an option can be something like:
class ContainerValueProcessor
{
public:
    virtual void operator()(const trivial_type& value)=0;
};
Then (assuming you want to use std::list), you can use an interface
class List
{
public:
    virtual void processItems(ContainerValueProcessor&& proc)=0;
};
Notice that List can now be implemented by any container.
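For instance, a sketch of an implementation backed by std::list (here trivial_type is taken to be const char*, and StringList is a made-up name - neither is fixed by the interface above):

#include <list>
#include <string>

typedef const char* trivial_type;

class ContainerValueProcessor
{
public:
    virtual void operator()(const trivial_type& value) = 0;
};

class List
{
public:
    virtual void processItems(ContainerValueProcessor&& proc) = 0;
};

// Lives entirely inside one DLL; only the virtual interface crosses the boundary.
class StringList : public List
{
    std::list<std::string> items;
public:
    void processItems(ContainerValueProcessor&& proc)
    {
        for (std::list<std::string>::iterator it = items.begin();
             it != items.end(); ++it)
            proc(it->c_str());  // hand out only trivially copyable data
    }
};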
I'm starting to use CUDA at the moment and have to admit that I'm a bit disappointed with the C API. I understand the reasons for choosing C but had the language been based on C++ instead, several aspects would have been a lot simpler, e.g. device memory allocation (via cudaMalloc).
My plan was to do this myself, using overloaded operator new with placement new and RAII (two alternatives). I'm wondering if there are any caveats that I haven't noticed so far. The code seems to work but I'm still wondering about potential memory leaks.
The usage of the RAII code would be as follows:
CudaArray<float> device_data(SIZE);
// Use `device_data` as if it were a raw pointer.
Perhaps a class is overkill in this context (especially since you'd still have to use cudaMemcpy, the class only encapsulating RAII) so the other approach would be placement new:
float* device_data = new (cudaDevice) float[SIZE];
// Use `device_data` …
operator delete [](device_data, cudaDevice);
Here, cudaDevice merely acts as a tag to trigger the overload. However, since in normal placement new this would indicate the placement, I find the syntax oddly consistent and perhaps even preferable to using a class.
I'd appreciate criticism of every kind. Does somebody perhaps know if something in this direction is planned for the next version of CUDA? (Which, as I've heard, will improve its C++ support, whatever they mean by that.)
So, my question is actually threefold:
Is my placement new overload semantically correct? Does it leak memory?
Does anybody have information about future CUDA developments that go in this general direction (let's face it: C interfaces in C++ s*ck)?
How can I take this further in a consistent manner (there are other APIs to consider, e.g. there's not only device memory but also a constant memory store and texture memory)?
#include <cstddef>        // std::size_t
#include <cuda_runtime.h> // cudaMalloc, cudaFree

// Singleton tag for CUDA device memory placement.
struct CudaDevice {
    static CudaDevice const& get() { return instance; }
private:
    static CudaDevice const instance;
    CudaDevice() { }
    CudaDevice(CudaDevice const&);
    CudaDevice& operator =(CudaDevice const&);
} const& cudaDevice = CudaDevice::get();

CudaDevice const CudaDevice::instance;

inline void* operator new [](std::size_t nbytes, CudaDevice const&) {
    void* ret;
    cudaMalloc(&ret, nbytes);
    return ret;
}

inline void operator delete [](void* p, CudaDevice const&) throw() {
    cudaFree(p);
}
template <typename T>
class CudaArray {
public:
    explicit
    CudaArray(std::size_t size) : size(size), data(new (cudaDevice) T[size]) { }

    operator T* () { return data; }

    ~CudaArray() {
        operator delete [](data, cudaDevice);
    }

private:
    std::size_t const size;
    T* const data;

    CudaArray(CudaArray const&);
    CudaArray& operator =(CudaArray const&);
};
About the singleton employed here: yes, I'm aware of its drawbacks. However, these aren't relevant in this context. All I needed here was a small type tag that wasn't copyable. Everything else (i.e. multithreading considerations, time of initialization) doesn't apply.
In the meantime there were some further developments (not so much in terms of the CUDA API, but at least in terms of projects attempting an STL-like approach to CUDA data management).
Most notably there is a project from NVIDIA research: thrust
I would go with the placement new approach. Then I would define a class that conforms to the std::allocator<> interface. In theory, you could pass this class as a template parameter into std::vector<> and std::map<> and so forth.
Beware, I have heard that doing such things is fraught with difficulty, but at least you will learn a lot more about the STL this way. And you do not need to re-invent your containers and algorithms.
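A minimal sketch of what that could look like (this is the C++11 minimal-allocator shape; the full C++03 std::allocator interface needs more members, and cuda_allocator is a name invented here):

#include <cuda_runtime.h>  // cudaMalloc, cudaFree
#include <cstddef>
#include <new>

template <typename T>
struct cuda_allocator {
    typedef T value_type;

    cuda_allocator() {}
    template <typename U> cuda_allocator(const cuda_allocator<U>&) {}

    T* allocate(std::size_t n) {
        void* p = 0;
        if (cudaMalloc(&p, n * sizeof(T)) != cudaSuccess)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) { cudaFree(p); }
};

template <typename T, typename U>
bool operator==(const cuda_allocator<T>&, const cuda_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const cuda_allocator<T>&, const cuda_allocator<U>&) { return false; }

// Caveat (part of the "fraught with difficulty"): the pointers refer to device
// memory, so host-side code must never dereference the container's elements.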
Does anybody have information about future CUDA developments that go in this general direction (let's face it: C interfaces in C++ s*ck)?
Yes, I've done something like that:
https://github.com/eyalroz/cuda-api-wrappers/
nVIDIA's Runtime API for CUDA is intended for use both in C and C++ code. As such, it uses a C-style API, the lowest common denominator (with a few notable exceptions of templated function overloads).
This library of wrappers around the Runtime API is intended to allow us to embrace many of the features of C++ (including some C++11) for using the runtime API - but without reducing expressivity or increasing the level of abstraction (as in, e.g., the Thrust library). Using cuda-api-wrappers, you still have your devices, streams, events and so on - but they will be more convenient to work with in more C++-idiomatic ways.
There are several projects that attempt something similar, for example CUDPP.
In the meantime, however, I've implemented my own allocator and it works well and was straightforward (> 95% boilerplate code).