Is it a good idea to expose Vulkan's function pointers globally? - c++

This is my first time loading a shared library at runtime. What I am currently doing is creating an explicit context where all function pointers are loaded.
It would roughly look like this in C++
auto entry = load_vk(..);
auto instance = entry.CreateInstance(...);
VkInstancePointer vk = load_vk_static_fn(instance);
vk.CreateDevice(...);
The problem is that I am not sure about the lifetime of this. I would need to access vk across different threads, so I am currently wrapping it in a shared_ptr<VkInstancePointer>. I also unload the library in the destructor.
The problem I am having is that I want to make the Vulkan API a bit more convenient, so that I am able to write
physical_device.create_logical_device(...);
But that would mean that a physical_device needs to contain a shared_ptr<VkInstancePointer>. That means that a lot of stuff will have an additional overhead of an atomic counter increment.
I am wondering if I could just load the vulkan function pointers globally?
load_entry();
auto instance = CreateInstance();
load_instance_fp(instance);
auto device = CreateDevice(..);
I usually never use globals but it seems that it would make sense here.
Do I ever want to unload the Vulkan library at some point?

There are two kinds of function pointers in Vulkan: instance function pointers and device function pointers.
Instance function pointers are retrieved via vkGetInstanceProcAddr. This function can retrieve function pointers for functions that are device-independent. That is, for functions that deal with creating/managing/destroying devices, as opposed to functions that talk directly to a device (i.e., any function that takes a VkDevice, VkQueue, or VkCommandBuffer).
But it can also retrieve pointers for functions that talk to the device itself. These functions can talk to any Vulkan device, whether it was created before or after the function pointers were retrieved.
By contrast, vkGetDeviceProcAddr gets device function pointers. These function pointers are specific to a device; they cannot be used with a different device from the one they were created with.
So you can create global function pointers which can be used from any thread to talk to any device. But they have to be instance function pointers.
Or you can just let the Vulkan SDK do its job and handle all of this for you.
Do I ever want to unload the Vulkan library at some point?
I don't know of a reason why you would want to. Users generally can't install new versions of a driver-level system like Vulkan without a restart of the machine. And even if you could, your code wouldn't know what to do with its new stuff, since your code was written against a lower version.

Related

How to turn an ANSI C struct into a C++ class, but keep it ANSI C friendly?

A frame read out from an external device is stored in shared memory (in a struct), to be used both by the main (C++) application and an ANSI C library.
For reasons a bit too broad to explain here the library must remain pure ANSI C, and must retain access to the structure in its "pure ANSI C" form. But the main application uses the data in a lot of places, and it's a nuisance to do this the "ANSI C" way, treating it as a dumb data container. It would be much nicer if it was a class - if I could add constructors, copy constructor, comparison operators, a neater 'is valid' method (currently checked as absence of a magic number in one of the struct fields), generally a lot of stuff that a C++ class can do, but an ANSI C struct can't. Except if I replace the struct with the class in the shared memory, it will break compatibility with the library.
What's a neat way to achieve this? Create a class that inherits from the struct? Inheritance through composition? A separate class with a set of conversion methods? Something else I didn't think about? Some transparent way to keep the data visible to C unchanged, but enhanced with class features for C++?
Note: both C++ and C operate on the same instance of the structure. The main app reads the frame out of the device, writes to the structure in shared memory, then calls the library functions to do their magic on the frame (possibly, but not necessarily, modifying it; business logic), then performs its own operations on it (display, logging, re-broadcasting on other media, etc.).
I have full control over the C++ code, but C code is mostly out of my control; I can create a local copy of the structure, if that's beneficial, but my copy and the 'business logic instance' should remain in sync, or at least be synced before and after each library function call (operation under my control, but timing dictated by system requirements.)
edit:
some extra details, as requested:
the "business logic" is implemented in the C library: customized C code generated by an external application (for PC) from the user's input (a graphical interface for drafting the logic; think "block diagram"), varying per device (many users, even more devices). The device requires cross-compiling; only an ANSI C cross-compiler is available in a form that can be easily bundled with the PC application. A C++ cross-compiler is available only on the system developer's (my) PCs; its installation process and license make it impossible to bundle with the (sold) generator app.
the library and the C++ application on the device use shared memory as storage of about all input and output data, for two reasons:
primarily because the volume and variety of that data would make it extremely difficult to provide it as parameters in function calls (over 20 wildly varying external systems the device can cooperate with, each with own communication protocol, each providing inputs and/or accepting outputs which may be utilized in business logic). C++ app handles all communication and converts the data back and forth between the various interfaces and an "easily digestible" data format stored in the shared memory for consumption by the library as required (by the particular instance of business logic).
But there are other applications running on the device - a WWW server, a debug app, etc. - which can peer into the shared memory too, to display current state, allow real-time parameter tweaks, etc. While such "centralized storage"/"superglobal" may be considered an anti-pattern, given the wild variety of interactions between the systems (internal and external, with the C++ app serving as the central hub connecting them), it makes for a much clearer structure than the byzantine web that would arise if I tried to connect every data provider with every data consumer directly.
The main app handles synchronization (timing, locking) of reads from shared memory for all interfaces where it matters; others can just peer into the shared memory and pick what they need whenever they want (read-only); any resulting race condition errors produce a perfectly acceptable momentary glitch that will be corrected come the next 'tick'. All writes are synchronized.
because the C library is the primary, most important consumer and provider for the shared memory, the structure in shared memory must remain C compatible.
Create a class that is derived from the C-struct but make sure that the memory layout stays untouched, i.e. do not use virtual methods (these would add a vtable) or add member variables. In C++11 terms this would be called a standard-layout class. For more information see here:
What are Aggregates and PODs and how/why are they special?
Obeying this rule you can safely cast between the C struct and C++ class and use the appropriate member functions.
Note: Regarding allocating the data structure you need to use the same set of functions for release as for allocation, i.e. if it has been allocated using malloc(), it must be released using free() and if it has been allocated using new it must be released using delete. Therefore, if you want to allocate objects both from within C and C++ code, you are limited to malloc/free since new/delete is not available from within C.
If you subclass this with virtual methods, the base structure would need a vtable. You could change the struct to include these elements and then cast it to/from the class/struct; you'd need to use reinterpret_cast<>() for that.
However that is a terrible idea. Please do not do it.
Instead, implement your business logic in a class, that contains the struct. This way you won't need to support marshalling the structure between the two codebases, you can keep one copy of the struct.
However remember to mark the struct as volatile, if it needs to be altered by any form of background thread that the C++ code is not aware of.
Simple: move your C struct behind a C++ PIMPL.
// A.h
#include <memory>
class A
{
    //...
private:
    struct impl;                         // defined only in A.cpp
    static std::unique_ptr<impl> p_impl; // static: one impl shared by all A's
};
// A.cpp
struct A::impl
{
    // C code here
};
std::unique_ptr<A::impl> A::p_impl;      // defined after impl is complete
My understanding is that the C++ managing code allocates a block of shared memory, and the C and C++ code coordinate what type of information will be written where. The C++ manager periodically examines locations in memory, knowing what type of data it expects to find. I imagine, then, that the C++ code has a table of either void* pointers or your_c_struct* pointers to examine. In the former case, these can always be converted as necessary via reinterpret_cast<your_c_struct*>(void_ptr). So, I'll assume that the C++ code has a table of pointers to C structs in shared memory. In that case I think a solution is to create a pseudo-RAII class, which can either point to un-owned locations in shared memory or own and allocate/deallocate memory on the heap. This would look something like:
class Wrapper {
public:
    Wrapper() : owned(true), data(new your_c_struct{}) {}
    Wrapper(your_c_struct* _data) : owned(false), data(_data) {}
    ~Wrapper() {
        if (owned)
            delete data;
    }
    // copy constructors
    // overloaded comparison operators
private:
    bool owned;
    your_c_struct* data;
};
There are two main ways to construct this class: either making a new object on the heap, or passing it a pointer to shared memory that it does not own. I've used this technique successfully with the GSL library, where the equivalent of the C structs in shared memory are C structs allocated within and returned by GSL numerical algorithms. I went a step further by making the second constructor private and providing a named constructor Wrapper Wrapper::SoftWrap(your_c_struct* _data).

How can I link to callback functions in Lua such that the callbacks will be updated when the scripts are reloaded?

I'm implementing Lua scripting in my game using LuaBind, and one of the things I'm not clear on is the logistics of reloading the scripts live in-game.
Currently, using the LuaBind C++ class luabind::object, I save references to Lua callbacks directly in the classes that use them. Then I can use luabind::call_function using that object in order to call the Lua code from the C++ code.
I haven't tested this yet, but my assumption is that if I reload the scripts, then all the functions will be redefined, BUT the references to the OLD functions will still exist in the form of the luabind::object held by the C++ code. I would like to be able to swap out the old for the new without manually having to manage this for every script hook in the game.
How best to change this so the process works?
My first thought is to not save a reference to the function directly, but maybe save the function name instead, and grab the function by name every time we want to call it. I'm looking for better ideas!
My first thought is to not save a reference to the function directly, but maybe save the function name instead, and grab the function by name every time we want to call it.
If your classes are calling global functions with known names, then that pretty much solves your problem. No need to grab a reference in advance; it's not going to make a measurable performance difference. I think call_function supports passing the function name as a string anyway, right?
You typically store a reference to a function value when the Lua script is registering a callback. In that case, it's much better than storing a name, because it allows the Lua script to register functions which are local, anonymous, etc.
If you really had to grab the value in advance, as you're doing now (and there's really no reason to do that, but we'll pretend it's necessary), I would add a layer of indirection. You could have a LuaFunctionReference class which encapsulates a global name. During instantiation, it grabs a reference to the function the global contains. These objects could be acquired from a factory which maintains a list of all such references. When you reload a script, you could have the factory/manager/pool/etc. object iterate through the references and have them update themselves, so all the references tucked away in classes throughout the system would be updated.

Should I use integer ID or pointers for my opaque objects?

I'm writing an abstraction layer on top of some graphics API (DirectX9 and DirectX11) and I would like your opinion.
Traditionally I would create a base class for each concept I want to abstract.
So in typical OO fashion I would have for example a class Shader and 2 subclasses DX9Shader and DX11Shader.
I would repeat the process for textures, etc... and when I need to instantiate them I have an abstract factory that will return the appropriate subclass depending on the current graphics API.
Following RAII, the returned pointer would be encapsulated in a std::shared_ptr.
So far so good but in my case there are a few problems with this approach:
I need to come up with a public interface that encapsulate the functionality of both APIs (and other APIs in the future).
The derived classes are stored in separate DLLs (one for DX9, one for DX11, etc...) and having a shared_ptr to them in the client is a curse: on exit the graphics DLLs are unloaded, and if the client still has a shared_ptr to one of the graphics objects - boom, a crash due to calling code from an unloaded DLL.
This prompted me to re-design the way I do things:
I thought I could just return raw pointers to the resources and have the graphics API clean after itself but there's still the issue of dangling pointers on the client side and the interface problem.
I even considered manual reference counting like COM but I thought that would be a step backwards (correct me if I'm wrong, coming from the shared_ptr world, manual reference counting seems primitive).
Then I saw the work of Humus where all his graphics classes are represented by integer IDs (much like what OpenGL does).
Creating a new object only returns its integer ID, and stores the pointer internally; it's all perfectly opaque!
The classes that represent the abstraction (such as DX9Shader etc...) are all hidden behind the device API which is the only interface.
If one wants to set a texture, it's just a matter of calling device->SetTexture(ID) and the rest happens behind the scenes.
The downside is that the hidden part of the API is bloated: there is a lot of boilerplate code required to make it work, and I'm not a fan of a do-it-all class.
Any ideas/thoughts ?
You say that the main problem is that a DLL is unloaded while you still have a pointer to its internals. Well... don't do that. You have a class instance whose members are implemented in that DLL. It is fundamentally an error for that DLL to be unloaded so long as those class instances exist.
You therefore need to be responsible in how you use this abstraction. Just as you need to be responsible with any code you load from a DLL: stuff that comes from the DLL must be cleaned up before you unload the DLL. How you do that is up to you. You could have an internal reference count that gets incremented for every object the DLL returns and only unload the DLL after all referenced objects go away. Or anything, really.
After all, even if you use these opaque numbers or whatever, what happens if you call one of those API functions on that number when the DLL is unloaded? Oops... So it doesn't really buy you any protection. You have to be responsible either way.
The downsides of the number method that you may not be thinking about are:
Reduced ability to know what an object actually is. API calls can fail because you passed a number that isn't really an object. Or worse, what happens if you pass a shader object into a function that takes a texture? Maybe we're talking about a function that takes a shader and a texture, and you accidentally forget the order of the arguments? The rules of C++ wouldn't allow that code to even compile if those were object pointers. But with integers? It's all good; you'd only get runtime errors.
Performance. Every API call will have to look this number up in a hashtable or similar to get an actual pointer to work with. If the lookup is just an array index, it's probably fairly minor. But it's still an indirection. And since your abstraction seems very low-level, any performance loss at this level can really hurt in performance-critical situations.
Lack of RAII and other scoping mechanisms. Sure, you could write a shared_ptr-esque object that would create and delete them. But you wouldn't have to do that if you were using an actual pointer.
It just doesn't seem worthwhile.
Does it matter? To the user of the object, it is just an opaque handle. Its actual implementation type doesn't matter, as long as the handle can be passed to your API functions and they do stuff with the object.
You can change the implementation of these handles easily, so make it whatever is easier for you now.
Just declare the handle type as a typedef of either a pointer or an integer, and make sure that all client code uses the typedef name, then the client code doesn't depend on the specific type you chose to represent your handles.
Go for the simple solution now, and if/when you run into problems because that was too simple, change it.
Regarding your point 2: the client is always unloaded before its libraries.
Every process has a library dependency tree, with the .exe as the root, user DLLs at intermediate levels, and system libraries at the lowest level. A process is loaded from the low level upward, so the tree root (the exe) is loaded last; it is unloaded starting from the root, so low-level libraries are unloaded last. This ordering exists precisely to prevent the situation you are talking about.
Of course, if you load/unload libraries manually, this order changes, and you are responsible for keeping your pointers valid.

Sharing memory between modules

I was wondering how to share some memory between different program modules. Let's say I have a main application (exe) and some module (dll). They both link to the same static library. This static library has a manager that provides various services. What I would like to achieve is to have this manager shared between all application modules, and to do this transparently during the library initialization.
Between processes I could use shared memory, but I want this to be shared in the current process only.
Could you think of some cross-platform way to do this? Possibly using boost libraries, if they provide some facilities to do this.
The only solution I can think of right now is to use a shared library of the respective OS that all other modules link to at runtime, and keep the manager there.
EDIT:
To clarify what I actually need:
I need to find out, if the shared manager was already created (the answers below already provided some ways to do that)
Get the pointer to the manager, if it exists, or Set the pointer somewhere to the newly created manager object.
I think you're going to need assistance from a shared library to do this in any portable fashion. It doesn't necessarily need to know anything about the objects being shared between modules, it just needs to provide some globally-accessible mapping from a key (probably a string) to a pointer.
However, if you're willing to call OS APIs, this is feasible, and I think you may only need two implementations of the OS-specific part (one for Windows DLLs and GetProcAddress, one for OSes which use dlopen).
As each module loads, it walks the list of previously loaded modules looking for any that export a specially-named function. If it finds one (any, doesn't matter which, because the invariant is that all fully-loaded modules are aware of the common object), it gets the address of the common object from the previously loaded module, then increments the reference count. If it's unable to find any, it allocates new data and initializes the reference count. During module unload, it decrements the reference count and frees the common object if the reference count reached zero.
Of course it's necessary to use the OS allocator for the common object, because although unlikely, it's possible that it is deallocated from a different library than the one which first allocated it. This also implies that the common object cannot contain any virtual functions or any other sort of pointer into segments of the different modules. All its resources must be dynamically allocated using the OS process-wide allocator. This is probably less of a burden on systems where libc++ is a shared library, but you said you're statically linking the CRT.
Functions needed in Win32 would include EnumProcessModules, GetProcAddress, HeapAlloc, HeapFree, GetProcessHeap, and GetCurrentProcess.
Everything considered, I think I would stick to putting the common object in its own shared library, which leverages the loader's data structures to find it. Otherwise you're re-inventing the loader. This will work even when the CRT is statically linked into several modules, but I think you're setting yourself up for ODR violations. Be really particular about keeping the common data POD.
For use from the current process only, you don't need to devise any special function or structure.
You could do it even without any functions, but it is safer and more cross-platform friendly to define a set of functions providing access to the shared data. These functions could be implemented by the common static library.
I think the only concern with this setup is: "Who will own the data?" There must exist one and only one owner of the shared data.
With these basic idea, we could sketch the API like this:
IsSharedDataExist // check whether of not shared data exist
CreateSharedData // create (possibly dynamically) shared data
DestroySharedData // destroy shared data
... various data access API ...
Or a C++ class using the Singleton pattern would be appropriate.
UPDATE
I was confused. The real problem can be defined as: "How do I implement a Singleton class in a static library that will be linked into multiple dynamically loaded libraries (used in the same process), in a platform-independent way?"
I think the basic idea is not much different, but ensuring the singleton really is unique is the additional problem of this setup.
For this purpose, you could employ Boost.Interprocess.
#include <boost/config.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
...
boost::interprocess::named_mutex* singleton_check = 0;
// in the Create function of the singleton
try {
    singleton_check = new boost::interprocess::named_mutex(
        boost::interprocess::create_only, "name_of_the_mutex");
    // if no exception was thrown, this is the first-time execution
}
catch (...)
{
}
Freeing the named_mutex is as simple as delete singleton_check.
UPDATE#2
Another suggestion.
I think we should not place the shared data in the common static library. If we cannot ensure the data is globally unique, we get not only tricky platform-dependent implementation problems but also wasted memory and global resources.
If you prefer a static library implementation, you should make two static libraries: one for the server/creator of the shared data, and one for users of that shared data. The server library defines the Singleton and provides access to it; the client library provides the various data access methods.
This is effectively same as the Singleton implementation without static libraries.
You can use boost::interprocess http://www.boost.org/doc/libs/1_45_0/doc/html/interprocess.html
and on Windows you can create a shared segment in your DLL that will be shared by all processes using #pragma's: http://www.codeproject.com/KB/DLL/data_seg_share.aspx
As per MSDN I see there are only two ways to share data between modules
Using data_seg pragma
Use shared memory.
As someone pointed out, a shared segment works only between processes that load the same DLL, so we are left with only one choice: the memory-mapped file technique.

Wrapping unmanaged c++ in a managed wrapper

I have an unmanaged C++ library. I would like to expose the functionality to .NET applications. There's one particular function I am not sure how to handle:
typedef void (*free_fn) (void*);
void put (void *data, free_fn deallocation_function);
The idea is that you pass dynamically allocated buffer to the function and supply a deallocation function. The library will process the data asynchronously and will release the buffer later on when data is no longer needed:
void *p = malloc (100);
... fill in the buffer...
put (p, free);
How can I expose this kind of thing to .NET applications?
Be very careful when you do this. .NET really, really wants its objects to be pinned on the way into an unmanaged routine and unpinned on the way out. If your unmanaged code holds onto a pointer value that had been pinned on the way in, then there is a very real chance that the memory will be moved or garbage collected, or both.
This is especially the case with delegates marshalled to function pointers (trust me on this - I found that marshalled delegates were being garbage collected on me, and I had people at Microsoft verify it). The ultimate solution to this problem is to stash away copies of your delegates in a static table paired with a unique transaction id, then create an unmanaged function that, when called, looks up the delegate in the table via the transaction id and executes it. It's ugly, and if I had another choice, I would've used it.
Here's the best way to do this in your case: since your unmanaged code uses a set-it-and-forget-it model, you should make your API chunkier. Create a wrapper in managed C++ that allocates memory via an unmanaged routine, copies your data into it, and then passes it along with a pointer to an unmanaged deallocator.
In general, .NET consumers of your library won't be passing dynamically created arrays to your functions. As far as I know, all containers in .NET are garbage collected.
Regardless, you will need to make a managed wrapper for your unmanaged code. There are many tutorials and articles on this, here is one to start with.
When writing .NET wrappers for unmanaged code, I've found that you want to concentrate more on preserving functionality than on making every function accessible in .NET. In your example, it may be better to just have the managed wrapper copy the array into unmanaged memory and perform whatever operations you need inside the library. This way you don't have to do any pinning of managed memory or marshalling of managed to unmanaged memory in order to circumvent the .NET runtime's garbage collection. However, how you implement the managed wrapper really depends on what the purpose of that function is.
If you really want to implement this function for function in .NET, you will need to look at the Marshal class in .NET for taking control of managed memory in unmanaged code.
For your callback function, you will first need to create .NET delegates that can be assigned in managed code. You will then need to make an unmanaged free function internal to your library that is called by the unmanaged version of the put function. This unmanaged free function will then be responsible for calling the managed delegate, if the user assigned one.
You definitely don't want to pin the managed buffer, as trying to deallocate it in unmanaged code seems like the shortest route to madness. If you can't rewrite this portion in fully managed code, your best bet is either going to be making a copy of the data in the wrapper, or completely hiding the buffer management from the managed world.
If you had the guts (and the masochistic stamina) you could pin the buffer in the wrapper, then pass in the marshaled delegate of a managed function that unpins the buffer. However, I wouldn't suggest it. Having had to do a couple of managed wrappers has taught me the value of exposing the absolute minimum unmanaged functionality, even if it means you have to rewrite some things in managed code. Crossing that boundary is about as easy as going from East Germany to West Germany used to be, to say nothing of the performance hits.
Most replies suggest that the data should be copied from the managed buffer to an unmanaged buffer. How exactly would you do that? Is the following implementation OK?
void managed_put (byte data_ __gc[], size_t size_)
{
    // Pin the data
    byte __pin *tmp_data = &data_[0];
    // Copy the data to an unmanaged buffer.
    void *data = malloc (size_);
    memcpy (data, (byte*) tmp_data, size_);
    // Forward the call
    put (data, free);
}
Some of the previous posters have been using MC++, which is deprecated. C++/CLI is a far more elegant solution.
The BEST technique for interop is implicit interop, not explicit. I don't believe anybody has commented on this yet. It gives you the ability to marshal your types between managed and native such that a change to your type definition or structure layout will not result in a breaking change (which it does with explicit interop).
This Wikipedia article documents some of the differences and is a good starting point for further information.
P/Invoke (explicit and implicit)
Also, the site marshal-as.net has some examples and information on this newer method (again, more ideal as it will not break your code if a native struct is re-defined).
You'd have to have managed wrappers for the functions themselves (or unmanaged wrappers if you want to pass in managed functions). Or else, treat the unmanaged function pointers as opaque handles in the managed world.
Since you mentioned it was asynchronous, I'd do it this way.
The .NET-exposed function only takes the data but doesn't take a delegate. Your code passes the pinned data and a function pointer to a function that simply unpins the data. This leaves the memory cleanup to the GC, but makes sure it won't clean the buffer up until the asynchronous part is done.