How should I allocate/populate/update memory on GPU for different types of scene objects? - directx-12

I'm trying to write my first DX12 app; I have no previous experience with DX11. I would like to display some rigid and some soft objects, without textures for now. So I need to place some vertex/index buffers in GPU memory that I will never change later, and some that I will change. The scene itself isn't static either: new objects can appear and others can vanish.
How should I allocate/populate/update memory on the GPU for this? I would like a high-level overview that is easy to read and understand, not actual code. I hope the question isn't too broad.

Since you said you are new to DirectX, I strongly recommend you stay away from DX12 and stick with DX11. DX12 is only useful for people who are already experts (with a capital E) and for projects that have to push very far, or that hit an edge case where a DX12 feature is not available in DX11.
But anyway: on DX12, to initialize a buffer, you have to create instances of ID3D12Resource. You will need two, one in an upload heap and one in the default heap. You fill the first one on the CPU using Map. Then you need a command list to copy it into the second one. Of course, you have to manage the resource state of your resource with barriers (copy destination, shader resource, ...). You then need to execute the command list on the command queue, and add a fence so you can wait for the GPU to signal completion before you destroy the resource in the upload heap.
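Roughly, that sequence looks like the sketch below. It assumes a `device`, an open `cmdList` (ID3D12GraphicsCommandList), and the d3dx12.h helpers are already available; it leaves out error handling, closing/executing the list, and the fence wait. Treat it as an outline of the steps above, not production code:

#include <d3d12.h>
#include <wrl/client.h>
#include <cstring>
#include "d3dx12.h" // CD3DX12_* helper structs from the DirectX-Headers / official samples

using Microsoft::WRL::ComPtr;

// Creates a GPU-local (default heap) buffer and schedules a copy into it
// from a CPU-filled upload heap buffer. The upload buffer must be kept
// alive until a fence confirms the copy has executed on the GPU.
void UploadStaticBuffer(ID3D12Device* device, ID3D12GraphicsCommandList* cmdList,
                        const void* srcData, UINT64 byteSize,
                        ComPtr<ID3D12Resource>& defaultBuf,
                        ComPtr<ID3D12Resource>& uploadBuf)
{
    CD3DX12_HEAP_PROPERTIES defaultHeap(D3D12_HEAP_TYPE_DEFAULT);
    CD3DX12_HEAP_PROPERTIES uploadHeap(D3D12_HEAP_TYPE_UPLOAD);
    CD3DX12_RESOURCE_DESC   bufDesc = CD3DX12_RESOURCE_DESC::Buffer(byteSize);

    // 1. Buffer the shaders will actually read from (GPU-only memory).
    device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE, &bufDesc,
        D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&defaultBuf));

    // 2. CPU-visible staging buffer in the upload heap.
    device->CreateCommittedResource(&uploadHeap, D3D12_HEAP_FLAG_NONE, &bufDesc,
        D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&uploadBuf));

    // 3. Fill the upload buffer on the CPU via Map.
    void* mapped = nullptr;
    CD3DX12_RANGE noRead(0, 0); // we will not read this resource on the CPU
    uploadBuf->Map(0, &noRead, &mapped);
    std::memcpy(mapped, srcData, byteSize);
    uploadBuf->Unmap(0, nullptr);

    // 4. Record the GPU copy and transition the default buffer for rendering.
    cmdList->CopyBufferRegion(defaultBuf.Get(), 0, uploadBuf.Get(), 0, byteSize);
    CD3DX12_RESOURCE_BARRIER barrier = CD3DX12_RESOURCE_BARRIER::Transition(
        defaultBuf.Get(), D3D12_RESOURCE_STATE_COPY_DEST,
        D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER);
    cmdList->ResourceBarrier(1, &barrier);

    // 5. Caller: Close() the list, ExecuteCommandLists on the queue, Signal a
    //    fence and wait on it before releasing uploadBuf.
}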
On DX11, you call ID3D11Device::CreateBuffer, providing the description struct with the appropriate bind flags and a pointer to the CPU data you want to put in it... Done.
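Something along these lines; a sketch only, with error handling omitted and a vertex buffer chosen as the example bind flag:

#include <d3d11.h>

// Creates a GPU buffer that is filled once at creation time and never
// written again; the driver handles the upload internally.
ID3D11Buffer* CreateStaticVertexBuffer(ID3D11Device* device,
                                       const void* vertices, UINT byteSize)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = byteSize;
    desc.Usage     = D3D11_USAGE_IMMUTABLE;        // contents never change
    desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

    D3D11_SUBRESOURCE_DATA init = {};
    init.pSysMem = vertices;                       // CPU data to copy in

    ID3D11Buffer* buffer = nullptr;
    device->CreateBuffer(&desc, &init, &buffer);
    return buffer;
}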
It is slightly more complex for textures, because you have to deal with memory layout. So, as I said above, you should focus on DX11; it is not a step down at all, both APIs have their roles.

Related

Dealing with the catch-22 of object lifetimes in Vulkan's device, surface, and swapchain in C++?

Background:
In order to even display to the screen you need to enable a "KHR" (Khronos Group) extension for presentation surfaces.
A surface, as far as I understand, is an abstraction of the window/place where images are displayed, returned by your windowing software.
In Vulkan you have a VkSurface (returned by your windowing library, e.g. GLFW), which has certain properties.
These properties are needed to know whether a device is compatible with it. In other words, before a VkDevice is created (the actual logical view of the GPU that you can actually use to submit commands to), it needs to know about the surface if you are going to use it, specifically in order to create a device with presentation queues that support that surface with the properties it has.
Once the device is created, you can create the swapchain, which is basically a series of buffers/attachments you actually render to.
Swapchains, however, have a 1:1 relationship with surfaces: there can only ever be a single swapchain per surface at most.
Problem:
This is where I start running into issues. In my code base, I codify this relationship as a member variable: a surface has a swapchain, which guarantees that you as the programmer can't accidentally create multiple swapchains per surface if you use my wrapper.
But, if we use this abstraction the following happens:
my::Surface surface = window.create_surface(...); //VkSurface wrapper
auto queue_family = physical_device.find_queue_family_that_matches(surface,...);
auto queue_create_list = {{queue_family, priority},...};
my::Device device = physical_device.create_device(...,queue_create_list,...);
my::SwapchainBuilder swapchain_builder(device);
swapchain_builder.builder_pattern_x(...).builder_pattern_x(...)....;
surface.create_swapchain(swapchain_builder);
...
render loop{
}
...
//end of program
return 0;
//ERROR! device no longer exists to destroy swapchain!
}
Because the surface is created before the device, and because the swapchain is a member of the surface, on destruction the device is destroyed before the swapchain.
The "solution" I came up with in the mean time was this:
my::Device device; //device default constructible, but becomes a VK_NULL_HANDLE underneath
my::Surface surface = ...;
...
device = physical_device.create_device(...,queue_create_list,...);
...
surface.create_swapchain(swapchain_builder);
And this certainly works. The surface is destroyed before the device is, and thus so is the swapchain. But it leaves a bad taste in my mouth.
The whole reason I made the swapchain a member was to eliminate bugs caused by multiple swapchains being created, by eliminating the option for the bug to exist in the first place, and to remove the need for the user to think about the Vulkan spec by encoding that requirement into my wrapper itself.
But now the user has to remember to default-initialize the device first... or they will get an esoteric error (not as clear as the one I show here) unless they use validation layers.
Question:
Is there some way to encode this object relationship at compile time without runtime declaration-order issues? Is there maybe a better way to codify a 1:1 relationship in this scenario, such that the surface object could exist on its own and RAII order would handle this?
Swapchains, however, have a 1:1 relationship with surfaces. There can only ever be a single swapchain per surface at max.
That is not true. From the standard:
A native window cannot be associated with more than one non-retired swapchain at a time.
You can create multiple swapchains for a surface. However, when you create a new one, you have to provide the old one, and the old one becomes "retired". Images you have previously acquired from the retired swapchain can still be presented, but you cannot acquire more images from it.
This moves nicely into the next point: the user needs to be able to recreate the swapchain for a surface.
Swapchains can become invalid, perhaps due to the user rescaling the window or other things. When this happens, the user needs to recreate them. Whether you retire the old one or not, you're going to have to call the creation function again.
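For concreteness, recreation boils down to filling VkSwapchainCreateInfoKHR again and passing the old handle in oldSwapchain. The sketch below is only an illustration, under the assumption that the device, surface, and previously queried format/present mode/capabilities are kept around:

#include <vulkan/vulkan.h>

// Sketch: recreate a swapchain for the same surface, retiring the old one.
// `device`, `surface`, `chosenFormat`, `chosenPresentMode`, `caps` and
// `oldSwapchain` are assumed to have been created/queried elsewhere.
VkSwapchainKHR RecreateSwapchain(VkDevice device, VkSurfaceKHR surface,
                                 VkSurfaceFormatKHR chosenFormat,
                                 VkPresentModeKHR chosenPresentMode,
                                 const VkSurfaceCapabilitiesKHR& caps,
                                 VkSwapchainKHR oldSwapchain)
{
    VkSwapchainCreateInfoKHR info{};
    info.sType            = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
    info.surface          = surface;
    info.minImageCount    = caps.minImageCount + 1;   // clamp to caps.maxImageCount if it is non-zero
    info.imageFormat      = chosenFormat.format;
    info.imageColorSpace  = chosenFormat.colorSpace;
    info.imageExtent      = caps.currentExtent;
    info.imageArrayLayers = 1;
    info.imageUsage       = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
    info.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
    info.preTransform     = caps.currentTransform;
    info.compositeAlpha   = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
    info.presentMode      = chosenPresentMode;
    info.clipped          = VK_TRUE;
    info.oldSwapchain     = oldSwapchain;             // the old swapchain becomes "retired"

    VkSwapchainKHR newSwapchain = VK_NULL_HANDLE;
    vkCreateSwapchainKHR(device, &info, nullptr, &newSwapchain);

    // The retired swapchain must still be destroyed with vkDestroySwapchainKHR
    // once the GPU is done presenting its images (e.g. after vkDeviceWaitIdle);
    // that is left to the caller here.
    return newSwapchain;
}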
So if you want your surface class to store a swapchain, your API needs a way for the user to create a swapchain.
In short, your goal is wrong; users need the function you're trying to get rid of.

OpenSceneGraph memory usage when resetting scene

I have spent a great deal of time trying to figure out OSG's memory management.
I have a scene graph with several children (actually a LOD based on an octree).
However, when I need to reset my scene (I just want to wipe ALL nodes from the scene and also free their memory), I use:
// Clear main osg::Group root node
m_rootNode->removeChildren(0, m_rootNode->getNumChildren());
m_rootNode->dirtyBound();
// Clear Main view scene data from osg::Viewer
m_viewer->setSceneData(nullptr);
BEFORE I do this, I check all my nodes with a NodeVisitor pattern and find that ALL my nodes have a reference count of 1, i.e., after clearing them from the scene I expect their memory to be freed. However, this does not happen: my scene is indeed reset and all the nodes disappear from the viewer, but the memory remains occupied.
Nonetheless, when I load another scene into my viewer, the memory is reused somehow (i.e., the memory usage does not increase, hence there is no memory leak, but the amount of used memory always stays the same).
I can't have this behaviour, as I need to closely control memory usage. How can I do this?
Looks like OSG keeps cached instances of your data, either as CPU-side or GPU-side objects.
You could have a look at osgDB's options to disable caching in the first place (CACHE_NONE, CACHE_ALL & ~CACHE_ARCHIVES), but this can actually increase your memory consumption, as data may not be re-used and may be re-loaded multiple times.
You could instruct osg::Texture to free the CPU-side texture data after it has been uploaded to OpenGL, in case you don't need it any more. This can be done conveniently via the osgUtil::Optimizer::TextureVisitor, which you would set up to change the AutoUnRef of each texture to true. I think running osgUtil::Optimizer with OPTIMIZE_TEXTURE_SETTINGS achieves the same effect.
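As a hedged sketch (exact behaviour depends on your OSG version), running the Optimizer over the question's m_rootNode would look roughly like this:

#include <osgUtil/Optimizer>

// Ask OSG to release the CPU-side image copies of textures after they have
// been uploaded to OpenGL. Individual textures can also be configured via
// osg::Texture::setUnRefImageDataAfterApply(true).
void ReleaseCpuTextureCopies(osg::Node* root)
{
    osgUtil::Optimizer optimizer;
    optimizer.optimize(root, osgUtil::Optimizer::OPTIMIZE_TEXTURE_SETTINGS);
}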
Then, after closing down your scene, as you did in your Question's code, you could explicitly instruct OSG's database pager to wipe its caches:
for (osgViewer::View* v : allYourViews)
{
    v->getDatabasePager()->cancel();
    v->getDatabasePager()->clear();
}
To finally get rid of all pre-allocated GPU-side objects and their CPU-side representations, you would need to destroy your views and GLContext's.

DirectX vertex buffer Default vs Default + Staging

I was searching for the difference between these two in terms of GPU read speed with occasional CPU writes (less than once per frame, or even only once). I don't want to use D3D11_USAGE_DYNAMIC because the data will not be updated at least once per frame.
Is there a significant performance increase with Default + Staging combo over the Default buffer?
The best performance advice for Direct3D 11 here is actually the same as for Direct3D 10.0. Start by reviewing the Gamefest 2007 talk "Windows to Reality: Getting the Most out of Direct3D 10 Graphics in Your Games".
To your question: any update to a resource (texture or buffer) has some potential performance impact, but for 'occasional' updates the best option is to use a STAGING resource and then CopyResource to a DEFAULT resource for actual rendering.
DYNAMIC for textures should be reserved for very frequent updates (say, video texture playback), and of course for vertex buffers when doing dynamic draw submission. Constant buffers are really intended to be DYNAMIC or to use UpdateSubresource, which really depends on your update pattern (a topic covered in the talk above).
Whenever possible, creating resources as IMMUTABLE with pInitialData is the best option with DirectX 11, as it potentially gives the driver an opportunity for multi-threaded resource creation, which is more efficient.
The main thing to be aware of with this pattern is that STAGING resources can result in virtual-memory fragmentation, which can be a problem for 32-bit (x86) apps, so you should try to reuse them rather than creating a lot of them or destroying and recreating them. See the talk "Why Your Windows Game Won't Run In 2,147,352,576 Bytes", which is attached to this blog post.
I'd suggest your initial version use DEFAULT+UpdateSubresource, and then compare it with a DEFAULT+STAGING+CopyResource solution, because the answer depends heavily on your content and code.
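As a rough sketch of the two candidates to benchmark (assuming `context` is the immediate context, the STAGING buffer was created with D3D11_CPU_ACCESS_WRITE, and both buffers have the same size):

#include <d3d11.h>
#include <cstring>

// Path A: UpdateSubresource straight into the DEFAULT-usage resource.
void UpdateViaUpdateSubresource(ID3D11DeviceContext* context,
                                ID3D11Buffer* defaultBuffer,
                                const void* data)
{
    context->UpdateSubresource(defaultBuffer, 0, nullptr, data, 0, 0);
}

// Path B: write into a reusable STAGING buffer, then copy on the GPU.
void UpdateViaStaging(ID3D11DeviceContext* context,
                      ID3D11Buffer* stagingBuffer,   // D3D11_USAGE_STAGING + CPU_ACCESS_WRITE
                      ID3D11Buffer* defaultBuffer,   // D3D11_USAGE_DEFAULT
                      const void* data, size_t byteSize)
{
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    context->Map(stagingBuffer, 0, D3D11_MAP_WRITE, 0, &mapped);
    std::memcpy(mapped.pData, data, byteSize);
    context->Unmap(stagingBuffer, 0);
    context->CopyResource(defaultBuffer, stagingBuffer);
}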

DirectX 9 device: creating vertex and index buffers while rendering (2 threads)

I've been looking for an answer to this question for some time. Does anyone know how to do it?
I've got some ideas; can you tell me whether they are valid, and which is the best one to use (if any of them are actually suitable solutions)?
1. Create a single DirectX 9 device and make it available to the different threads. Render the loading screen (with already-loaded buffers) while loading the new level assets and creating their vertex and index buffers.
2. Create two different DirectX 9 devices, one for each thread. One device is responsible only for rendering (and is attached to the window), while the other has no rendering surface and takes care of creating and filling the buffers.
3. Create a device with a thread-safety flag (I think there is such a thing, though it may not be called exactly that) and do the same as in 1.
Thanks!
If you simply want to load a level, then you don't really need a separate thread for that. You could repaint the scene while loading resources, for example. I'd advise avoiding multithreading unless you can't live without it.
If you still want multithreading, pass D3DCREATE_MULTITHREADED to IDirect3D9::CreateDevice. Note that the DirectX SDK explicitly warns that using this flag may degrade performance.
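For illustration, device creation with that flag looks roughly like this (a sketch; `d3d9`, the window handle and the present parameters are assumed to be set up already):

#include <d3d9.h>

// Creates a device whose methods are internally synchronized so it can be
// called from multiple threads (at a potential performance cost).
IDirect3DDevice9* CreateMultithreadedDevice(IDirect3D9* d3d9, HWND hWnd,
                                            D3DPRESENT_PARAMETERS& pp)
{
    IDirect3DDevice9* device = nullptr;
    d3d9->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd,
                       D3DCREATE_HARDWARE_VERTEXPROCESSING | D3DCREATE_MULTITHREADED,
                       &pp, &device);
    return device;
}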
Creating a single device is the preferred solution, i.e. I'd advise going with option 1.
It is possible to share resources between several devices, but this functionality is only available on Windows Vista and later. Because people still use Windows XP today, if you rely on something like that, your users will hate you.

What is the most efficient way to manage a large set of lines in OpenGL?

I am working on a simple CAD program which uses OpenGL to handle on-screen rendering. Every shape drawn on the screen is constructed entirely out of simple line segments, so even a simple drawing ends up processing thousands of individual lines.
What is the best way to communicate changes in this collection of lines between my application and OpenGL? Is there a way to update only a certain subset of the lines in the OpenGL buffers?
I'm looking for a conceptual answer here. No need to get into the actual source code, just some recommendations on data structure and communication.
You can use a simple approach such as a display list (glNewList/glEndList).
The other option, which is slightly more complicated, is to use Vertex Buffer Objects (VBOs - GL_ARB_vertex_buffer_object). They have the advantage that they can be changed dynamically, whereas a display list cannot.
These basically batch all your data/transformations up and then execute them on the GPU (assuming you are using hardware acceleration), resulting in higher performance.
Vertex Buffer Objects are probably what you want. Once you load the original data set in, you can make modifications to existing chunks with glBufferSubData().
If you add extra line segments and overflow the size of your buffer, you'll of course have to make a new buffer, but this is no different than having to allocate a new, larger memory chunk in C when something grows.
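A minimal sketch of that pattern (classic vertex buffer object usage, assuming an extension loader such as GLEW or glad provides the entry points, and three floats per vertex):

#include <GL/gl.h>   // plus an extension loader (e.g. GLEW/glad) for the VBO entry points

// One VBO holding all line vertices for the drawing.
GLuint CreateLineVbo(const float* positions, size_t vertexCount)
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, vertexCount * 3 * sizeof(float),
                 positions, GL_DYNAMIC_DRAW);     // hint: contents will change
    return vbo;
}

// Replace just the vertices of one edited shape, given its offset in the buffer.
void UpdateLineRange(GLuint vbo, size_t firstVertex, size_t vertexCount,
                     const float* newPositions)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER,
                    firstVertex * 3 * sizeof(float),
                    vertexCount * 3 * sizeof(float),
                    newPositions);
}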
EDIT: A couple of notes on display lists, and why not to use them:
In OpenGL 3.0, display lists are deprecated, so using them isn't forward-compatible past 3.0 (2.1 implementations will be around for a while, of course, so depending on your target audience this might not be a problem)
Whenever you change anything, you have to rebuild the entire display list, which defeats the entire purpose of display lists if things are changed often.
Not sure if you're already doing this, but it's worth mentioning you should try to use GL_LINE_STRIP instead of individual GL_LINES if possible to reduce the amount of vertex data being sent to the card.
My suggestion is to try using a scene graph, some kind of hierarchical data structure for the lines/curves. If you have huge models, performance will suffer if you keep a plain list of lines. With a graph/tree structure you can easily check which items are visible and which are not by using bounding volumes. Also, with a scene graph you can apply transformations easily and reuse geometry.