DirectX 11 Memory Management - C++

I've been studying DirectX 11 for a while now, but I'm still confused about how DirectX 11 manages memory. For example, if I create a vertex buffer using ID3D11Device::CreateBuffer, where is the new buffer stored? I know it returns a pointer to the buffer, so it must be stored in CPU RAM, right? However, I thought that would make ID3D11DeviceContext::IASetVertexBuffers a very slow operation, because it would have to copy the buffer from CPU RAM to GPU RAM. But if all of the buffers created with ID3D11Device::CreateBuffer were stored in GPU RAM, wouldn't the GPU RAM fill up very quickly? Basically, I would like to know: when I create a buffer, where is that data stored, CPU RAM or GPU RAM? And what is ID3D11DeviceContext::IASetVertexBuffers doing with the buffer (copying? setting?).

The general answer is that "it's wherever the driver wants it to be". "DYNAMIC" resources are typically placed in memory that is accessible to both the CPU and the GPU (on modern PCs this is shared across the PCIe bus). "STATIC" resources are usually placed in video RAM that only the GPU can access, with the data copied there through the "shared" memory window; if video RAM is tight, they may live in that "shared" memory window instead. Render targets are usually placed in video RAM as well.
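To make the usage distinction concrete, here is a minimal sketch (the helper name and signature are made up for illustration; an existing device and context are assumed) that creates one buffer of each kind and binds the static one to the input assembler:

```cpp
#include <d3d11.h>

// Illustrative helper: creates an IMMUTABLE buffer the driver may place in
// video RAM, and a DYNAMIC buffer the CPU can rewrite via Map/Unmap, then
// binds the static one for drawing.
HRESULT CreateAndBindBuffers(ID3D11Device* device, ID3D11DeviceContext* context,
                             const void* vertices, UINT byteWidth, UINT stride,
                             ID3D11Buffer** staticVB, ID3D11Buffer** dynamicVB)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = byteWidth;
    desc.Usage = D3D11_USAGE_IMMUTABLE;        // GPU read-only; driver picks placement
    desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

    D3D11_SUBRESOURCE_DATA init = {};
    init.pSysMem = vertices;                   // initial data, copied at creation

    HRESULT hr = device->CreateBuffer(&desc, &init, staticVB);
    if (FAILED(hr)) return hr;

    desc.Usage = D3D11_USAGE_DYNAMIC;          // CPU-writable, GPU-readable
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
    hr = device->CreateBuffer(&desc, &init, dynamicVB);
    if (FAILED(hr)) return hr;

    // IASetVertexBuffers only records which buffer the input assembler should
    // read from; it sets pipeline state and does not copy the data anywhere.
    UINT offset = 0;
    context->IASetVertexBuffers(0, 1, staticVB, &stride, &offset);
    return S_OK;
}
```

This also answers the second half of the question: the bind call is cheap because it just points the pipeline at an already-created resource.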
For a deeper dive into Direct3D video memory management, check out the talk "Why Your Windows Game Won't Run In 2,147,352,576?", which is no longer on MSDN Downloads but can be found on my blog.
If you want the nitty-gritty hardware details, read the driver writer's documentation.
You may also find the Video Memory sample on MSDN Code Gallery educational.

Related

Does cudaMallocManaged() create a synchronized buffer in RAM and VRAM?

In the NVIDIA developer blog post An Even Easier Introduction to CUDA, the writer explains:

"To compute on the GPU, I need to allocate memory accessible by the GPU. Unified Memory in CUDA makes this easy by providing a single memory space accessible by all GPUs and CPUs in your system. To allocate data in unified memory, call cudaMallocManaged(), which returns a pointer that you can access from host (CPU) code or device (GPU) code."
I found this both interesting (since it seems potentially convenient) and confusing:
"returns a pointer that you can access from host (CPU) code or device (GPU) code."
For this to be true, it seems like cudaMallocManaged() must be syncing two buffers across VRAM and RAM. Is this the case? Or is my understanding lacking?
In my work so far with GPU acceleration on top of the WebGL abstraction layer via GPU.js, I learned there is a distinct performance difference between two patterns: passing VRAM-based buffers (textures, in WebGL) from kernel to kernel keeps the data on the GPU and is highly performant, while retrieving a buffer's value outside the kernels to access it in RAM through JavaScript pulls the data off the GPU and takes a performance hit, since buffers in VRAM don't magically move to RAM.
Forgive my highly abstracted understanding / description of the topic, since I know most CUDA / C++ devs have a much more granular understanding of the process.
So is cudaMallocManaged() creating synchronized buffers in both RAM and VRAM for convenience of the developer?
If so, wouldn't doing so come with an unnecessary cost in cases where we might never need to touch that buffer with the CPU?
Does the compiler perhaps just check if we ever reference that buffer from CPU and never create the CPU side of the synced buffer if it's not needed?
Or do I have it all wrong? Are we not even talking VRAM? How does this work?
So is cudaMallocManaged() creating synchronized buffers in both RAM and VRAM for convenience of the developer?
Yes, more or less. The "synchronization" is referred to in the managed memory model as migration of data. Virtual address carveouts are made for all visible processors, and the data is migrated to (i.e., moved to, and given a physical allocation on) whichever processor attempts to access it.
If so, wouldn't doing so come with an unnecessary cost in cases where we might never need to touch that buffer with the CPU?
If you never need to touch the buffer on the CPU, then what happens is that the VA carveout is made in the CPU VA space, but no physical allocation is made for it. When the GPU attempts to actually access the data, the allocation "appears" and uses up GPU memory. Although there are certainly costs, no CPU (physical) memory is used in this case. Furthermore, once the data is instantiated in GPU memory, there should be no ongoing additional cost for the GPU to access it; it should run at "full" speed. The instantiation/migration process is a complex one, and what I am describing here is what I would consider the principal modality or behavior; there are many factors that could affect it.
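Here is a minimal host-side sketch of that behavior, assuming a CUDA-capable machine and compilation/linking against the CUDA runtime (kernel launches are elided, since they are not the point here):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    const int n = 1 << 20;
    float* data = nullptr;

    // One pointer, one virtual-address carveout visible to host and device;
    // no physical pages exist yet.
    cudaMallocManaged(&data, n * sizeof(float));

    // First CPU touch: pages get a physical allocation in system RAM.
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    // Optional hint: migrate the pages to GPU 0 up front instead of letting
    // the first kernel access fault them over on demand.
    cudaMemPrefetchAsync(data, n * sizeof(float), 0);

    // ... launch kernels that read/write 'data' here ...
    cudaDeviceSynchronize();

    // CPU touch again: the accessed pages migrate back on demand.
    printf("%f\n", data[0]);

    cudaFree(data);
    return 0;
}
```

Note that cudaMemPrefetchAsync is only a hint; without it the migration still happens, just lazily, at first access.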
Does the compiler perhaps just check if we ever reference that buffer from CPU and never create the CPU side of the synced buffer if it's not needed?
No; this is managed by the runtime, not decided at compile time.
Or do I have it all wrong? Are we not even talking VRAM? How does this work?
No, you don't have it all wrong. And yes, we are talking about VRAM.
The blog post you reference barely touches on managed memory, which is a fairly involved subject. There are numerous online resources to learn more about it, including several good GTC presentations on managed memory, and there is an entire section of the CUDA programming guide covering it. You might want to review some of them.

If you have consumed all the video RAM, will an SDL texture automatically use normal RAM?

My question is: if you have consumed all the available video RAM and attempt to create a new texture (SDL), will normal RAM automatically be used instead of video RAM? Or do you have to fall back to a surface (SDL), which uses normal RAM, in the event that you are unable to free any video RAM?
It's driver dependent; the software renderer obviously uses system memory. GL-based implementations use video memory, and what happens when OpenGL runs out of memory is up to the driver; most likely the texture will end up in system memory.
Technically, you have no guarantee that there even is such a thing as video memory. OpenGL is just supposed to store the data in the "most practical location", and the definition of that depends on the hardware (think of hybrid memory, where there is no difference in that case).
TL;DR: Yes, textures will be stored wherever there is space for them.
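If you would rather handle the failure case explicitly than rely on the driver, a fallback along these lines is possible (a hedged sketch; the helper name is made up, and SDL 2.0.5+ is assumed for SDL_CreateRGBSurfaceWithFormat):

```cpp
#include <SDL.h>

// Try a GPU-backed texture first; fall back to a system-RAM surface if
// creation fails (e.g. because video memory is exhausted).
SDL_Texture* CreateTextureOrFallback(SDL_Renderer* renderer,
                                     SDL_Surface** fallbackSurface,
                                     int w, int h)
{
    SDL_Texture* tex = SDL_CreateTexture(renderer,
                                         SDL_PIXELFORMAT_RGBA8888,
                                         SDL_TEXTUREACCESS_STATIC, w, h);
    if (tex) {
        *fallbackSurface = nullptr;
        return tex;  // driver decided where it lives, possibly system RAM already
    }

    // Creation failed; keep the pixels in ordinary RAM via an SDL_Surface.
    *fallbackSurface = SDL_CreateRGBSurfaceWithFormat(0, w, h, 32,
                                                      SDL_PIXELFORMAT_RGBA8888);
    return nullptr;
}
```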

Where Does Direct3D11 Allocate Resource Objects?

I've been reading up on Direct3D11 a lot (including right here on Stack Overflow!) and in all my research I haven't been able to conclusively answer this question:
When a Resource object (i.e. a Buffer or a Texture object) is created, for example with pDevice->CreateBuffer(), where is it stored? On system RAM or on the GPU's Video-RAM? Or am I entirely misunderstanding the fundamental nature of what a Resource object is?
Obviously whatever data you populate the resource with (such as vertex and index arrays) is stored wherever you - the programmer - placed it, but once you map that data to the resource, where is it copied? (I'm assuming that, since resources have to be mapped and unmapped for read/write protection, the mapped data is in fact copied, perhaps to VRAM.) Moreover, where is the Resource object itself instantiated?
Thanks in advance for any and all help!
During resource creation, your Create* call goes through the D3D runtime, the UMD (user-mode graphics driver), and the KMD (kernel-mode driver).
The KMD takes care of page-table management (it could talk to the OS in the case of shared virtual memory, but let's ignore that). If your resource doesn't use SVM, it is blitted to graphics memory.
For processor graphics (such as Intel Iris), the graphics memory is your system RAM, so it's really just a copy from the application's virtual address space to memory managed by the KMD.
For discrete graphics cards, the data resides in the video RAM of your card; the GPU can only work with data in its video RAM. (I'm not sure how shared virtual memory works for discrete graphics.)
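As for the mapping question: here is a minimal sketch of updating a DYNAMIC buffer, assuming an existing context and buffer (the helper name is hypothetical). Map hands you a CPU-visible pointer into driver-chosen memory; where those bytes physically end up, and when they cross the bus, is the driver's business:

```cpp
#include <d3d11.h>
#include <cstring>

// Write new contents into a D3D11_USAGE_DYNAMIC buffer via Map/Unmap.
void UpdateDynamicBuffer(ID3D11DeviceContext* context, ID3D11Buffer* buffer,
                         const void* src, size_t bytes)
{
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(context->Map(buffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
    {
        std::memcpy(mapped.pData, src, bytes);  // write into driver-chosen memory
        context->Unmap(buffer, 0);              // driver may now copy/upload it
    }
}
```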

Is there a way to read directly from the hard drive to the GPU

In my OpenGL program, I read from a header file to find out the geometry size, then malloc an index array and a vertex array, which I then pass to the VBO. Is it even possible to read directly from the hard drive, or is the GPU's memory linked to the computer's RAM only?
The GPU is not directly connected to the system RAM; there's a bus in between, in current-generation computers a PCI-Express bus. ATA storage has a controller in between as well. There is no direct connection between memories.
BUT there is DMA, which allows certain peripherals to directly access the system memory through DMA channels. DMA on PCI-Express also works between peripherals, so theoretically a GPU could do DMA to the ATA controller.
Practically, this is of little use. Why? Because of filesystems. Even if there were some kind of driver support to let the GPU access a storage peripheral directly, it would still have to do all the filesystem bookkeeping, which doesn't parallelize to the degree GPUs are designed for.
Now, regarding your question:
In my OpenGL program, I read from a header file to find out the geometry size, then malloc an index array and a vertex array, which I then pass to the VBO. Is it even possible to read directly from the hard drive, or is the GPU's memory linked to the computer's RAM only?
Why not simply memory-map those files? That way you avoid allocating a buffer that you first read into, and passing a memory-mapped file pointer to OpenGL allows the driver to actually perform a DMA transfer between the storage driver's buffers and the GPU, which is as close as it gets to your original request. Of course, the data on the storage device must already be in a format that's suitable for the GPU, otherwise it's of little use; if it requires some preprocessing, the best tool for that is the CPU. GPUs don't like to serialize data.
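A POSIX sketch of that approach, assuming an OpenGL 1.5+ context and function loader are already set up and that the file already holds vertex data in a GPU-ready layout (error handling omitted for brevity):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <GL/gl.h>

GLuint LoadVBOFromFile(const char* path)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    // Map the file into our address space; no intermediate malloc'd
    // read buffer is needed.
    void* data = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // The driver reads straight from the mapped pages and may set up a
    // DMA transfer from the page cache to its own storage.
    glBufferData(GL_ARRAY_BUFFER, st.st_size, data, GL_STATIC_DRAW);

    munmap(data, st.st_size);
    close(fd);
    return vbo;
}
```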

Where and what is "OpenGL memory" used by VBOs, etc

I am learning how to use VBOs and, as the book says,
"...you can free up CPU memory by moving vertex data to the OpenGL memory on the GPU."
Well, just exactly what can a GPU handle in this regard? Is it acceptable to assume that the "OpenGL memory" can store the vertex data for millions of polygons? What about the GPU in a mobile device?
While developers are used to having a frame of reference for memory restrictions on a CPU, learning OpenGL is partly challenging because I don't know much about GPUs and what to expect from their hardware. So when I read a vague statement like the above, it makes me nervous.
OpenGL has an abstract device and memory model. Technically, in the world of OpenGL there is no "CPU memory" and "GPU memory", only client and server memory; "server" simply means everything the OpenGL driver abstracts away. OpenGL buffer objects live on the server side, and the driver is perfectly allowed to swap data out from the GPU to the CPU if the GPU memory, which acts like a cache, is not sufficient. Hence what your book states:
"...you can free up CPU memory by moving vertex data to the OpenGL memory on the GPU."
is not entirely correct, as the data in an OpenGL buffer object may very well reside in CPU memory.
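A small sketch of what the book actually means, under the usual caveat that "server memory" may or may not be VRAM (the helper name is made up, and a GL 1.5+ context with the buffer-object API is assumed):

```cpp
#include <GL/gl.h>

// Upload a temporary client-side vertex array into a buffer object.
// After glBufferData returns, OpenGL owns its own (server-side) copy,
// wherever the driver chooses to keep it, so the caller may free the
// client copy; that is the "freed up CPU memory" the book refers to.
GLuint UploadVertices(const float* tempVertices, GLsizeiptr bytes)
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, bytes, tempVertices, GL_STATIC_DRAW);
    return vbo;  // caller can now delete/free tempVertices
}
```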
There are minimal requirements in the spec, but in general the amount of GPU memory is broadly available information, which you certainly noticed when buying your PC (it's overhyped by sellers). However, as @datenwolf said, you can't really know where the data actually is; all that matters is that you can destroy your temporary buffers once the data has been handed to OpenGL.
You should take the capabilities of the targeted hardware into account regardless of the technology used.