CPU/GPU Shared Buffer in Direct3D12 - c++

I have no experience with Direct3D, so I may just be looking in the wrong places. However, I would like to convert a program I have written in OpenGL (using FreeGLUT) to a Windows IoT compatible UWP (running Direct3D, 12 'caus it's cool). I'm trying to port my program to a Raspberry Pi 3 and I don't want to convert to Linux.
Through the examples provided by Microsoft I have figured out most of what I believe I need to know to get started, but I can't figure out how to share a dynamic data buffer between the CPU and GPU.
What I want to know how to do:
Create a CPU/GPU shared circular buffer
Read and Draw with the GPU
Write / Replace sections with the CPU
Quick semi-pseudo code:
while (!buffer.inUse()){ //wait until buffer is not in use
updateBuffer(buffer.id, data, start, end); //insert data into buffer
drawToScreen(buffer.id); //draw using vertex data in buffer
}
This was previously done in OpenGL by simply using glBegin()/glEnd() and glVertex3f() for each value in an array when it wasn't being written to.
Update: I basically want a Direct3D12 equivalent of OpenGLs VBO editing using glBufferSubData(). If that makes more sense.
Update 2: I found that I can get away with discarding the vertex buffer every frame and re-uploading a new buffer to the GPU. There's a fair amount of overhead, as one would expect with transferring 10,000 - 200,000 doubles every frame. So I'm trying to find a way to use constant buffers to port the 5-10 updated vertexes into the shader, so I can copy from the constant buffer into the vertex buffer using the shader and not have to use map/unmap every frame. This way my circular buffer on the CPU is independent of the buffer being used on the GPU, but they will both share the same information through periodic updates. I'll do some more looking and post another more specific question on shaders if I don't find a solution.

Related

How to sample Renderbuffer depth information and process it in CPU code, without causing an impact on performance?

I am trying to sample a few fragments' depth data that I need to use in my client code (that runs on CPU).
I tried a glReadPixel() on my FrameBuffer Object, but turns out it stalls the render pipeline as it transfers data from Video Memory to Main Memory through the CPU, thus causes unbearable lag (please, correct me if I am wrong).
I read about Pixel Buffer objects, that we can use them as copies of other buffers, and very importantly, perform glReadPixel() operation without stalling the performance, but not without compromising to use outdated information. (That's OK for me.)
But, I am unable to understand about how to use Pixel Buffers.
What I've learnt is we need to sample data from a texture to store it in a PixelBuffer. But I am trying to sample from a Renderbuffer, which I've read is not possible.
So here's my problem - I want to sample the depth information stored in my Render Buffer, store it in RAM, process it and do other stuff, without causing any issues to the Rendering Pipeline. If I use a depth texture instead of a renderbuffer, i don't know how to use it for depth testing.
Is it possible to copy the entire Renderbuffer to the Pixelbuffer and perform read operations on it?
Is there any other way to achieve what I am trying to do?
Thanks!
glReadPixels can also transfer from a framebuffer to a standard GPU side buffer object. If you generate a buffer and bind it to the GL_PIXEL_PACK_BUFFER target, the data pointer argument to glReadPixels is instead an offset into the buffer object. (So probably should be 0 unless you are doing something clever.)
Once you've copied the pixels you need into a buffer object, you can transfer or map or whatever back to the CPU at a time convenient for you.

How to update vertex buffer data frequently in DirectX 11?

I am trying to update my vertex buffer data with the map function in dx. Though it does update the data once, but if i iterate over it the model disappears. i am actually trying to manipulate vertices in real-time by user input and to do so i have to update the vertex buffer every frame while the vertex is selected.
Perhaps this happens because the Map function disables GPU access to the vertices until the Unmap function is called. So if the access is blocked every frame, it kind of makes sense for it to not be able render the mesh. However when i update the vertex every frame and then stop after sometime, theatrically the mesh should show up again, but it doesn't.
i know that the proper way to update data every frame is to use constant buffers, but manipulating vertices with constant buffers might not be a good idea. and i don't think that there is any other way to update the vertex data. i expect dynamic vertex buffers to be able to handle being updated every frame.
D3D11_MAPPED_SUBRESOURCE mappedResource;
ZeroMemory(&mappedResource, sizeof(D3D11_MAPPED_SUBRESOURCE));
// Disable GPU access to the vertex buffer data.
pRenderer->GetDeviceContext()->Map(pVBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
// Update the vertex buffer here.
memcpy((Vertex*)mappedResource.pData + index, pData, sizeof(Vertex));
// Reenable GPU access to the vertex buffer data.
pRenderer->GetDeviceContext()->Unmap(pVBuffer, 0);
As this has been already answered the key issue that you are using Discard (which means you won't be able to retrieve the contents from the GPU), I thought I would add a little in terms of options.
The question I have is whether you require performance or the convenience of having the data in one location?
There are a few configurations you can try.
Set up your Buffer to have both CPU Read and Write Access. This though mean you will be pushing and pulling your buffer up and down the bus. In the end, it also causes performance issues on the GPU such as blocking etc (waiting for the data to be moved back onto the GPU). I personally don't use this in my editor.
If memory is not the issue, set up a copy of your buffer on CPU side, each frame map with Discard and block copy the data across. This is performant, but also memory intensive. You obviously have to manage the data partioning and indexing into this space. I don't use this, but I toyed with it, too much effort!
You bite the bullet, you map to the buffer as per 2, and write each vertex object into the mapped buffer. I do this, and unless the buffer is freaking huge, I havent had issue with it in my own editor.
Use the Computer shader to update the buffer, create a resource view and access view and pass the updates via a constant buffer. Bit of a Sledgehammer to crack a wallnut. And still doesn't stop the fact you may need pull the data back off the GPU ala as per item 1.
There are some variations on managing the buffer, such as interleaving you can play with also (2 copies, one on GPU while the other is being written to) which you can try also. There are some rather ornate mechanisms such as building the content of the buffer in another thread and then flagging the update.
At the end of the day, DX 11 doesn't offer the ability (someone might know better) to edit the data in GPU memory directly, there is alot shifting between CPU and GPU.
Good luck on which ever technique you choose.
Mapping buffer with D3D11_MAP_WRITE_DISCARD flag will cause entire buffer content to become invalid. You can not use it to update just a single vertex. Keep buffer on the CPU side instead and then update entire buffer on GPU side once per frame.
If you develop for UWP - use of map/unmap may result in sync problems. ID3D11DeviceContext methods are not thread safe: https://learn.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-render-multi-thread-intro.
If you update buffer from one thread and render from another - you may get different errors. In this case you must use some synchronization mechanism, such as critical sections. Example is here https://developernote.com/2015/11/synchronization-mechanism-in-directx-11-and-xaml-winrt-application/

Why is using multiple Pixel buffer Objects advised. Surely it is redundant?

This article is commonly referenced when anyone asks about video streaming textures in OpenGL.
It says:
To maximize the streaming transfer performance, you may use multiple pixel buffer objects. The diagram shows that 2 PBOs are used simultaneously; glTexSubImage2D() copies the pixel data from a PBO while the texture source is being written to the other PBO.
For nth frame, PBO 1 is used for glTexSubImage2D() and PBO 2 is used to get new texture source. For n+1th frame, 2 pixel buffers are switching the roles and continue to update the texture. Because of asynchronous DMA transfer, the update and copy processes can be performed simultaneously. CPU updates the texture source to a PBO while GPU copies texture from the other PBO.
They provide a simple bench-mark program which allows you to cycle between texture updates without PBO's, with a single PBO, and with two PBO's used as described above.
I see a slight performance improvement when enabling one PBO.
But the second PBO makes no real difference.
Right before the code glMapBuffer's the PBO, it calls glBufferData with the pointer set to NULL. It does this to avoid a sync-stall.
// map the buffer object into client's memory
// Note that glMapBufferARB() causes sync issue.
// If GPU is working with this buffer, glMapBufferARB() will wait(stall)
// for GPU to finish its job. To avoid waiting (stall), you can call
// first glBufferDataARB() with NULL pointer before glMapBufferARB().
// If you do that, the previous data in PBO will be discarded and
// glMapBufferARB() returns a new allocated pointer immediately
// even if GPU is still working with the previous data.
So, Here is my question...
Doesn't this make the second PBO completely useless? Just a waste of memory !?
With two PBO's the texture data is stored 3 times. 1 in the texture, and one in each PBO.
With a single PBO. There are two copies of the data. And temporarily only a 3rd in the event that glMapBuffer creates a new buffer because the existing one is presently being DMA'ed to the texture?
The comments seem to suggest that OpenGL drivers internally are capable to creating the second buffer IF and only WHEN it is required to avoid stalling the pipeline. The in-use buffer is being DMA'ed, and my call to map yields a new buffer for me to write to.
The Author of that article appears to be more knowledgeable in this area than myself. Have I completely mis-understood the point?
Answering my own question... But I wont accept it as an answer... (YET).
There are many problems with the benchmark program linked to in the question. It uses immediate mode. It uses GLUT!
The program was spending most of its time doing things we are not interested in profiling. Mainly rendering text via GLUT, and writing pretty stripes to the texture. So I have removed those functions.
I cranked the texture resultion up to 8K, and added more PBO Modes.
NO PBO (yeilds 6fps)
1 PBO. Orphan previous buffer. (yields 12.2 fps).
2 PBO's. Orpha previous buffer. (yields 12.2 fps).
1 PBO. DONT orphan previous PBO (possible stall - added by myself. yields 12.4 fps).
2 PBO's. DONT orphan previous PBO (possible stall - added by myself. yields 12.4 fps).
If anyone else would like to examine my code, it is vailable here
I have experimented with different texture sizes... and different updatePixels functions... I cannot, despite my best efforts get the double PBO implementation to perform any better than the single-PBO implementation.
Furthermore... NOT orphanning the previous buffer, actually vields better performance. Exactly opposite to what the article claims.
Perhaps modern drivers / hardware does not suffer the problem that this design is attemtping to fix...
Perhaps my graphics hardware / driver is buggy, and not taking advantage of the double-PBO...
Perhaps the commonly referenced article is completely wrong?
Who knows. . . .
My test hardware is Intel(R) HD Graphics 5500 (Broadwell GT2).

Opengl increase in speed after few seconds

I am developing a rendering engine with OpenGL as base renderer.
The renderer start with 150 fps in beginning and after 30 seconds or so the fps increases to 500.
I have timed each part of the engine separately and the only part that increase in speed is the drawMesh function which binds the [static]VBOs and calls the glDrawArrays.
I have also commented the glPush and glGet functions with the same behavior as result.
This happens every time i run the engine, even when the camera is not moved and remains rendering the exact same scene.
Does anyone has any idea how this can be happening?
The problem
The problem arises from the VBO being mapped to a buffer after being created. The model class does this once to update its boundaries; and in case of particles to update the buffer with required data.
As it seems the video-card (or at least in my case having a Geforce GTS 450) does not copy the data back into the video-card directly after unmapping the VBO, specially when using the GL_READ_WRITE_ARB flag for mapping the buffer. it will keep the data in external RAM for few seconds before copying the data back into the VRAM.
The solution
By using the GL_READ_ONLY_ARB flag to map the data which are supposed to only read the data, the buffer will get copied back into the VRAM almost directly. However in my case it would be much more efficient to calculate the boundaries during the mesh conversation and not accessing the data at all once the VBO is created for such purposes.
Maybe it's because shaders are compiled just-in-time before first usage.
Take a look at GL_ARB_get_program_binary
Also, try rendering a triangle ( can probably do this off screen) with your shaders as you load them during the initialization phase.

OpenGL. Updating vertex buffer with glBufferData

I'm using OpenGL to implement some kind of batched drawing. For this I create a vertex buffer to store data.
Note: this buffer generally will update on each frame, but will never decrease size (but still can increase).
My question is: is it technically correct to use glBufferData (with streaming write-only mode) for updating it (instead of e.g. glMapBuffer)? I suppose there's no need to map it, since full data is updated, so I just send a full pack at once. And if the current buffer size is less, than I'm sending, it will automatically increase, won't it? I'm just now sure about the way it really works (maybe it will recreate buffer on each call, no?).
It would be better to have buffer with fixed size and do not recreate it every frame.
You can achieve this by:
creating buffer with max size, for instance space for 1000 verts
update only the beginning of the buffer with new data. So if you changed data for 500 verts then fill ony first half of the buffer using glMapBuffer
change count of drawn vertices when drawing. You can, for instance, use only some range of verts (eg. from 200 to 500) from the whole 1000 verts buffer. Use glDrawArrays(mode, first, count)
ideas from comments:
glMapBufferRange and glBufferSubData could also help
also consider double buffering of buffers
link: http://hacksoflife.blogspot.com/2010/02/double-buffering-vbos.html
hope that helps
In addition to what fen and datenwolf said, see Chapter 22 of OpenGL Insights; in particular, it includes timings for a variety of hardware & techniques.