I am trying to update my vertex buffer data with the Map function in D3D11. It does update the data once, but if I keep doing it every frame the model disappears. I am actually trying to manipulate vertices in real time from user input, and to do so I have to update the vertex buffer every frame while a vertex is selected.
Perhaps this happens because the Map function disables GPU access to the vertices until Unmap is called. So if that access is blocked every frame, it kind of makes sense for it to not be able to render the mesh. However, when I update the vertex every frame and then stop after some time, theoretically the mesh should show up again, but it doesn't.
I know that the proper way to update data every frame is to use constant buffers, but manipulating vertices with constant buffers might not be a good idea, and I don't think there is any other way to update the vertex data. I expect dynamic vertex buffers to be able to handle being updated every frame.
D3D11_MAPPED_SUBRESOURCE mappedResource;
ZeroMemory(&mappedResource, sizeof(D3D11_MAPPED_SUBRESOURCE));
// Disable GPU access to the vertex buffer data.
pRenderer->GetDeviceContext()->Map(pVBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
// Update the vertex buffer here.
memcpy((Vertex*)mappedResource.pData + index, pData, sizeof(Vertex));
// Reenable GPU access to the vertex buffer data.
pRenderer->GetDeviceContext()->Unmap(pVBuffer, 0);
Since this has already been answered (the key issue being that you are using Discard, which means you won't be able to retrieve the existing contents from the GPU), I thought I would add a little in terms of options.
The question I have is whether you require performance or the convenience of having the data in one location?
There are a few configurations you can try.
1. Set up your buffer to have both CPU read and write access. This, though, means you will be pushing and pulling your buffer up and down the bus. In the end it also causes performance issues on the GPU, such as blocking (waiting for the data to be moved back onto the GPU). I personally don't use this in my editor.
2. If memory is not an issue, keep a copy of your buffer on the CPU side, and each frame map with Discard and block-copy the data across. This is performant, but also memory intensive. You obviously have to manage the data partitioning and indexing into this space. I don't use this; I toyed with it, but it was too much effort!
3. You bite the bullet: you map the buffer as per 2 and write each vertex object into the mapped buffer. I do this, and unless the buffer is huge, I haven't had an issue with it in my own editor.
4. Use a compute shader to update the buffer: create a resource view and an access view and pass the updates via a constant buffer. A bit of a sledgehammer to crack a walnut, and it still doesn't change the fact that you may need to pull the data back off the GPU, as per item 1.
There are some variations on managing the buffer that you can also play with, such as interleaving (two copies, one on the GPU while the other is being written to). There are also some rather ornate mechanisms, such as building the content of the buffer in another thread and then flagging the update.
At the end of the day, DX 11 doesn't offer the ability (someone might know better) to edit the data in GPU memory directly; there is a lot of shifting between CPU and GPU.
Good luck with whichever technique you choose.
Mapping a buffer with the D3D11_MAP_WRITE_DISCARD flag causes the entire buffer's contents to become invalid. You cannot use it to update just a single vertex. Keep a copy of the buffer on the CPU side instead, and then update the entire buffer on the GPU side once per frame.
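A minimal sketch of that approach, reusing the names from the question (cpuVertices is an assumed std::vector<Vertex> that mirrors the buffer contents, and pVBuffer is assumed to have been created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE):

// CPU-side copy of all vertices; edit this freely in response to user input.
// (requires <vector>; Vertex is the struct from the question)
std::vector<Vertex> cpuVertices;

// Edit a single vertex on the CPU...
cpuVertices[index] = newVertex;

// ...then re-upload the whole buffer once per frame.
D3D11_MAPPED_SUBRESOURCE mappedResource = {};
HRESULT hr = pRenderer->GetDeviceContext()->Map(pVBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (SUCCEEDED(hr))
{
    memcpy(mappedResource.pData, cpuVertices.data(), cpuVertices.size() * sizeof(Vertex));
    pRenderer->GetDeviceContext()->Unmap(pVBuffer, 0);
}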
If you develop for UWP, use of Map/Unmap may result in synchronization problems. ID3D11DeviceContext methods are not thread safe: https://learn.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-render-multi-thread-intro.
If you update the buffer from one thread and render from another, you may get various errors. In this case you must use some synchronization mechanism, such as critical sections. An example is here: https://developernote.com/2015/11/synchronization-mechanism-in-directx-11-and-xaml-winrt-application/
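A rough sketch of that idea, using std::mutex instead of a Win32 critical section to serialize access to the shared immediate context (Vertex and the buffer/context names are the same assumptions as above):

#include <mutex>

std::mutex g_contextMutex; // serializes all access to the immediate context

void UpdateVertices(ID3D11DeviceContext* ctx, ID3D11Buffer* vb, const Vertex* verts, size_t count)
{
    std::lock_guard<std::mutex> lock(g_contextMutex);
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(ctx->Map(vb, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
    {
        memcpy(mapped.pData, verts, count * sizeof(Vertex)); // rewrite the whole buffer
        ctx->Unmap(vb, 0);
    }
}

void RenderFrame(ID3D11DeviceContext* ctx, ID3D11Buffer* vb, UINT vertexCount)
{
    std::lock_guard<std::mutex> lock(g_contextMutex); // the same lock guards the draw calls
    UINT stride = sizeof(Vertex), offset = 0;
    ctx->IASetVertexBuffers(0, 1, &vb, &stride, &offset);
    ctx->Draw(vertexCount, 0);
}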
I am trying to sample a few fragments' depth data that I need to use in my client code (which runs on the CPU).
I tried glReadPixels() on my framebuffer object, but it turns out it stalls the render pipeline as it transfers data from video memory to main memory through the CPU, thus causing unbearable lag (please correct me if I am wrong).
I read about pixel buffer objects: that we can use them as copies of other buffers and, very importantly, perform a glReadPixels() operation without stalling the pipeline, at the cost of reading slightly outdated data. (That's OK for me.)
But I am unable to understand how to use pixel buffers.
What I've learnt is that we need to sample data from a texture to store it in a pixel buffer. But I am trying to sample from a renderbuffer, which I've read is not possible.
So here's my problem: I want to sample the depth information stored in my renderbuffer, store it in RAM, process it, and do other stuff, without causing any issues to the rendering pipeline. If I use a depth texture instead of a renderbuffer, I don't know how to use it for depth testing.
Is it possible to copy the entire Renderbuffer to the Pixelbuffer and perform read operations on it?
Is there any other way to achieve what I am trying to do?
Thanks!
glReadPixels can also transfer from a framebuffer into a GPU-side buffer object. If you generate a buffer and bind it to the GL_PIXEL_PACK_BUFFER target, the data pointer argument to glReadPixels is instead an offset into the buffer object (so it should probably be 0 unless you are doing something clever).
Once you've copied the pixels you need into a buffer object, you can transfer or map or whatever back to the CPU at a time convenient for you.
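Something along these lines (a sketch; fbo, width and height are assumed to exist, and the read framebuffer's depth attachment can be a renderbuffer):

// One-time setup: a buffer object bound to the pixel-pack target.
GLuint pbo = 0;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * sizeof(GLfloat), nullptr, GL_STREAM_READ);

// After rendering: with the PBO bound, the last argument of glReadPixels is an
// offset into the buffer object (0 here), and the call returns without waiting.
glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, 0);

// Whenever convenient (ideally a frame or two later), map it and read on the CPU.
const GLfloat* depth = static_cast<const GLfloat*>(glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY));
if (depth)
{
    // ... use the depth values ...
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);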
For some reason I have to copy from a texture to a buffer and then load it back into a texture.
The source texture is the one coming from the decoder; the target texture is the one that will be rendered. The easiest way to do that (as I understand it) is the following:
decoder texture (ID3D11Texture2D)
Use a temp texture (Usage = D3D11_USAGE_STAGING)
CopyResource to the temp texture
Map
memcpy_s to a buffer
Unmap
On the other side it goes backward:
Use a temp texture (Usage = D3D11_USAGE_STAGING)
Map
memcpy_s from the buffer
Unmap
CopyResource to the renderer texture
It works fine; however, I have a feeling I'm not doing it as efficiently as possible (aside from the fact that I'm copying data back and forth).
Do I have to use staging textures? Can I tweak the decoder/renderer texture flags (BindFlags?) or Map's D3D11_MAP enumeration to skip copying to the staging texture?
EDIT001:
OK, here goes the case, with technical details. There is a decoder; essentially it is the Intel Media SDK decoder, which decodes (pun intended) data provided from outside the decoding class. So it receives a buffer, does its magic (asynchronously), and returns (by means of SyncOperation, if I recall the method name right) a surface, which is actually, under the hood, a DX texture managed by the Intel allocator. I receive and copy the texture synchronously, but I guess, with a little effort, I can do it asynchronously. The surface originates from a pool, so working with the texture does not stop the decoder from continuing its work. The copied data resides in a struct which is kept in a ring buffer, from which the video renderer is fed. That's it; to my understanding (a limited one, I have to notice) there is no harm to GPU parallelism.
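For concreteness, the copy-out half of the steps above looks roughly like this (a sketch; pContext, pDecoderTex and a matching pStagingTex created with D3D11_USAGE_STAGING / D3D11_CPU_ACCESS_READ are assumed names, as are cpuBuffer, height and rowBytes):

pContext->CopyResource(pStagingTex, pDecoderTex);

D3D11_MAPPED_SUBRESOURCE mapped = {};
if (SUCCEEDED(pContext->Map(pStagingTex, 0, D3D11_MAP_READ, 0, &mapped)))
{
    // RowPitch can be larger than width * bytesPerPixel, so copy row by row.
    for (UINT row = 0; row < height; ++row)
    {
        memcpy_s(cpuBuffer + row * rowBytes, rowBytes,
                 static_cast<const BYTE*>(mapped.pData) + row * mapped.RowPitch, rowBytes);
    }
    pContext->Unmap(pStagingTex, 0);
}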
If you have to read and write back, there is no fast way; you break GPU/CPU parallelism by forcing a sync point, and you will create many idle bubbles on both the CPU and the GPU.
Only the staging pool is accessible to the CPU, so yes, the temporary resource for the back and forth is necessary.
For performance, you should consider:
Adapting your technique to be GPU-only
If read-back is the only way, limiting it to the portions that are dirty or necessary
Trying to work with a couple of textures across frames, letting the CPU work on a version that is a few frames behind, to protect parallelism (see the sketch below)
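A sketch of that last point, rotating two staging textures so the Map targets a copy queued on the previous frame (all names are illustrative; both textures are assumed to be created with D3D11_USAGE_STAGING and D3D11_CPU_ACCESS_READ):

ID3D11Texture2D* staging[2]; // filled in alternation, one frame apart
UINT frameIndex = 0;

void ReadBackOneFrame(ID3D11DeviceContext* ctx, ID3D11Texture2D* source)
{
    // Queue the copy for this frame...
    ctx->CopyResource(staging[frameIndex % 2], source);

    // ...and map the copy queued last frame, which the GPU has had time to finish,
    // so this Map should not force a sync point.
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(ctx->Map(staging[(frameIndex + 1) % 2], 0, D3D11_MAP_READ, 0, &mapped)))
    {
        // consume mapped.pData / mapped.RowPitch here
        ctx->Unmap(staging[(frameIndex + 1) % 2], 0);
    }
    ++frameIndex;
}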
I have no experience with Direct3D, so I may just be looking in the wrong places. However, I would like to convert a program I have written in OpenGL (using FreeGLUT) to a Windows IoT compatible UWP app (running Direct3D 12, 'cause it's cool). I'm trying to port my program to a Raspberry Pi 3 and I don't want to convert to Linux.
Through the examples provided by Microsoft I have figured out most of what I believe I need to know to get started, but I can't figure out how to share a dynamic data buffer between the CPU and GPU.
What I want to know how to do:
Create a CPU/GPU shared circular buffer
Read and Draw with the GPU
Write / Replace sections with the CPU
Quick semi-pseudo code:
while (!buffer.inUse()) {                      // wait until buffer is not in use
    updateBuffer(buffer.id, data, start, end); // insert data into buffer
    drawToScreen(buffer.id);                   // draw using vertex data in buffer
}
This was previously done in OpenGL by simply using glBegin()/glEnd() and glVertex3f() for each value in an array when it wasn't being written to.
Update: I basically want a Direct3D 12 equivalent of OpenGL's VBO editing using glBufferSubData(), if that makes more sense.
Update 2: I found that I can get away with discarding the vertex buffer every frame and re-uploading a new buffer to the GPU. There's a fair amount of overhead, as one would expect with transferring 10,000 - 200,000 doubles every frame. So I'm trying to find a way to use constant buffers to pass the 5-10 updated vertices into the shader, so I can copy from the constant buffer into the vertex buffer using the shader and not have to use Map/Unmap every frame. This way my circular buffer on the CPU is independent of the buffer being used on the GPU, but they will both share the same information through periodic updates. I'll do some more looking and post another, more specific question on shaders if I don't find a solution.
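For what it's worth, a rough D3D12 sketch of the glBufferSubData-style partial update mentioned above, using a persistently mapped upload-heap buffer (names like device, maxVertices and Vertex are placeholders, and real code still needs per-frame fencing or multiple buffers so the CPU never overwrites data the GPU is currently reading):

// Create a vertex buffer on the upload heap: CPU-writable, GPU-readable.
D3D12_HEAP_PROPERTIES heapProps = {};
heapProps.Type = D3D12_HEAP_TYPE_UPLOAD;

D3D12_RESOURCE_DESC bufDesc = {};
bufDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
bufDesc.Width = maxVertices * sizeof(Vertex); // fixed maximum size, like the circular buffer
bufDesc.Height = 1;
bufDesc.DepthOrArraySize = 1;
bufDesc.MipLevels = 1;
bufDesc.Format = DXGI_FORMAT_UNKNOWN;
bufDesc.SampleDesc.Count = 1;
bufDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

ID3D12Resource* vertexBuffer = nullptr;
device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &bufDesc,
    D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&vertexBuffer));

// Upload-heap resources can stay mapped for their whole lifetime.
void* mapped = nullptr;
D3D12_RANGE noRead = { 0, 0 }; // we never read this memory on the CPU
vertexBuffer->Map(0, &noRead, &mapped);

// glBufferSubData-like update: overwrite only the range that changed.
memcpy(static_cast<char*>(mapped) + firstChangedVertex * sizeof(Vertex),
       changedVerts, changedCount * sizeof(Vertex));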
I'm using OpenGL to implement some kind of batched drawing. For this I create a vertex buffer to store data.
Note: this buffer will generally be updated each frame, but it will never decrease in size (though it can still increase).
My question is: is it technically correct to use glBufferData (with a streaming, write-only usage hint) to update it (instead of, e.g., glMapBuffer)? I suppose there's no need to map it, since the full data is updated, so I just send the whole pack at once. And if the current buffer size is less than what I'm sending, it will automatically grow, won't it? I'm just not sure about the way it really works (maybe it will recreate the buffer on each call, no?).
It would be better to have a buffer with a fixed size and not recreate it every frame.
You can achieve this by:
creating the buffer with a maximum size, for instance space for 1000 verts
updating only the beginning of the buffer with new data. So if you changed data for 500 verts, then fill only the first half of the buffer, using glMapBuffer (see the sketch after this list)
changing the count of drawn vertices when drawing. You can, for instance, use only some range of verts (e.g. from 200 to 500) out of the whole 1000-vert buffer. Use glDrawArrays(mode, first, count)
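A sketch of the fixed-size approach (vbo, a Vertex type, vertices and liveVertexCount are assumed names; glBufferSubData is shown here, glMapBuffer/glMapBufferRange work just as well):

// One-time setup: allocate a fixed maximum size, with no initial data.
const GLsizeiptr MAX_VERTS = 1000;                   // pick your own upper bound
glBindBuffer(GL_ARRAY_BUFFER, vbo);                  // vbo assumed to be created already
glBufferData(GL_ARRAY_BUFFER, MAX_VERTS * sizeof(Vertex), nullptr, GL_STREAM_DRAW);

// Each frame: overwrite only the prefix that actually changed...
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferSubData(GL_ARRAY_BUFFER, 0, liveVertexCount * sizeof(Vertex), vertices.data());

// ...and draw only the vertices that are currently valid.
glDrawArrays(GL_TRIANGLES, 0, (GLsizei)liveVertexCount);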
ideas from comments:
glMapBufferRange and glBufferSubData could also help
also consider double buffering of buffers
link: http://hacksoflife.blogspot.com/2010/02/double-buffering-vbos.html
hope that helps
In addition to what fen and datenwolf said, see Chapter 22 of OpenGL Insights; in particular, it includes timings for a variety of hardware & techniques.
I am developing an application that needs to read back the whole frame from the front buffer of an OpenGL application. I can hijack the application's OpenGL library and insert my code at SwapBuffers. At the moment I am successfully using a simple but excruciatingly slow glReadPixels call without PBOs.
Now I have read about using multiple PBOs to speed things up. While I think I've found enough resources to actually program that (it isn't that hard), I have some operational questions left. I would do something like this:
Create a series (e.g. 3) of PBOs
Use glReadPixels in my SwapBuffers override to read data from the front buffer into a PBO (should be fast and non-blocking, right?)
Create a separate thread to call glMapBufferARB, once per PBO after its glReadPixels, because this call will block until the pixels are in client memory
Process the data from step 3.
Now my main concern is of course with steps 2 and 3. I have read that glReadPixels into a PBO is non-blocking; will that be an issue if I issue new OpenGL commands right after it? Will those OpenGL commands block? Or will they continue (my guess), and if so, I guess only SwapBuffers can be a problem: will it stall, or will glReadPixels from the front buffer be many times faster than swapping (about every 15-30 ms), or, worst-case scenario, will SwapBuffers be executed while glReadPixels is still reading data into the PBO? My current guess is that the logic does something like this: copy FRONT_BUFFER -> generic place in VRAM, then copy VRAM -> RAM. But I have no idea which of those two is the real bottleneck, and moreover, what the influence on the normal OpenGL command stream is.
Then, in step 3: is it wise to do this asynchronously in a thread separate from the normal OpenGL logic? At the moment I think not; it seems you have to restore buffer operations to normal after doing this, and I can't install synchronization objects in the original code to temporarily block those. So I think my best option is to define a certain SwapBuffers delay before reading the PBOs out, e.g. calling glReadPixels on PBO i%3 and glMapBufferARB on PBO (i+2)%3 in the same thread, resulting in a delay of 2 frames. Also, when I call glMapBufferARB to use the data in client memory, will that be the bottleneck, or will the (asynchronous) glReadPixels be?
And finally, if you have some better ideas to speed up frame readback from GPU in opengl, please tell me, because this is a painful bottleneck in my current system.
I hope my question is clear enough. I know the answer is probably also somewhere on the internet, but I mostly came up with results that used PBOs to keep buffers in video memory and do the processing there. I really need to read the front buffer back to RAM, and I can't find any clear explanation of the performance in that case (which I need; I cannot rely on "it's faster", I need to explain why it's faster).
Thank you
Are you sure you want to read from the front buffer? You do not own this buffer, and depending on your OS it might be destroyed, e.g., by another window on top of it.
For your use case, people typically do
draw N
start PBO read N from back buffer
draw N+1
start PBO read N+1
sync PBO read N
process N
...
from a single thread.
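A sketch of that pattern with two PBOs (width, height and the processing callback are assumed; the readback happens in the hooked SwapBuffers, reading GL_BACK before the actual swap):

const int NUM_PBOS = 2;
GLuint pbos[NUM_PBOS];
int frame = 0;

void initReadback(int width, int height)
{
    glGenBuffers(NUM_PBOS, pbos);
    for (int i = 0; i < NUM_PBOS; ++i)
    {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr, GL_STREAM_READ);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

void onSwapBuffers(int width, int height) // called from the hooked SwapBuffers
{
    // Kick off an asynchronous read of the current frame into one PBO.
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[frame % NUM_PBOS]);
    glReadBuffer(GL_BACK); // read the back buffer before the swap makes it visible
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, 0);

    // Map the PBO filled on the previous frame; its transfer has had a frame to
    // complete, so this map should not stall (the very first frame maps empty data).
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[(frame + 1) % NUM_PBOS]);
    if (void* data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY))
    {
        // process(data); // copy to RAM or hand off to another thread here
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    ++frame;
}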