I have been trying to upload a dynamic texture with Map/Unmap but no luck so far.
Here's the code I'm working with:
D3D11_MAPPED_SUBRESOURCE subResource = {};
ImmediateContext->Map(dx11Texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &subResource);
Memory::copy(subResource.pData, (const void*)desc.DataSet[0], texture->get_width() * texture->get_height() * GraphicsFormatUtils::get_format_size(texture->get_format()));
subResource.RowPitch = texture->get_width() * GraphicsFormatUtils::get_format_size(texture->get_format());
subResource.DepthPitch = 0;
ImmediateContext->Unmap(dx11Texture, 0);
I created the texture with the immutable usage flag and supplied the data upfront, and that worked out well, but when I try to create it with the dynamic flag and upload the same data, my texture shows a noisy visual.
This is the texture created with immutable creation flags, with the data supplied upfront during the texture creation phase.
Immutable texture
This is the texture created with dynamic creation flags, with the data uploaded after the texture creation phase using the Map/Unmap methods.
Dynamic texture
Any input would be appreciated.
When using Map, the RowPitch returned in the subresource by the Map call is the pitch you are expected to use when performing the copy (note that you never send it back to the device context, so it is read-only output).
It is generally padded to a hardware-specific alignment, so it can be larger than width * format size.
When you provide initial data for a texture (immutable or otherwise), this pitch-aware copy is hidden from you but still happens behind the scenes; when you Map the texture yourself, you need to perform the pitch check yourself.
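For reference, this is roughly what the initial-data path looks like at creation time (a sketch; pixels, width, formatSize, and texDesc are placeholder names, not taken from the question):
D3D11_SUBRESOURCE_DATA initData = {};
initData.pSysMem = pixels;                  // your tightly packed source data
initData.SysMemPitch = width * formatSize;  // the pitch of *your* data
initData.SysMemSlicePitch = 0;              // only used for 3D textures
device->CreateTexture2D(&texDesc, &initData, &texture);
// The runtime performs the re-pitching copy to the hardware layout for you.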
The process of copying into a dynamic texture is as follows:
int myDataRowPitch = texture->get_width() * GraphicsFormatUtils::get_format_size(texture->get_format()); // width * format size (if you don't pad)
D3D11_MAPPED_SUBRESOURCE subResource = {};
ImmediateContext->Map(dx11Texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &subResource);
if (myDataRowPitch == subResource.RowPitch)
{
    // pitches match: one contiguous copy of the whole image works
    Memory::copy(subResource.pData, (const void*)desc.DataSet[0], myDataRowPitch * texture->get_height());
}
else
{
    // pitches differ: copy row by row, stepping the destination by RowPitch and the source by your own pitch
    uint8_t* dstRow = (uint8_t*)subResource.pData;
    const uint8_t* srcRow = (const uint8_t*)desc.DataSet[0];
    for (uint32_t row = 0; row < texture->get_height(); ++row)
    {
        Memory::copy(dstRow, srcRow, myDataRowPitch);
        dstRow += subResource.RowPitch;
        srcRow += myDataRowPitch;
    }
}
ImmediateContext->Unmap(dx11Texture, 0);
I am performing view frustum culling and generating draw commands on the GPU in a compute shader, and I want to pass the bounding volumes in an SSBO. Currently I am using just a large uniform array, but I want to go bigger, hence the need to move to an SSBO.
What I want to accomplish is something akin to the AZDO approach of triple buffering: avoiding sync issues when updating the SSBO by only updating one third of the buffer each frame while guarding the other parts with fences.
Is this possible to combine with the compute shader dispatch or should I just create three different SSBOs and then bind each of them accordingly?
The solution as I currently see it would be to somehow tell the following draw call to only fetch data in the SSBO from a certain offset (0 * buffer_size, 1 * buffer_size, etc.). Is this even possible?
Render loop
/* Fence creation omitted for clarity */
// Cycle round updating different parts of the buffer
const uint32_t buffer_idx = (frame % gl_state.bvb_num_partitions);
uint8_t* ptr = (uint8_t*)gl_state.bvbp + buffer_idx * gl_state.bvb_buffer_size;
std::memcpy(ptr, bounding_volumes.data(), gl_state.bvb_buffer_size);
const uint32_t gl_bv_binding_point = 3; // Shader hard coded
const uint32_t offset = buffer_idx * gl_state.bvb_buffer_size;
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, gl_bv_binding_point, gl_state.bvb, offset, gl_state.bvb_buffer_size);
// OLD WAY: glUniform4fv(glGetUniformLocation(gl_state.cull_shader.gl_program, "spheres"), NUM_OBJECTS, &bounding_volumes[0].pos.x);
glUniform4fv(glGetUniformLocation(gl_state.cull_shader.gl_program, "frustum_planes"), 6, glm::value_ptr(frustum[0]));
glDispatchCompute(NUM_OBJECTS, 1, 1);
glMemoryBarrier(GL_COMMAND_BARRIER_BIT | GL_SHADER_STORAGE_BARRIER_BIT); // Buffer objects affected by this bit are derived from the GL_DRAW_INDIRECT_BUFFER binding.
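For reference, the fence guard omitted above could look roughly like this (a minimal sketch; the bvb_fences array, one sync object per partition, is an assumed addition, not part of the code above):
// One GLsync per partition, null until that partition has been submitted once.
GLsync bvb_fences[3] = {};
// Before writing into partition buffer_idx, wait until the GPU has finished with it.
if (bvb_fences[buffer_idx])
{
    while (true)
    {
        GLenum wait = glClientWaitSync(bvb_fences[buffer_idx], GL_SYNC_FLUSH_COMMANDS_BIT, 1000000); // wait in 1 ms slices
        if (wait == GL_ALREADY_SIGNALED || wait == GL_CONDITION_SATISFIED)
            break;
    }
    glDeleteSync(bvb_fences[buffer_idx]);
    bvb_fences[buffer_idx] = nullptr;
}
// ... memcpy into the mapped partition, glBindBufferRange, glDispatchCompute ...
// After submitting the work that reads this partition, drop a fence behind it.
bvb_fences[buffer_idx] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);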
Bounding volume SSBO creation
// Bounding volume buffer
glGenBuffers(1, &gl_state.bvb);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, gl_state.bvb);
gl_state.bvb_buffer_size = NUM_OBJECTS * sizeof(BoundingVolume);
gl_state.bvb_num_partitions = 3; // 1 for application, 1 for OpenGL driver, 1 for GPU
GLbitfield flags = GL_MAP_COHERENT_BIT | GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT;
glBufferStorage(GL_SHADER_STORAGE_BUFFER, gl_state.bvb_num_partitions * gl_state.bvb_buffer_size, nullptr, flags);
gl_state.bvbp = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, gl_state.bvb_buffer_size * gl_state.bvb_num_partitions, flags);
I'm currently attempting to connect some form of output from a CUDA program to a GL_TEXTURE_2D for use in rendering. I'm not that worried about the output type from CUDA (whether it'd be an array or surface, I can adapt the program to that).
So the question is, how would I do that? (my current code copies the output array to system memory, and uploads it to the GPU again with GL.TexImage2D, which is obviously highly inefficient - when I disable those two pieces of code, it goes from approximately 300 kernel executions per second to a whopping 400)
I already have a little bit of test code, to at least bind a GL texture to CUDA, but I'm not even able to get the device pointer from it...
ctx = CudaContext.CreateOpenGLContext(CudaContext.GetMaxGflopsDeviceId(), CUCtxFlags.SchedAuto);
uint textureID = (uint)GL.GenTexture(); //create a texture in GL
GL.BindTexture(TextureTarget.Texture2D, textureID); //bind it so the parameter and TexImage2D calls below affect it
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, width, height, 0, OpenTK.Graphics.OpenGL.PixelFormat.Rgba, PixelType.UnsignedByte, null); //allocate memory for the texture in GL
CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource(textureID, CUGraphicsRegisterFlags.WriteDiscard, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_2D, CUGraphicsMapResourceFlags.WriteDiscard); //using writediscard because the CUDA kernel will only write to this texture
//then, as far as I understood the ManagedCuda example, I have to do the following when I call my kernel
//(done without a CudaGraphicsInteropResourceCollection because I only have one item)
resultImage.Map();
var ptr = resultImage.GetMappedPointer(); //this crashes
kernelSample.Run(ptr); //pass the pointer to the kernel so it knows where to write
resultImage.UnMap();
The following exception is thrown when attempting to get the pointer:
ErrorNotMappedAsPointer: This indicates that a mapped resource is not available for access as a pointer.
What do I need to do to fix this?
And even if this exception can be resolved, how would I solve the other part of my question; that is, how do I work with the acquired pointer in my kernel? Can I use a surface for that? Access it as an arbitrary array (pointer arithmetic)?
Edit:
Looking at this example, apparently I don't even need to map the resource every time I call the kernel and the render function. But how would this translate to ManagedCUDA?
Thanks to the example I found, I was able to translate it to ManagedCUDA (after browsing the source code and fiddling around), and I'm happy to report that this really does improve my samples per second, from about 300 to 400 :)
Apparently a 3D array is needed (I haven't seen any overloads in ManagedCUDA using 2D arrays), but that doesn't really matter - I just use a 3D array/texture which is exactly 1 deep.
id = GL.GenTexture();
GL.BindTexture(TextureTarget.Texture3D, id);
GL.TexParameter(TextureTarget.Texture3D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture3D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage3D(TextureTarget.Texture3D, 0, PixelInternalFormat.Rgba, width, height, 1, 0, OpenTK.Graphics.OpenGL.PixelFormat.Bgra, PixelType.UnsignedByte, IntPtr.Zero); //allocate memory for the texture but do not upload anything
CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource((uint)id, CUGraphicsRegisterFlags.SurfaceLDST, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_3D, CUGraphicsMapResourceFlags.WriteDiscard);
resultImage.Map();
CudaArray3D mappedArray = resultImage.GetMappedArray3D(0, 0);
resultImage.UnMap();
CudaSurface surfaceResult = new CudaSurface(kernelSample, "outputSurface", CUSurfRefSetFlags.None, mappedArray); //nothing needs to be done anymore - this call connects the 3D array from the GL texture to a surface reference in the kernel
Kernel code:
surface<void, cudaSurfaceType3D> outputSurface;
__global__ void Sample() {
...
surf3Dwrite(output, outputSurface, pixelX, pixelY, 0);
}
So how can one update values in a vertex buffer that has been bound to the device with the IASetVertexBuffers method? Also, will changing values in this buffer before the calls to Draw() and Present() work? Will the image be updated according to the new values in the buffer?
To update a vertex buffer from the CPU, you must first create a dynamic vertex buffer that allows the CPU to write to it. To do this, call ID3D11Device::CreateBuffer with Usage set to D3D11_USAGE_DYNAMIC and CPUAccessFlags set to D3D11_CPU_ACCESS_WRITE. Example:
D3D11_BUFFER_DESC desc;
ZeroMemory( &desc, sizeof( desc ) );
desc.Usage = D3D11_USAGE_DYNAMIC;
desc.ByteWidth = size;
desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
d3dDevice->CreateBuffer( &desc, initialVertexData, &vertexBuffer );
Now that you have a dynamic vertex buffer, you can update it using ID3D11DeviceContext::Map and ID3D11DeviceContext::Unmap. Example:
D3D11_MAPPED_SUBRESOURCE resource;
d3dDeviceContext->Map( vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource );
memcpy( resource.pData, sourceData, vertexDataSize );
d3dDeviceContext->Unmap( vertexBuffer, 0 );
where sourceData is the new vertex data you want to put into the buffer.
This is one method for updating a vertex buffer where you are uploading a whole new set of vertex data and discarding previous contents. There are also other ways to update a vertex buffer. For example, you could leave the current contents and only modify certain values, or you could update only certain regions of the vertex buffer instead of the whole thing.
Each method will have its own usage and performance characteristics. It all depends on what your data is and how you intend on using it. This NVIDIA presentation gives some advice on the best way to update your buffers for different usages.
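For instance, appending new data into a region that in-flight draw calls are not reading can be done with D3D11_MAP_WRITE_NO_OVERWRITE instead of DISCARD (a rough sketch; writeOffsetInBytes, newData, and newDataSize are placeholder names):
D3D11_MAPPED_SUBRESOURCE resource;
HRESULT hr = d3dDeviceContext->Map( vertexBuffer, 0, D3D11_MAP_WRITE_NO_OVERWRITE, 0, &resource );
if (SUCCEEDED(hr))
{
    // NO_OVERWRITE promises the driver you won't touch data the GPU may still be reading,
    // so the call does not stall; only write into a region you know is free this frame.
    memcpy( (BYTE*)resource.pData + writeOffsetInBytes, newData, newDataSize );
    d3dDeviceContext->Unmap( vertexBuffer, 0 );
}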
Yes, you will want to do this update and call IASetVertexBuffers before Draw() and Present() to see the updated results for the current frame. You don't necessarily need to update the vertex buffer contents before calling IASetVertexBuffers, though; those two can be done in either order.
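In other words, a typical per-frame order looks roughly like this (a sketch; swapChain, vertexCount, and the Vertex struct are placeholders):
// 1. Upload this frame's vertex data, discarding the old contents.
D3D11_MAPPED_SUBRESOURCE resource;
d3dDeviceContext->Map( vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource );
memcpy( resource.pData, sourceData, vertexDataSize );
d3dDeviceContext->Unmap( vertexBuffer, 0 );
// 2. Bind the buffer (this could equally come before the update).
UINT stride = sizeof( Vertex );
UINT offset = 0;
d3dDeviceContext->IASetVertexBuffers( 0, 1, &vertexBuffer, &stride, &offset );
// 3. Draw and present.
d3dDeviceContext->Draw( vertexCount, 0 );
swapChain->Present( 1, 0 );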
I've been delving into unmanaged DirectX 11 for the first time (bear with me), and there's an issue that, although asked several times over the forums, still leaves me with questions.
I am developing an app in which objects are added to the scene over time. On each render loop I want to collect all vertices in the scene and render them, reusing a single vertex and index buffer for performance and best practice. My question is regarding the usage of dynamic vertex and index buffers: I haven't been able to fully understand their correct usage when scene content changes.
vertexBufferDescription.Usage = D3D11_USAGE_DYNAMIC;
vertexBufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
vertexBufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
vertexBufferDescription.MiscFlags = 0;
vertexBufferDescription.StructureByteStride = 0;
Should I create the buffers when the scene is initialized and somehow update their content in every frame? If so, what ByteWidth should I set in the buffer description? And what do I initialize it with?
Or should I create it the first time the scene is rendered (frame 1), using the current vertex count as its size? If so, when I add another object to the scene, don't I need to recreate the buffer and change the buffer description's ByteWidth to the new vertex count? If my scene keeps updating its vertices on each frame, the usage of a single dynamic buffer would lose its purpose this way...
I've been testing initializing the buffer on the first time the scene is rendered, and from there on, using Map/Unmap on each frame. I start by filling in a vector list with all the scene objects and then update the resource like so:
void Scene::Render()
{
(...)
std::vector<VERTEX> totalVertices;
std::vector<int> totalIndices;
int totalVertexCount = 0;
int totalIndexCount = 0;
for (shapeIterator = models.begin(); shapeIterator != models.end(); ++shapeIterator)
{
Model* currentModel = (*shapeIterator);
// totalVertices gets filled here...
}
// At this point totalVertices and totalIndices have all scene data
if (isVertexBufferSet)
{
// This is where it copies the new vertices to the buffer.
// but it's causing flickering in the entire screen...
D3D11_MAPPED_SUBRESOURCE resource;
context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
memcpy(resource.pData, &totalVertices[0], sizeof(totalVertices));
context->Unmap(vertexBuffer, 0);
}
else
{
// This is run in the first frame. But what if new vertices are added to the scene?
vertexBufferDescription.ByteWidth = sizeof(VERTEX) * totalVertexCount;
UINT stride = sizeof(VERTEX);
UINT offset = 0;
D3D11_SUBRESOURCE_DATA resourceData;
ZeroMemory(&resourceData, sizeof(resourceData));
resourceData.pSysMem = &totalVertices[0];
device->CreateBuffer(&vertexBufferDescription, &resourceData, &vertexBuffer);
context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
isVertexBufferSet = true;
}
In the end of the render loop, while keeping track of the buffer position of the vertices for each object, I finally invoke Draw():
context->Draw(objectVertexCount, currentVertexOffset);
}
My current implementation is causing my whole scene to flicker, but there are no memory leaks. I wonder if it has anything to do with the way I am using the Map/Unmap API?
Also, in this scenario, when would it be ideal to invoke buffer->Release()?
Tips or code sample would be great! Thanks in advance!
At the memcpy into the vertex buffer you do the following:
memcpy(resource.pData, &totalVertices[0], sizeof(totalVertices));
sizeof( totalVertices ) just gives the size of the std::vector<VERTEX> object itself, not the size of its contents, which is not what you want.
Try the following code:
memcpy(resource.pData, &totalVertices[0], sizeof( VERTEX ) * totalVertices.size() );
Also, you don't appear to be calling IASetVertexBuffers when isVertexBufferSet is true. Make sure you do so.
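Putting both fixes together, the update branch would look roughly like this (a sketch using the variables already in your code):
// Update path for every frame after the first.
D3D11_MAPPED_SUBRESOURCE resource;
context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
memcpy(resource.pData, totalVertices.data(), sizeof(VERTEX) * totalVertices.size());
context->Unmap(vertexBuffer, 0);
// Re-bind the buffer so this frame's Draw() calls actually use it.
UINT stride = sizeof(VERTEX);
UINT offset = 0;
context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);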
I'm trying to use QGLbuffer to display an image.
Sequence is something like:
initializeGL() {
glbuffer= QGLBuffer(QGLBuffer::PixelUnpackBuffer);
glbuffer.create();
glbuffer.bind();
glbuffer.allocate(image_width*image_height*4); // RGBA
glbuffer.release();
}
// Attempting to write an image directly to graphics memory.
// map() should map the buffer into the address space and give me an address
// to write to directly, but it always returns NULL
unsigned char* dest = (unsigned char*)glbuffer.map(QGLBuffer::WriteOnly); // FAILS
MyGetImageFunction( dest );
glbuffer.unmap();
paint() {
glbuffer.bind();
glBegin(GL_QUADS);
glTexCoord2i(0,0); glVertex2i(0,height());
glTexCoord2i(0,1); glVertex2i(0,0);
glTexCoord2i(1,1); glVertex2i(width(),0);
glTexCoord2i(1,0); glVertex2i(width(),height());
glEnd();
glbuffer.release();
}
There aren't any examples of using QGLBuffer in this way; it's pretty new.
Edit - for anyone searching, here is the working solution:
// Where glbuffer is defined as
glbuffer= QGLBuffer(QGLBuffer::PixelUnpackBuffer);
// sequence to get a pointer into a PBO, write data to it and copy it to a texture
glbuffer.bind(); // bind before doing anything
unsigned char *dest = (unsigned char*)glbuffer.map(QGLBuffer::WriteOnly);
MyGetImageFunction(dest);
glbuffer.unmap(); // need to unmap before the rest of OpenGL can access the PBO
glBindTexture(GL_TEXTURE_2D,texture);
// Note 'NULL' because memory is now onboard the card
glTexSubImage2D(GL_TEXTURE_2D, 0, 0,0, image_width, image_height, glFormatExt, glType, NULL);
glbuffer.release(); // but don't release until the copy has been issued
// PaintGL function
glBindTexture(GL_TEXTURE_2D,textures);
glBegin(GL_QUADS);
glTexCoord2i(0,0); glVertex2i(0,height());
glTexCoord2i(0,1); glVertex2i(0,0);
glTexCoord2i(1,1); glVertex2i(width(),0);
glTexCoord2i(1,0); glVertex2i(width(),height());
glEnd();
You should bind the buffer before mapping it!
In the documentation for QGLBuffer::map:
It is assumed that create() has been called on this buffer and that it has been bound to the current context.
In addition to VJovic's comments, I think you are missing a few points about PBOs:
A pixel unpack buffer does not give you a pointer to the graphics texture. It is a separate piece of memory allocated on the graphics card to which you can write directly from the CPU.
The buffer can be copied into a texture by a glTexSubImage2D(....., 0) call, with the texture bound as well, which you do not do (the final 0 is the offset into the pixel buffer). The copy is needed partly because textures have a different layout than linear pixel buffers.
See this page for a good explanation of PBO usages (I used it a few weeks ago to do async texture upload).
create will return false if the GL implementation does not support buffers, or there is no current QGLContext.
bind returns false if binding was not possible, usually because type() is not supported on this GL implementation.
You are not checking whether these two calls succeeded.
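A minimal check along those lines (reusing the names from the question) might look like:
glbuffer = QGLBuffer(QGLBuffer::PixelUnpackBuffer);
if (!glbuffer.create())
    qWarning("QGLBuffer::create() failed: no current context, or buffers not supported");
if (!glbuffer.bind())
    qWarning("QGLBuffer::bind() failed: this buffer type is not supported by the GL implementation");
glbuffer.allocate(image_width * image_height * 4); // RGBA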
I got the same thing: map returned NULL. When I used the following order, it was solved.
bool success = mPixelBuffer->create();
mPixelBuffer->setUsagePattern(QGLBuffer::DynamicDraw);
success = mPixelBuffer->bind();
mPixelBuffer->allocate(sizeof(imageData));
void* ptr = mPixelBuffer->map(QGLBuffer::ReadOnly);