How to link a descriptor table with multiple ranges to a compute shader - hlsl

How do I link one UAV descriptor table with multiple ranges (u0, u1, u2) to a compute shader?
I can create a separate buffer for each of them and that works, but I can't figure out how to make it work using multiple ranges.
SetComputeRootDescriptorTable(RootParameterIndex, BaseDescriptor) takes only these two inputs. This is how I attempted to bind it: I thought I would link the three descriptors as shown in the commented-out lines below, but the app crashes, so I commented out the code that causes the crash.
ThrowIfFailed(m_pCommandList->Reset(m_pCommandAllocator.Get(), m_pPipelineState["CSPSO"].Get()));
D3D12_GPU_DESCRIPTOR_HANDLE pComputeGPUHeapOffset= m_pCbvSrvDescriptorHeap->GetGPUDescriptorHandleForHeapStart();
ID3D12DescriptorHeap* ppHeaps[] = { m_pCbvSrvDescriptorHeap.Get() };
m_pCommandList->SetDescriptorHeaps(1, ppHeaps);
CD3DX12_GPU_DESCRIPTOR_HANDLE desc= CD3DX12_GPU_DESCRIPTOR_HANDLE(pComputeGPUHeapOffset, 0, m_nCbvSrvDescriptorSize);
m_pCommandList->SetComputeRootDescriptorTable(0, desc);
//uncommenting these causes a crash even though each points to a different descriptor
//desc =CD3DX12_GPU_DESCRIPTOR_HANDLE(pComputeGPUHeapOffset, 1, m_nCbvSrvDescriptorSize);
//m_pCommandList->SetComputeRootDescriptorTable(0, desc);
//desc = CD3DX12_GPU_DESCRIPTOR_HANDLE(pComputeGPUHeapOffset, 2, m_nCbvSrvDescriptorSize);
//m_pCommandList->SetComputeRootDescriptorTable(0, desc);
m_pCommandList->Dispatch(1, 1,1);
Here is the rest of the code. First, the descriptor table definition:
CD3DX12_ROOT_SIGNATURE_DESC rootSigDesc(slotRootParameter.size(),,
slotRootParameter[0].InitAsDescriptorTable(1, &uavTable, D3D12_SHADER_VISIBILITY_ALL);
Then I create the main buffer:
&CD3DX12_RESOURCE_DESC::Buffer(paddedSize + sizeof(UINT),
and a view description for each section of the buffer that I want to link to u0, u1, u2 as defined in the descriptor table:
uavDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.NumElements = 3000;
uavDesc.Buffer.CounterOffsetInBytes = paddedSize;//aligned to 4096
uavDesc.Buffer.Flags = D3D12_BUFFER_UAV_FLAG_NONE;
uavDesc.Buffer.StructureByteStride = sizeof(Object1);
CD3DX12_CPU_DESCRIPTOR_HANDLE hDescriptor(m_pCbvSrvDescriptorHeap->GetCPUDescriptorHandleForHeapStart());
uavDesc.Buffer.NumElements = 2000;
uavDesc.Buffer.FirstElement = 3000;
hDescriptor.Offset(1, m_nCbvSrvDescriptorSize);
uavDesc.Buffer.StructureByteStride = sizeof(Object2);
uavDesc.Buffer.FirstElement = 5000;
hDescriptor.Offset(1, m_nCbvSrvDescriptorSize);
uavDesc.Buffer.StructureByteStride = sizeof(float);
paddedSize is the sum of each object's size multiplied by its buffer length, rounded up to the alignment.
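To make the layout arithmetic above concrete, here is a minimal sketch in plain C++ (no D3D12 headers). The names `alignUp`, `SubRange`, `firstElementFor` and `paddedSize` are hypothetical helpers, and the element sizes used in the usage note are made up. It also shows one thing worth checking: `FirstElement` is counted in elements of that view's own `StructureByteStride`, so a section can only start at a byte offset that is a whole multiple of its stride.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Round `value` up to the next multiple of `alignment` (assumed to be a power of two).
constexpr std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical description of one typed section inside a shared buffer.
struct SubRange {
    std::uint64_t byteOffset;   // where the section starts in the buffer
    std::uint64_t stride;       // size of one element in bytes
    std::uint64_t numElements;  // number of elements in the section
};

// FirstElement for a view whose elements are `stride` bytes wide.
// Returns nullopt when the byte offset is not a whole number of elements.
std::optional<std::uint64_t> firstElementFor(const SubRange& r) {
    if (r.stride == 0 || r.byteOffset % r.stride != 0) return std::nullopt;
    return r.byteOffset / r.stride;
}

// End of the furthest section, rounded up to `alignment`.
std::uint64_t paddedSize(const SubRange* ranges, int count, std::uint64_t alignment) {
    assert(ranges != nullptr && count > 0);
    std::uint64_t end = 0;
    for (int i = 0; i < count; ++i) {
        const std::uint64_t e = ranges[i].byteOffset + ranges[i].stride * ranges[i].numElements;
        if (e > end) end = e;
    }
    return alignUp(end, alignment);
}
```

With assumed sizes of 32, 16 and 4 bytes for the three element types, the second section starts at byte 3000 * 32 = 96000, so its FirstElement would be 96000 / 16 = 6000 rather than 3000.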
The shader writes values into structured buffers declared like this:
RWStructuredBuffer <Object1> FirstBuffer:register(u0);
RWStructuredBuffer<Object2> SecondBuffer:register(u1);
RWStructuredBuffer<float> ThirdBuffer:register(u2);
When I pass dummy data into the first buffer I can see that the buffer is empty, and when I read anything back from the buffer after the shader has finished it is also empty.
So what am I doing wrong?

CD3DX12_GPU_DESCRIPTOR_HANDLE desc= CD3DX12_GPU_DESCRIPTOR_HANDLE(pComputeGPUHeapOffset, 0, m_nCbvSrvDescriptorSize);
m_pCommandList->SetComputeRootDescriptorTable(0, desc);
//uncommenting these causes a crash even though each points to a different descriptor
//desc =CD3DX12_GPU_DESCRIPTOR_HANDLE(pComputeGPUHeapOffset, 1, m_nCbvSrvDescriptorSize);
//m_pCommandList->SetComputeRootDescriptorTable(1, desc);
//desc = CD3DX12_GPU_DESCRIPTOR_HANDLE(pComputeGPUHeapOffset, 2, m_nCbvSrvDescriptorSize);
//m_pCommandList->SetComputeRootDescriptorTable(2, desc);
m_pCommandList->Dispatch(1, 1,1);
You should pass a different root parameter index for each descriptor table entry you set on the command list.
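As a rough illustration of the handle arithmetic involved, here is a sketch in plain C++ with stand-in types (`GpuHandle`, `offsetHandle` and `tableCovers` are hypothetical names, not D3D12 API). `offsetHandle` mirrors the arithmetic that CD3DX12_GPU_DESCRIPTOR_HANDLE(base, index, size) performs. A descriptor table is set with one base handle and reads its ranges from the consecutive descriptors that follow, so a single table declared with three UAVs is covered by one call with the base handle, whereas three separate tables each need their own root parameter index.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for D3D12_GPU_DESCRIPTOR_HANDLE: just a 64-bit GPU address.
struct GpuHandle { std::uint64_t ptr; };

// Address of descriptor `index` counted from `base`, given the heap's increment size.
constexpr GpuHandle offsetHandle(GpuHandle base, std::uint32_t index, std::uint32_t incrementSize) {
    return GpuHandle{ base.ptr + static_cast<std::uint64_t>(index) * incrementSize };
}

// True when a table starting at `tableStart` and spanning `count` descriptors contains `h`.
constexpr bool tableCovers(GpuHandle tableStart, std::uint32_t count,
                           std::uint32_t incrementSize, GpuHandle h) {
    return h.ptr >= tableStart.ptr
        && (h.ptr - tableStart.ptr) % incrementSize == 0
        && (h.ptr - tableStart.ptr) / incrementSize < count;
}
```

For example, with a base address of 1000 and an increment of 32, the descriptor for u2 sits at 1064 and is already inside a three-descriptor table that starts at the base.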


What is the best way to clear a `VkImage` to a single color?

I'm learning Vulkan, and as a (very) simple project I want to clear a single swapchain image to a single color (red). My code works, but I get two validation errors. I would like to know:
1. How can I fix the validation errors in my code?
2. Is there a better way to simply clear swapchain images?
Regarding (2): I specifically don't want to use a graphics pipeline: in the future I would like to use a compute shader to draw directly to the screen.
My current approach
My project uses vk-bootstrap to set up, and then I try to render a single frame as follows:
1. Acquire an image from the swapchain
2. Record a command buffer with the following commands:
3. Submit the command buffer to the graphics queue
4. Present the previously acquired swapchain image using vkQueuePresentKHR
The relevant code can be found below, but it seems that the validation errors arise from the calls to vkCmdClearColorImage and vkQueuePresentKHR.
Error messages
The first validation error is from the vkCmdClearColorImage call, and seems to be triggered by my choice of layout:
// (... snip ...)
vkCmdClearColorImage(commandBuffer, swapChainImages[nextImageIndex], layout, &color, 1, &imageSubresourceRange);
The error message says:
[ERROR: Validation]
Validation Error: [ VUID-vkCmdClearColorImage-imageLayout-00005 ] Object 0: handle = 0xf56c9b0000000004, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9740ed23 | vkCmdClearColorImage(): Layout for cleared image is VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR but can only be TRANSFER_DST_OPTIMAL or GENERAL. The Vulkan spec states: imageLayout must be VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL or VK_IMAGE_LAYOUT_GENERAL (
I'm confused by this message, because although the link in this error message indeed says that
I also found this page in the spec that says
imageLayout specifies the current layout of the image subresource ranges to be cleared, and must be VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR, VK_IMAGE_LAYOUT_GENERAL or VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
The second error is triggered by the call to vkQueuePresentKHR and is especially weird...
[ERROR: Validation]
Validation Error: [ VUID-VkPresentInfoKHR-pImageIndices-01296 ] Object 0: handle = 0x55a3c6b9e408, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0xc7aabc16 | vkQueuePresentKHR(): pSwapchains[0] images passed to present must be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR or VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR but is in VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR. The Vulkan spec states: Each element of pImageIndices must be the index of a presentable image acquired from the swapchain specified by the corresponding element of the pSwapchains array, and the presented image subresource must be in the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout at the time the operation is executed on a VkDevice (
... because the message seems to contradict itself (linebreaks added):
images passed to present must be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
Code for step (2)
VkCommandBufferBeginInfo beginInfo{};
beginInfo.flags = 0;
beginInfo.pInheritanceInfo = nullptr;
if(vkBeginCommandBuffer(commandBuffer, &beginInfo) != VK_SUCCESS) {
throw std::runtime_error("failed vkBeginCommandBuffer");
VkImageMemoryBarrier barrier{};
barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = swapChainImages[nextImageIndex];
barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
barrier.subresourceRange.baseMipLevel = 0;
barrier.subresourceRange.levelCount = 1;
barrier.subresourceRange.baseArrayLayer = 0;
barrier.subresourceRange.layerCount = 1;
0, nullptr,
0, nullptr,
1, &barrier);
VkClearColorValue color = { .float32 = {1.0, 0.0, 0.0} };
VkImageSubresourceRange imageSubresourceRange { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
vkCmdClearColorImage(commandBuffer, swapChainImages[nextImageIndex], layout, &color, 1, &imageSubresourceRange);
The Question (again)
So to repeat: I would like to know:
1. How can I fix the two validation errors in my code?
2. Is there a better way to simply clear swapchain images?
Software Versions
The vulkaninfo command reports Vulkan Instance Version 1.2.194
I found a solution which works but seems kinda gross. Basically I modify step (2) to the following:
Record a command buffer with the following commands:
Essentially the logical flow of this pipeline is:
Clear image using vkCmdClearColorImage
Using a barrier, convert the image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR so it can be presented
This works because:
vkCmdClearColorImage requires the image to be in layout VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (or VK_IMAGE_LAYOUT_GENERAL), but
vkQueuePresentKHR requires the image to be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
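The two requirements above can be modelled with a tiny layout tracker. This is only a sketch in plain C++ with stand-in types (`Layout`, `TrackedImage`, `transition`, `clearColor` and `canPresent` are hypothetical names, not Vulkan API). It also models one rule worth keeping in mind: a barrier whose oldLayout is VK_IMAGE_LAYOUT_UNDEFINED allows the image contents to be discarded, so the barrier after the clear should name the clear layout as its oldLayout.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-ins for the VkImageLayout values discussed here; not the real Vulkan enum.
enum class Layout { Undefined, TransferDstOptimal, General, PresentSrc };

struct TrackedImage {
    Layout layout = Layout::Undefined;
    bool contentsValid = false;
};

// Models the layout transition performed by an image memory barrier.
void transition(TrackedImage& img, Layout oldLayout, Layout newLayout) {
    if (oldLayout != Layout::Undefined && oldLayout != img.layout)
        throw std::logic_error("oldLayout does not match the current layout");
    // Transitioning from Undefined lets the implementation discard the contents.
    if (oldLayout == Layout::Undefined) img.contentsValid = false;
    img.layout = newLayout;
}

// Models vkCmdClearColorImage: only legal in a transfer-destination or general layout.
void clearColor(TrackedImage& img) {
    if (img.layout != Layout::TransferDstOptimal && img.layout != Layout::General)
        throw std::logic_error("clear needs TRANSFER_DST_OPTIMAL or GENERAL");
    img.contentsValid = true;
}

// Presenting needs the present layout and contents that were not discarded.
bool canPresent(const TrackedImage& img) {
    return img.layout == Layout::PresentSrc && img.contentsValid;
}
```

Running the sequence Undefined -> TransferDstOptimal, clear, TransferDstOptimal -> PresentSrc leaves the image presentable; using Undefined as the old layout of the second transition does not.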
Since this seems a little hacky / gross, I will leave this question open for a while to see if anyone has a better solution. For completeness, here is the new code for step (2)
Modified code for step (2)
VkCommandBufferBeginInfo beginInfo{};
beginInfo.flags = 0;
beginInfo.pInheritanceInfo = nullptr;
if(vkBeginCommandBuffer(commandBuffer, &beginInfo) != VK_SUCCESS) {
throw std::runtime_error("failed vkBeginCommandBuffer");
VkImageMemoryBarrier barrier{};
barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
barrier.newLayout = clearLayout;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = swapChainImages[nextImageIndex];
barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
barrier.subresourceRange.baseMipLevel = 0;
barrier.subresourceRange.levelCount = 1;
barrier.subresourceRange.baseArrayLayer = 0;
barrier.subresourceRange.layerCount = 1;
0, nullptr,
0, nullptr,
1, &barrier);
VkImageLayout layout = clearLayout;
VkClearColorValue color = { .float32 = {1.0, 0.0, 0.0} };
VkImageSubresourceRange imageSubresourceRange { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
vkCmdClearColorImage(commandBuffer, swapChainImages[nextImageIndex], layout, &color, 1, &imageSubresourceRange);
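// Add another barrier and put the image in VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
// oldLayout must name the layout the image is actually in; VK_IMAGE_LAYOUT_UNDEFINED
// would allow the cleared contents to be discarded.
VkImageLayout finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
VkImageMemoryBarrier finalBarrier{};
finalBarrier.oldLayout = clearLayout;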
finalBarrier.newLayout = finalLayout;
finalBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
finalBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
finalBarrier.image = swapChainImages[nextImageIndex];
finalBarrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
finalBarrier.subresourceRange.baseMipLevel = 0;
finalBarrier.subresourceRange.levelCount = 1;
finalBarrier.subresourceRange.baseArrayLayer = 0;
finalBarrier.subresourceRange.layerCount = 1;
0, nullptr,
0, nullptr,
1, &finalBarrier);

Adding an extra UBO to a Vulkan pipeline stops all geometry rendering

I've followed the tutorial and I'm trying to split the uniform buffer into 2 separate buffers, one for View and Projection and one for Model. I've found, however, that once I add another buffer to the layout, even if my shaders don't use its content, no geometry is rendered. I don't get anything from the validation layers.
I've found that if the two UBOs are the same buffer, I have no problem. But if I assign them to different buffers, nothing appears on the screen. Have added descriptor set generation code.
Here's my layout generation code. All values are submitted correctly, bindings are 0, 1 and 2 respectively and this is reflected in shader code. I'm currently not even using the data in the buffer in the shader - so it's got nothing to do with the data I'm actually putting in the buffer.
Edit: I have opened it up in RenderDoc. Without the extra buffer, I can see the normal VP buffer and its values. They look fine. If I add the extra buffer, it does not show up, and the data from the first buffer is all zeroes.
Descriptor Set Layout generation:
std::vector<VkDescriptorSetLayoutBinding> layoutBindings;
// newShader->features includes 3 "features", with bindings 0,1,2.
// They are - uniform buffer, uniform buffer, sampler
// vertex bit, vertex bit, fragment bit
for (auto a : newShader->features)
VkDescriptorSetLayoutBinding newBinding = {};
newBinding.descriptorType = (VkDescriptorType)layoutBindingDescriptorType(a.featureType);
newBinding.binding = a.binding;
newBinding.stageFlags = (VkShaderStageFlags)layoutBindingStageFlag(a.stage);
newBinding.descriptorCount = 1;
newBinding.pImmutableSamplers = nullptr;
VkDescriptorSetLayoutCreateInfo layoutCreateInfo = {};
layoutCreateInfo.bindingCount = static_cast<uint32_t>(layoutBindings.size());
layoutCreateInfo.pBindings = layoutBindings.data();
Descriptor Set Generation:
//Create a list of layouts
std::vector<VkDescriptorSetLayout> layouts(swapChainImages.size(), voa->shaderPipeline->shaderSetLayout);
//Allocate room for the descriptors
VkDescriptorSetAllocateInfo allocInfo = {};
allocInfo.descriptorPool = voa->shaderPipeline->descriptorPool;
allocInfo.descriptorSetCount = static_cast<uint32_t>(swapChainImages.size());
allocInfo.pSetLayouts = layouts.data();
if (vkAllocateDescriptorSets(vdi->device, &allocInfo, voa->descriptorSets.data()) != VK_SUCCESS) {
throw std::runtime_error("failed to allocate descriptor sets!");
//For each set of commandBuffers (frames in flight +1)
for (size_t i = 0; i < swapChainImages.size(); i++) {
std::vector<VkWriteDescriptorSet> descriptorWrites;
//Buffer Info construction
for (auto a : voa->renderComponent->getMaterial()->shader->features)
//Create a new descriptor write
uint32_t index = descriptorWrites.size();
descriptorWrites[index].dstBinding = a.binding;
VkDescriptorBufferInfo bufferInfo = {};
bufferInfo.buffer = viewProjectionBuffers[i];
bufferInfo.offset = 0;
bufferInfo.range = sizeof(ViewProjectionBuffer);
else if (a.bufferSource == HE2_SHADER_BUFFER_SOURCE_MODEL_BUFFER)
bufferInfo.buffer = modelBuffers[i];
bufferInfo.offset = voa->ID * sizeof(ModelBuffer);
bufferInfo.range = sizeof(ModelBuffer);
//The following is the same for all Uniform buffers
descriptorWrites[index].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[index].dstSet = voa->descriptorSets[i];
descriptorWrites[index].dstArrayElement = 0;
descriptorWrites[index].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
descriptorWrites[index].descriptorCount = 1;
descriptorWrites[index].pBufferInfo = &bufferInfo;
else if (a.featureType == HE2_SHADER_FEATURE_TYPE_SAMPLER2D)
VulkanImageReference ref = VulkanTextures::images[a.imageHandle];
VkDescriptorImageInfo imageInfo = {};
imageInfo.imageView = ref.imageView;
imageInfo.sampler = defaultSampler;
descriptorWrites[index].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[index].dstSet = voa->descriptorSets[i];
descriptorWrites[index].dstArrayElement = 0;
descriptorWrites[index].descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
descriptorWrites[index].descriptorCount = 1;
descriptorWrites[index].pImageInfo = &imageInfo;
throw std::runtime_error("Unsupported feature type present in shader");
vkUpdateDescriptorSets(vdi->device, static_cast<uint32_t>(descriptorWrites.size()), descriptorWrites.data(), 0, nullptr);
Edit: Here is descriptor set binding code
vkCmdBeginRenderPass(commandBuffers[i], &renderPassInfo, VK_SUBPASS_CONTENTS_INLINE);
//Very temporary Render loop. Binds every frame, very clumsy
for (int j = 0; j < max; j++)
VulkanObjectAttachment* voa = objectAttachments[j];
VulkanModelAttachment* vma = voa->renderComponent->getModel()->getComponent<VulkanModelAttachment>();
if (vma->indices == 0) continue;
vkCmdBindPipeline(commandBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, voa->shaderPipeline->pipeline);
VkBuffer vertexBuffers[] = { vma->vertexBuffer };
VkDeviceSize offsets[] = { 0 };
vkCmdBindVertexBuffers(commandBuffers[i], 0, 1, vertexBuffers, offsets);
vkCmdBindIndexBuffer(commandBuffers[i], vma->indexBuffer, 0, VK_INDEX_TYPE_UINT32);
vkCmdBindDescriptorSets(commandBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, voa->shaderPipeline->pipelineLayout, 0, 1, &voa->descriptorSets[i], 0, nullptr);
vkCmdDrawIndexed(commandBuffers[i], static_cast<uint32_t>(vma->indices), 1, 0, 0, 0);
Buffer updating code:
ViewProjectionBuffer ubo = {};
ubo.view = HE2_Camera::main->getCameraMatrix();
ubo.proj = HE2_Camera::main->getProjectionMatrix();
ubo.proj[1][1] *= -1;
ubo.model = a->object->getModelMatrix();
void* data;
vmaMapMemory(allocator, a->mvpAllocations[i], &data);
memcpy(data, &ubo, sizeof(ubo));
vmaUnmapMemory(allocator, a->mvpAllocations[i]);
std::vector<ModelBuffer> modelBuffersData;
for (VulkanObjectAttachment* voa : objectAttachments)
ModelBuffer mb = {};
mb.model = voa->object->getModelMatrix();
void* data;
vmaMapMemory(allocator, modelBuffersAllocation[i], &data);
memcpy(data, modelBuffersData.data(), sizeof(ModelBuffer) * modelBuffersData.size()); // copy the elements, not the vector object itself
vmaUnmapMemory(allocator, modelBuffersAllocation[i]);
I found the problem: not a Vulkan issue but a C++ object-lifetime one, sadly. I'll explain it anyway, but it's unlikely to be your issue if you're visiting this page in the future.
I generate my descriptor writes in a loop. They're stored in a vector and then updated at the end of the loop
std::vector<VkWriteDescriptorSet> descriptorWrites;
for(int i = 0; i < shader.features.size(); i++)
//Various stuff to the descriptor write
vkUpdateDescriptorSets(vdi->device, static_cast<uint32_t>(descriptorWrites.size()), descriptorWrites.data(), 0, nullptr);
One parameter of the descriptor write is pImageInfo or pBufferInfo. These point to a struct that contains specific data for that buffer or image. I filled these in within the loop
{//Within the loop above
VkDescriptorBufferInfo bufferInfo = {};
bufferInfo.buffer = myBuffer;
descriptorWrites[i].pBufferInfo = &bufferInfo;
Because these are passed by pointer, not by value, the descriptor write refers to the data in the original struct when vkUpdateDescriptorSets reads it. But because the original struct was created inside the loop, and the vkUpdateDescriptorSets call is outside of the loop, by the time that struct is read it is out of scope and destroyed.
While this is undefined behaviour, I can only imagine that because no new variables are created between the end of the loop and the update call, the memory still held the contents of the last buffer info written in the loop. So all descriptors read that memory, and had the resources from the last descriptor write pushed to them. I fixed it all just by putting the VkDescriptorBufferInfos in a vector of their own that outlives the loop.
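A minimal sketch of that fix, using stand-in structs instead of the real Vulkan types (`BufferInfo`, `DescriptorWrite` and `buildWrites` are hypothetical names, and the range of 64 bytes is made up). The point is that the info structs live in a container that outlives the update call, and that the container is reserved up front so that push_back never reallocates and invalidates the stored pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for VkDescriptorBufferInfo and VkWriteDescriptorSet (not the real structs).
struct BufferInfo { int buffer; std::size_t offset; std::size_t range; };
struct DescriptorWrite { int dstBinding; const BufferInfo* pBufferInfo; };

// Builds one write per buffer. The infos are stored in `infos`, which the caller
// must keep alive until the writes have been consumed by the update call.
std::vector<DescriptorWrite> buildWrites(const std::vector<int>& buffers,
                                         std::vector<BufferInfo>& infos) {
    infos.clear();
    infos.reserve(buffers.size());  // no reallocation after this, so pointers stay valid
    std::vector<DescriptorWrite> writes;
    writes.reserve(buffers.size());
    for (std::size_t i = 0; i < buffers.size(); ++i) {
        infos.push_back(BufferInfo{buffers[i], 0, 64});
        writes.push_back(DescriptorWrite{static_cast<int>(i), &infos.back()});
    }
    return writes;
}
```

Each write then points at its own info struct, instead of every write pointing at the same dead stack slot.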
It looks to me like the offset you're setting here is causing the VkWriteDescriptorSet to read past the end of the buffer:
else if (a.bufferSource == HE2_SHADER_BUFFER_SOURCE_MODEL_BUFFER)
bufferInfo.buffer = modelBuffers[i];
bufferInfo.offset = voa->ID * sizeof(ModelBuffer);
bufferInfo.range = sizeof(ModelBuffer);
If you were only updating part of a buffer every frame, you'd do something like this:
bufferInfo.buffer = mvpBuffer[i];
bufferInfo.offset = sizeof(mat4[]{viewMat, projMat});
bufferInfo.range = sizeof(modelMat);
If you place the model in another buffer, you probably want to create a different binding for your descriptor set and your bufferInfo for your model data would look like this:
bufferInfo.buffer = modelBuffer[i];
bufferInfo.offset = 0;
bufferInfo.range = sizeof(modelMat);
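If you do keep per-object offsets into one model buffer, a quick sanity check along these lines may help. This is a sketch in plain C++ with hypothetical helper names (`alignedStride`, `bindingIsValid`, `objectOffset`); the alignment would come from minUniformBufferOffsetAlignment in VkPhysicalDeviceLimits, and the value 256 in the usage note is only an assumed example. It checks that offset + range stays inside the buffer and that the offset is a multiple of the required alignment, which Vulkan requires for uniform buffer descriptors.

```cpp
#include <cassert>
#include <cstdint>

// Round `size` up to a multiple of `alignment` (assumed to be a power of two).
constexpr std::uint64_t alignedStride(std::uint64_t size, std::uint64_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// True when [offset, offset + range) fits in the buffer and the offset is aligned.
constexpr bool bindingIsValid(std::uint64_t offset, std::uint64_t range,
                              std::uint64_t bufferSize, std::uint64_t minAlignment) {
    return offset % minAlignment == 0 && range <= bufferSize && offset <= bufferSize - range;
}

// Offset of object `id` when every object occupies one aligned slot.
constexpr std::uint64_t objectOffset(std::uint32_t id, std::uint64_t elementSize,
                                     std::uint64_t minAlignment) {
    return id * alignedStride(elementSize, minAlignment);
}
```

With a 64-byte model matrix and an assumed alignment of 256, object 3 lands at offset 768, whereas 3 * sizeof(ModelBuffer) = 192 would not be a valid offset on such a device.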

GL Screenshot Breaks on viewport resize…sometimes

I’m developing a plugin for SIMDIS (basically military Google Earth), written in C++ using VS 2012. It’s a pretty nifty little thing to auto-plot points, and one of its functions is to take a series of screenshots of the viewport and save the images so they can be used or processed somewhere else. This works fine too… until you resize the viewport one too many times. Resizing is done by clicking the corner of the window and dragging it bigger and smaller, and the program may launch in full-screen or windowed mode; either way it works fine for the first few sets… or as long as the window is not resized.
When it breaks, the program still marches happily along, creating the files and filling them with data at what seems to be an appropriate size for whatever resolution image I’m trying to generate… but the format becomes no good. The file is still a *.bmp, but Windows can no longer read it. No errors are thrown, though (I think; I’m not catching any GL errors, if that’s possible).
I can’t get it to consistently happen with a specific number of actions, but it seems to start failing after 3-7 view-port re-sizes. I don’t know if this is a problem with my screenshot code, an issue with the SIMDIS program or plugin, a GL issue, or what. I’ve tested it on multiple machines.
Has anyone run into this problem before? Is there something specific I should be doing that I’m not? Is this a problem native to the parent program (SIMDIS), or something I can work with/around with GL commands I don’t know about?
Screenshot code follows:
#include "TakeScreenshot.h" //has "#include <gl/GL.h>" etc...
std::vector<int> * TakeScreenshot::TakeAScreenshotBMP(const char* filename)
//std::cout << "Screenshot! ";
std::vector<int> * returnVec = new std::vector<int>();
int VPort[4] = {0,0,0,0};
int FSize = 0;
int PackStore = 0;
//get GL viewport dimensions, x,y,w,h into vport
//make a framebuffer, RGB
FSize = VPort[2]*VPort[3]*3;
unsigned char PStore[8294400];// fixed-size buffer: 3840*2160 bytes, i.e. 4K at one byte per pixel (not three)
//store settings
glGetIntegerv(GL_PACK_ALIGNMENT, &PackStore);
//unpack to byte order
glPixelStorei(GL_PACK_ALIGNMENT, 1);
//read the gl buffer into our buffer
//Pass back settings
glPixelStorei(GL_PACK_ALIGNMENT, PackStore);
//set up file info
BMIH.biSizeImage= VPort[2] * VPort[3] * 3;
BMIH.biWidth = VPort[2];
BMIH.biHeight = VPort[3];
BMIH.biPlanes = 1;
BMIH.biBitCount = 24;
BMIH.biCompression = BI_RGB;
BITMAPFILEHEADER bmfh;//file header
int nBitsOffset = sizeof(BITMAPFILEHEADER) + BMIH.biSize;
LONG lImageSize = BMIH.biSizeImage;
LONG lFileSize = nBitsOffset + lImageSize;
bmfh.bfType = 'B' + ('M'<<8);
bmfh.bfOffBits = nBitsOffset;
bmfh.bfSize = lFileSize;
bmfh.bfReserved1 = bmfh.bfReserved2 = 0;
// swap r and b values because GL has them backwards for BMP format.
unsigned char SwapByte;
for(int loop = 0; loop<FSize; loop+=3)
SwapByte = PStore[loop];
PStore[loop] = PStore[loop+2];
PStore[loop +2] = SwapByte;
// File writing section
FILE *pFile;
pFile = fopen(filename, "wb");
//if something borked
if(pFile == NULL)
std::cout << "TakeScreenshot::TakeAScreenshotBMP>> Error; was not able to create file (Permissions?)" << std::endl;
return returnVec; //exit
UINT nWrittenFileHeaderSize = fwrite(&bmfh,1,sizeof(BITMAPFILEHEADER), pFile);
UINT nWrittenInfoHeaderSize = fwrite(&BMIH,1,sizeof(BITMAPINFOHEADER), pFile);
UINT nWrittenDIBDataSize = fwrite(&PStore, 1, lImageSize, pFile);
//some return data for processing later
return returnVec;
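One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the BMP format pads every pixel row to a multiple of 4 bytes, while the code above reads pixels with GL_PACK_ALIGNMENT set to 1 and sets biSizeImage to width * height * 3. Those two only agree when the viewport width is a multiple of 4, which would fit a failure that comes and goes as the window is resized. The sketch below is plain C++ with hypothetical helper names (`bmpRowStride`, `packedRgbToBmpRows`); it computes the padded row size and repacks tightly packed RGB rows into BMP's padded BGR rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes per row of a 24-bit BMP: each row is padded to a multiple of 4 bytes.
constexpr std::size_t bmpRowStride(std::size_t width) {
    return (width * 3 + 3) & ~static_cast<std::size_t>(3);
}

// Copies tightly packed RGB rows into BMP layout: BGR byte order, padded rows.
std::vector<unsigned char> packedRgbToBmpRows(const std::vector<unsigned char>& rgb,
                                               std::size_t width, std::size_t height) {
    assert(rgb.size() == width * height * 3);
    const std::size_t stride = bmpRowStride(width);
    std::vector<unsigned char> out(stride * height, 0);  // padding bytes stay zero
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const unsigned char* src = &rgb[(y * width + x) * 3];
            unsigned char* dst = &out[y * stride + x * 3];
            dst[0] = src[2];  // blue
            dst[1] = src[1];  // green
            dst[2] = src[0];  // red
        }
    }
    return out;
}
```

In that case biSizeImage would be bmpRowStride(width) * height, and sizing the pixel buffer from the viewport (for example with a std::vector) would also avoid depending on a fixed-size array.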

DirectX 11 - Compute Shader, copy data from the GPU to the CPU

I've just started using DirectCompute in an attempt to move a fluid simulation I have been working on onto the GPU. I have found a very similar (if not identical) question here; however, the resolution to my problem does not seem to be the same as theirs: I do have my CopyResource the right way round for sure! As in that question, I only get a buffer filled with 0s when I copy back from the GPU. I really can't see the error, as I don't understand how I could be going out of bounds. I apologise for the mass of code pasting about to occur, but I want to be sure I've not got any of the setup wrong.
Output Buffer, UAV and System Buffer set up
outputDesc.Usage = D3D11_USAGE_DEFAULT;
outputDesc.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
outputDesc.ByteWidth = sizeof(BoundaryConditions) * numElements;
outputDesc.CPUAccessFlags = 0;
outputDesc.StructureByteStride = sizeof(BoundaryConditions);
result =_device->CreateBuffer(&outputDesc, 0, &m_outputBuffer);
outputDesc.Usage = D3D11_USAGE_STAGING;
outputDesc.BindFlags = 0;
outputDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
result = _device->CreateBuffer(&outputDesc, 0, &m_outputresult);
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.Flags = 0;
uavDesc.Buffer.NumElements = numElements;
result =_device->CreateUnorderedAccessView(m_outputBuffer, &uavDesc, &m_BoundaryConditionsUAV);
Running the Shader in my frame loop
HRESULT result;
D3D11_MAPPED_SUBRESOURCE mappedData;
_deviceContext->CSSetShader(m_BoundaryConditionsCS, nullptr, 0);
_deviceContext->CSSetUnorderedAccessViews(0, 1, &m_BoundaryConditionsUAV, 0);
_deviceContext->Dispatch(1, 1, 1);
// Unbind output from compute shader
ID3D11UnorderedAccessView* nullUAV[] = { NULL };
_deviceContext->CSSetUnorderedAccessViews(0, 1, nullUAV, 0);
// Disable Compute Shader
_deviceContext->CSSetShader(nullptr, nullptr, 0);
_deviceContext->CopyResource(m_outputresult, m_outputBuffer);
result = _deviceContext->Map(m_outputresult, 0, D3D11_MAP_READ, 0, &mappedData);
BoundaryConditions* newbc = reinterpret_cast<BoundaryConditions*>(mappedData.pData);
for (int i = 0; i < 4; i++)
_deviceContext->Unmap(m_outputresult, 0);
struct BoundaryConditions
float3 x;
float3 y;
RWStructuredBuffer<BoundaryConditions> _boundaryConditions;
[numthreads(4, 1, 1)]
void ComputeBoundaryConditions(int3 id : SV_DispatchThreadID)
_boundaryConditions[id.x].x = float3(id.x,id.y,id.z);
I dispatch the compute shader after I begin a frame and before I end the frame. I have played around with moving the shader's dispatch call outside of the end scene, before the present, etc., but nothing seems to affect the process. Can't seem to figure this one out!
Holy smokes, I fixed the error! I was creating the compute shader into a different ID3D11ComputeShader pointer! D: Works like a charm! Phew. Sorry, and thanks Adam!
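For anyone debugging a similar all-zero readback, a small CPU-side reference can make the check explicit. This is only a sketch in plain C++ (`expectedOutput` and `matchesExpected` are hypothetical helpers, and the struct is a CPU mirror of the HLSL one, assuming the two float3 members are tightly packed into 24 bytes). It reproduces what the shader above writes for Dispatch(1, 1, 1) with numthreads(4, 1, 1), where element i receives x = (i, 0, 0), and compares only the x member since the shader never writes y.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU mirror of the HLSL struct: two float3 members, tightly packed.
struct BoundaryConditions { float x[3]; float y[3]; };
static_assert(sizeof(BoundaryConditions) == 24, "must match the structured buffer stride");

// Reference result for one thread group of `threadsX` threads along X.
std::vector<BoundaryConditions> expectedOutput(std::size_t threadsX) {
    std::vector<BoundaryConditions> out(threadsX);  // value-initialized to zero
    for (std::size_t i = 0; i < threadsX; ++i) {
        out[i].x[0] = static_cast<float>(i);  // id.x
        out[i].x[1] = 0.0f;                   // id.y
        out[i].x[2] = 0.0f;                   // id.z
    }
    return out;
}

// True when the x member of every element read back matches the reference.
bool matchesExpected(const BoundaryConditions* readback, std::size_t count) {
    const std::vector<BoundaryConditions> ref = expectedOutput(count);
    for (std::size_t i = 0; i < count; ++i)
        for (int c = 0; c < 3; ++c)
            if (readback[i].x[c] != ref[i].x[c]) return false;
    return true;
}
```

Passing the mapped pointer and the element count to matchesExpected would flag the all-zero case, which is what you would see if the shader that is bound never actually ran.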

Cuda Create 3d texture and cudaArray(3d) from device memory

I'm trying to create a 3D texture from part of a device array.
To do this, these are my steps:
1. Malloc device array
2. Write device array
3. Create cudaArray (3D)
4. Bind texture to cudaArray
The way I'm doing it produces no compiler errors, but when I run cuda-memcheck it fails when I try to fetch data from the texture.
Invalid global read of size 8 .. Address 0x10dfaf3a0 is out of bounds
That's why I'm guessing I declared the texture array wrong.
Here is how I access the texture:
These are the steps mentioned above as I implemented them:
1.Malloc Device Array
cudaMalloc((void **)&d_Noise, sqrSizeNoise*nNoise*sizeof(float));
2.Write Device Array
curandGenerateUniform(gen, d_Noise, sqrSizeNoise*nNoise);
3+4. Creating the cudaArray and binding it to the texture (I'm guessing the mistake is here)
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();//cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
cudaArray *d_cuArr;
cudaMalloc3DArray(&d_cuArr, &channelDesc, make_cudaExtent(SizeNoise,SizeNoise,SizeNoise), 0);
cudaMemcpy3DParms copyParams = {0};
//Loop for every separated Noise Texture (nNoise = 4)
for(int i = 0; i < nNoise; i++){
//initialize the textures
NoiseTextures[i] = texture<float, 3, cudaReadModeElementType>(1,cudaFilterModeLinear,cudaAddressModeWrap,channelDesc);
//Array creation
//+(sqrSizeNoise*i) is to separate the created Noise Array into smaller parts with the size of SizeNoise^3
copyParams.srcPtr = make_cudaPitchedPtr(d_Noise+(sqrSizeNoise*i), SizeNoise*sizeof(float), SizeNoise, SizeNoise);
copyParams.dstArray = d_cuArr;
copyParams.extent = make_cudaExtent(SizeNoise,SizeNoise,SizeNoise);
copyParams.kind = cudaMemcpyDeviceToDevice;
//Array creation End
//new Bind
// set texture parameters
NoiseTextures[i].normalized = true; // access with normalized texture coordinates
NoiseTextures[i].filterMode = cudaFilterModeLinear; // linear interpolation
NoiseTextures[i].addressMode[0] = cudaAddressModeWrap; // wrap texture coordinates
NoiseTextures[i].addressMode[1] = cudaAddressModeWrap;
NoiseTextures[i].addressMode[2] = cudaAddressModeWrap;
// bind array to 3D texture
checkCudaErrors(cudaBindTextureToArray(NoiseTextures[i], d_cuArr, channelDesc));
//end Bind
I've pasted this code snippet to Pastebin so it's easier to look at with colors etc.
I hope I clearly described my problem. If not, please comment!
Can you help me with this?
Thanks for reading,
Here is a complete code so you can try it on your own machine:
#include <helper_cuda.h>
#include <helper_functions.h>
#include <helper_cuda_gl.h>
#include <texture_types.h>
#include <cuda_runtime.h>
#include <curand.h>
static texture<float, 3, cudaReadModeElementType> NoiseTextures[4];//texture Array
float *d_NoiseTest;//Device Array with random floats
int SizeNoiseTest = 32;
int sqrSizeNoiseTest = 32768;
void CreateTexture();
__global__ void AccesTexture(texture<float, 3, cudaReadModeElementType>* NoiseTextures)
int test = tex3D(NoiseTextures[0],threadIdx.x,threadIdx.y,threadIdx.z);//by using this the error occurs
int main(int argc, char **argv)
void CreateTexture()
//curand Random Generator (needs compiler link -lcurand)
curandGenerator_t gen;
cudaMalloc((void **)&d_NoiseTest, sqrSizeNoiseTest*4*sizeof(float));//Allocation of device Array
curandGenerateUniform(gen, d_NoiseTest, sqrSizeNoiseTest*4);//writing data to d_NoiseTest
//cudaArray Descriptor
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
//cuda Array
cudaArray *d_cuArr;
cudaMalloc3DArray(&d_cuArr, &channelDesc, make_cudaExtent(SizeNoiseTest*sizeof(float),SizeNoiseTest,SizeNoiseTest), 0);
cudaMemcpy3DParms copyParams = {0};
//Loop over each of the 4 separate noise textures
for(int i = 0; i < 4; i++){
//initialize the textures
NoiseTextures[i] = texture<float, 3, cudaReadModeElementType>(1,cudaFilterModeLinear,cudaAddressModeWrap,channelDesc);
//Array creation
//+(sqrSizeNoise*i) is to separate the created Noise Array into smaller parts with the size of SizeNoise^3
copyParams.srcPtr = make_cudaPitchedPtr(d_NoiseTest+(sqrSizeNoiseTest*i), SizeNoiseTest*sizeof(float), SizeNoiseTest, SizeNoiseTest);
copyParams.dstArray = d_cuArr;
copyParams.extent = make_cudaExtent(SizeNoiseTest*sizeof(float),SizeNoiseTest,SizeNoiseTest);
copyParams.kind = cudaMemcpyDeviceToDevice;
//Array creation End
//new Bind
// set texture parameters
NoiseTextures[i].normalized = true; // access with normalized texture coordinates
NoiseTextures[i].filterMode = cudaFilterModeLinear; // linear interpolation
NoiseTextures[i].addressMode[0] = cudaAddressModeWrap; // wrap texture coordinates
NoiseTextures[i].addressMode[1] = cudaAddressModeWrap;
NoiseTextures[i].addressMode[2] = cudaAddressModeWrap;
// bind array to 3D texture
checkCudaErrors(cudaBindTextureToArray(NoiseTextures[i], d_cuArr, channelDesc));
//end Bind
You need to link with -lcurand, though, and include CUDA-6.0/samples/common/inc.
I'm now getting a different error in this code:
code=11(cudaErrorInvalidValue) "cudaMemcpy3D(&copyParams)"
Even though it's exactly the same code as my original. I'm starting to get completely confused.
Thank you for your help
Here's a worked example showing the creation of an array of texture objects, roughly following the path of the code you provided. You can see, by comparing to the texture reference code I placed here, that the first set of texture reads from the first texture object (i.e. the first kernel call) are the same numerical values as the set of reads from the texture reference example (you may need to adjust the grid size of the two example codes to match).
Texture object usage requires compute capability 3.0 or higher.
$ cat
#include <helper_cuda.h>
#include <curand.h>
#define NUM_TEX 4
const int SizeNoiseTest = 32;
const int cubeSizeNoiseTest = SizeNoiseTest*SizeNoiseTest*SizeNoiseTest;
static cudaTextureObject_t texNoise[NUM_TEX];
__global__ void AccesTexture(cudaTextureObject_t my_tex)
float test = tex3D<float>(my_tex,(float)threadIdx.x,(float)threadIdx.y,(float)threadIdx.z);//by using this the error occurs
printf("thread: %d,%d,%d, value: %f\n", threadIdx.x, threadIdx.y, threadIdx.z, test);
void CreateTexture()
float *d_NoiseTest;//Device Array with random floats
cudaMalloc((void **)&d_NoiseTest, cubeSizeNoiseTest*sizeof(float));//Allocation of device Array
for (int i = 0; i < NUM_TEX; i++){
//curand Random Generator (needs compiler link -lcurand)
curandGenerator_t gen;
curandGenerateUniform(gen, d_NoiseTest, cubeSizeNoiseTest);//writing data to d_NoiseTest
//cudaArray Descriptor
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
//cuda Array
cudaArray *d_cuArr;
checkCudaErrors(cudaMalloc3DArray(&d_cuArr, &channelDesc, make_cudaExtent(SizeNoiseTest*sizeof(float),SizeNoiseTest,SizeNoiseTest), 0));
cudaMemcpy3DParms copyParams = {0};
//Array creation
copyParams.srcPtr = make_cudaPitchedPtr(d_NoiseTest, SizeNoiseTest*sizeof(float), SizeNoiseTest, SizeNoiseTest);
copyParams.dstArray = d_cuArr;
copyParams.extent = make_cudaExtent(SizeNoiseTest,SizeNoiseTest,SizeNoiseTest);
copyParams.kind = cudaMemcpyDeviceToDevice;
//Array creation End
cudaResourceDesc texRes;
memset(&texRes, 0, sizeof(cudaResourceDesc));
texRes.resType = cudaResourceTypeArray;
texRes.res.array.array = d_cuArr;
cudaTextureDesc texDescr;
memset(&texDescr, 0, sizeof(cudaTextureDesc));
texDescr.normalizedCoords = false;
texDescr.filterMode = cudaFilterModeLinear;
texDescr.addressMode[0] = cudaAddressModeClamp; // clamp
texDescr.addressMode[1] = cudaAddressModeClamp;
texDescr.addressMode[2] = cudaAddressModeClamp;
texDescr.readMode = cudaReadModeElementType;
checkCudaErrors(cudaCreateTextureObject(&texNoise[i], &texRes, &texDescr, NULL));}
int main(int argc, char **argv)
return 0;
compile with:
$ nvcc -arch=sm_30 -I/shared/apps/cuda/CUDA-v6.0.37/samples/common/inc -lcurand -o t507
$ cuda-memcheck ./t507
thread: 0,0,0, value: 0.310691
thread: 1,0,0, value: 0.627906
thread: 0,1,0, value: 0.638900
thread: 1,1,0, value: 0.665186
thread: 0,0,1, value: 0.167465
thread: 1,0,1, value: 0.565227
thread: 0,1,1, value: 0.397606
thread: 1,1,1, value: 0.503013
thread: 0,0,0, value: 0.809163
thread: 1,0,0, value: 0.795669
thread: 0,1,0, value: 0.808565
thread: 1,1,0, value: 0.847564
thread: 0,0,1, value: 0.853998
thread: 1,0,1, value: 0.688446
thread: 0,1,1, value: 0.733255
thread: 1,1,1, value: 0.649379
thread: 0,0,0, value: 0.040824
thread: 1,0,0, value: 0.087417
thread: 0,1,0, value: 0.301392
thread: 1,1,0, value: 0.298669
thread: 0,0,1, value: 0.161962
thread: 1,0,1, value: 0.316443
thread: 0,1,1, value: 0.452077
thread: 1,1,1, value: 0.477722
========= ERROR SUMMARY: 0 errors
In this case I'm using the same kernel, called multiple times, to read from the individual texture objects. It should be possible to pass multiple objects to the same kernel, however it is not advisable to have a single warp read from multiple textures, if that can be avoided in your code. The actual issue resides at the quad level, which I'd prefer not to get into. It's best if you can arrange your code so that a warp is reading from the same texture object, on any given cycle.
Note that for simplicity of presentation, this CreateTexture() function overwrites previously allocated device pointers such as d_cuArr, during the processing of the loop. This isn't illegal or a functional issue, but it raises the possibility of memory leaks.
I assume you can modify the code to handle deallocation of those if this is a concern. The purpose of this code is to demonstrate the method to get things working.
In cudaMalloc3DArray, the extent should be something like make_cudaExtent(SizeNoiseTest,SizeNoiseTest,SizeNoiseTest): for a CUDA array the width is given in elements, not bytes.
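The units are easy to mix up, so here is a small sketch of the arithmetic in plain C++ (no CUDA headers; `Extent`, `arrayExtent`, `linearExtent`, `packedPitch` and `volumeOffset` are hypothetical stand-ins, not CUDA API). It assumes the usual rule that an extent involving a cudaArray counts its width in elements, while an extent between two linear allocations counts its width in bytes, and that the pitch passed to make_cudaPitchedPtr is always in bytes.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for cudaExtent, with the same three fields.
struct Extent { std::size_t width, height, depth; };

// Extent for allocating or copying into a CUDA array: width counted in elements.
constexpr Extent arrayExtent(std::size_t w, std::size_t h, std::size_t d) {
    return Extent{w, h, d};
}

// Extent for a copy between two linear (pitched) allocations: width counted in bytes.
constexpr Extent linearExtent(std::size_t w, std::size_t h, std::size_t d, std::size_t elemSize) {
    return Extent{w * elemSize, h, d};
}

// Pitch (bytes per row) of a tightly packed source volume.
constexpr std::size_t packedPitch(std::size_t w, std::size_t elemSize) {
    return w * elemSize;
}

// Element offset of volume `i` when several w*h*d volumes are stored back to back.
constexpr std::size_t volumeOffset(std::size_t i, std::size_t w, std::size_t h, std::size_t d) {
    return i * w * h * d;
}
```

For a 32x32x32 float volume this gives an array extent width of 32, a source pitch of 128 bytes, and an offset of 32768 elements between consecutive volumes, which matches the 32768 used for sqrSizeNoiseTest in the question.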