Write to DX11 backbuffer from C++AMP - c++

According to this ms blog post
http://blogs.msdn.com/b/nativeconcurrency/archive/2012/07/02/interop-with-direct3d-textures-in-c-amp.aspx
You can write directly to the backbuffer from C++AMP.
Using Interop, you can get the texture object of the back buffer associated with the window using the IDXGISwapChain and update it directly in the C++ AMP kernel.
I created an amp device descriptor from the dx device and I got a pointer to the backbuffer and then tried to make an amp texture from it, but I found that the texture descriptor bindFlags, of the backbuffer, were only D3D11_BIND_RENDER_TARGET and I needed at least D3D11_BIND_UNORDERED_ACCESS or D3D11_BIND_SHADER_RESOURCE in order for Concurrency::graphics::direct3d::make_texture to function.
I can easily enough make any other d3d texture and connect that to amp, if I set the bindflags, but for the flags set on the backbuffer, I cannot connect them.
Then I find this post
http://social.msdn.microsoft.com/Forums/vstudio/en-US/15aa1186-210b-4ba7-89b0-b74f742d6830/c-amp-and-direct2d
which has the following marked an an answer by a Microsoft community contributor
I was trying to write to back buffer of the swap chain directly. As far as I understood, this can't be done, because usage flags that can be used when creating a back buffer texture are incompatible with ones that are needed by C++ AMP to manipulate the texture.
So, on one hand, it (writing to backbuffer from c++AMP) is used an an example of interop and on the other hand it is explained to not be possible...?
My current requirement is just to generate a raytraced image in C++AMP and show that on a d3d display without copying data back from the graphics card every frame. I realize that I could just generate my own texture and then render a quad with that, but it would be simpler writing directly to the backbuffer, and if it can be done, that is what I would like to do.
Perhaps someone here can explain if it can be done and what steps are required to accomplish this, or alternatively explain that no, this truly cannot be done.
Thanks in advance for any help on this topic.
[EDIT]
I now found this info
https://software.intel.com/en-us/articles/microsoft-directcompute-on-intel-ivy-bridge-processor-graphics
// this qualifies the back buffer for being the target of compute shader writes
sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT | DXGI_USAGE_UNORDERED_ACCESS | DXGI_USAGE_SHADER_INPUT;
I actually did try that previously, but the call to CreateSwapChainForCoreWindow fails with
First-chance exception at 0x75251D4D in TestDxAmp.exe: Microsoft C++ exception: Platform::InvalidArgumentException ^ at memory location 0x0328E484. HRESULT:0x80070057 The parameter is incorrect.
Which is not being very informative.

I think the original forum post is maybe misleading. For both texture and buffer interop the unordered access binding is required for AMP interop. AMP is built on top of DX/DirectCompute so this applies in both cases as noted in the Intel link.
your program can create an arrayassociated with an existing Direct3D
buffer using the make_array()function.
template<typename T, int N>
array<T,N> make_array(const extent& ext, IUnknown* buffer);
The
Direct3D buffer must implement the ID3D11Bufferinterface. It must
support raw views (D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS) and
allow SHADER_RESOURCE and UNORDERED_ACCESS binding. The buffer itself
must be of the correct size, the size of the extent multiplied by the
size of the buffer type. The following code uses make_arrayto create
an array using the accelerator_view, dxView, which was created in the
previous section: HRESULT hr = S_OK;
-- C++ AMP Book
I'm not a DX expert but from the following post it looks like you can configure the swap chain to support UAVs.
Sobel Filter Compute Shader

Related

How do we display pixel data calculated in an OpenCL kernel to the screen using OpenGL?

I am interested in writing a real-time ray tracing application in c++ and I heard that using OpenCL-OpenGL interoperability is a good way to do this (to make good use of the GPU), so I have started writing a c++ project using this interoperability and using GLFW for window management. I should mention that although I have some coding experience, I do not have so much in c++ and have not worked with OpenCL or OpenGL before attempting this project, so I would appreciate it if answers are given with this in mind (that is, beginner-friendly terminology is preferred).
So far I have been able to get OpenCL-OpenGL interoperability working with an example using a vertex buffer object. I have also demonstrated that I can create image data with an RGBA array (at least on the CPU), send this to an OpenGL texture with glTexImage2D() and display it using glBlitFramebuffer().
My problem is that I don't know how to create an OpenCL kernel that is able to calculate pixel data such that it can be given as the data parameter in glTexImage2D(). I understand that to use the interoperability, we must first create OpenGL objects and then create OpenCL objects from these to write the data on as these objects share memory, so I am assuming I must first create an empty OpenGL array object then create an OpenCL array object from this to apply an appropriate kernel to which would write the pixel data before using the OpenGL array object as the data parameter in glTexImage2D(), but I am not sure what kind of object to use and have not seen any examples demonstrating this. A simple example showing how OpenCL can create pixel data for an OpenGL texture image (assuming a valid OpenCL-OpenGL context) would be much appreciated. Please do not leave any line out as I might not be able to fill in the blanks!
It's also very possible that the method I described above for implementing a ray tracer is not possible or at least not recommended, so if this is the case please outline an advised alternate method for sending OpenCL kernel calculated pixel data to OpenGL and subsequently drawing this to the screen. The answer to this similar question does not go into enough detail for me and the CL/GL interop link is not working. The answer mentions that this can be achieved using a renderbuffer rather than a texture, but it says at the bottom of the Khronos OpenGL wiki for Renderbuffer Objects that the only way to send pixel data to them is via pixel transfer operations but I can not find any straightforward explanation for how to initialize data this way.
Note that I am using OpenCL c (no c++ bindings).
From your second para you are creating an OpenCL context with a platform specific combination of GLX_DISPLAY / WGL_HDC and GL_CONTEXT properties to interoperate with OpenGL, and you can create a vertex buffer object that can be read/written as necessary by both OpenGL and OpenCL.
That's most of the work. In OpenGL you can copy any VBO into a texture with
glBindBuffer(GL_PIXEL_UNPACK_BUFER, myVBO);
glTexSubImage2D(GL_TEXTURE_2D, level, x, y, width, height, format, size, NULL);
with the NULL at the end meaning to copy from GPU memory (the unpack buffer) rather than CPU memory.
As with copying from regular CPU memory, you might also need to change the pixel alignment if it isn't 32 bit.

Using Media Foundation to encode Direct X surfaces

I'm trying to use the MediaFoundation API to encode a video but I'm having problems pushing the samples to the SinkWriter.
I'm getting the frames to encode through the Desktop Duplication API. What I end up with is an ID3D11Texture2D with the desktop image in it.
I'm trying to create an IMFVideoSample containing this surface and then push that video sample to a SinkWriter.
I've tried going about this in different ways:
I called MFCreateVideoSampleFromSurface(texture, &pSample) where texture is the ID3D11Texture2D, filled in the SampleTime and SampleDuration and then passed the created sample to the SinkWriter.
SinkWriter returned E_INVALIDARG.
I tried creating the sample by passing nullptr as the first argument and creating the buffer myself using MFCreateDXGISurfaceBuffer, and then passing the resulting buffer into the Sample.
That didn't work either.
I read through the MediaFoundation documentation and couldn't find detailed information on how to create the sample out of a DirectX texture.
I ran out of things to try.
Has anyone out there used this API before and can think of things I should check, or of any way on how I can go about debugging this?
First of all you should learn to use mftrace tool.
Very likely, it will tell you the problem right away.
But my guess is, following problems are likely.
Probably, some other attributes are required besides SampleTime / SampleDuration.
Probably, SinkWriter needs a texture it can read on CPU. To fix that, when a frame is available, create a staging texture of the same format + size, call CopyResource to copy desktop to staging texture, then pass that staging texture to MF.
Even if you use a hardware encoder so the CPU never tries to read the texture data, I don’t think it’s a good idea to directly pass your desktop texture to MF.
When you set a D3D texture for sample, no data is copied anywhere, the sample merely retains the texture.
MF works asynchronously, it may buffer several samples in its topology nodes if they want to.
DD gives you data synchronously, you may only access the texture between AcquireNextFrame and ReleaseFrame calls.

How to render to OpenGL from Vulkan?

Is it possible to render to OpenGL from Vulkan?
It seems nVidia has something:
https://lunarg.com/faqs/mix-opengl-vulkan-rendering/
Can it be done for other GPU's?
Yes, it's possible if the Vulkan implementation and the OpenGL implementation both have the appropriate extensions available.
Here is a screenshot from an example app in the Vulkan Samples repository which uses OpenGL to render a simple shadertoy to a texture, and then uses that texture in a Vulkan rendered window.
Although your question seems to suggest you want to do the reverse (render to something using Vulkan and then display the results using OpenGL), the same concepts apply.... populate a texture in one API, use synchronization to ensure the GPU work is complete, and then use the texture in the other API. You can also do the same thing with buffers, so for instance you could use Vulkan for compute operations and then use the results in an OpenGL render.
Requirements
Doing this requires that both the OpenGL and Vulkan implementations support the required extensions, however, according to this site, these extensions are widely supported across OS versions and GPU vendors, as long as you're working with a recent (> 1.0.51) version of Vulkan.
You need the the External Objects extension for OpenGL and the External Memory/Fence/Sempahore extensions for Vulkan.
The Vulkan side of the extensions allow you to allocate memory, create semaphores or fences while marking the resulting objects as exportable. The corresponding GL extensions allow you to take the objects and manipulate them with new GL commands which allow you to wait on fences, signal and wait on semaphores, or use Vulkan allocated memory to back an OpenGL texture. By using such a texture in an OpenGL framebuffer, you can pretty much render whatever you want to it, and then use the rendered results in Vulkan.
Export / Import example code
For example, on the Vulkan side, when you're allocating memory for an image you can do this...
vk::Image image;
... // create the image as normal
vk::MemoryRequirements memReqs = device.getImageMemoryRequirements(image);
vk::MemoryAllocateInfo memAllocInfo;
vk::ExportMemoryAllocateInfo exportAllocInfo{
vk::ExternalMemoryHandleTypeFlagBits::eOpaqueWin32
};
memAllocInfo.pNext = &exportAllocInfo;
memAllocInfo.allocationSize = memReqs.size;
memAllocInfo.memoryTypeIndex = context.getMemoryType(
memReqs.memoryTypeBits, vk::MemoryPropertyFlagBits::eDeviceLocal);
vk::DeviceMemory memory;
memory = device.allocateMemory(memAllocInfo);
device.bindImageMemory(image, memory, 0);
HANDLE sharedMemoryHandle = device.getMemoryWin32HandleKHR({
texture.memory, vk::ExternalMemoryHandleTypeFlagBits::eOpaqueWin32
});
This is using the C++ interface and is using the Win32 variation of the extensions. For Posix platforms there are alternative methods for getting file descriptors instead of WIN32 handles.
The sharedMemoryHandle is the value that you'll need to pass to OpenGL, along with the actual allocation size. On the GL side you can then do this...
// These values should be populated by the vulkan code
HANDLE sharedMemoryHandle;
GLuint64 sharedMemorySize;
// Create a 'memory object' in OpenGL, and associate it with the memory
// allocated in vulkan
GLuint mem;
glCreateMemoryObjectsEXT(1, mem);
glImportMemoryWin32HandleEXT(mem, sharedMemorySize,
GL_HANDLE_TYPE_OPAQUE_WIN32_EXT, sharedMemoryHandle);
// Having created the memory object we can now create a texture and use
// the memory object for backing it
glCreateTextures(GL_TEXTURE_2D, 1, &color);
// The internalFormat here should correspond to the format of
// the Vulkan image. Similarly, the w & h values should correspond to
// the extent of the Vulkan image
glTextureStorageMem2DEXT(color, 1, GL_RGBA8, w, h, mem, 0 );
Synchronization
The trickiest bit here is synchronization. The Vulkan specification requires images to be in certain states (layouts) before corresponding operations can be performed on them. So in order to do this properly (based on my understanding), you would need to...
In Vulkan, create a command buffer that transitions the image to ColorAttachmentOptimal layout
Submit the command buffer so that it signals a semaphore that has similarly been exported to OpenGL
In OpenGL, use the glWaitSemaphoreEXT function to cause the GL driver to wait for the transition to complete.
Note that this is a GPU side wait, so the function will not block at all. It's similar to glWaitSync (as opposed to glClientWaitSync)in this regard.
Execute your GL commands that render to the framebuffer
Signal a different exported Semaphore on the GL side with the glSignalSemaphoreEXT function
In Vulkan, execute another image layout transition from ColorAttachmentOptimal to ShaderReadOnlyOptimal
Submit the transition command buffer with the wait semaphore set to the one you just signaled from the GL side.
That's would be an optimal path. Alternatively, the quick and dirty method would be to do the vulkan transition, and then execute queue and device waitIdle commands to ensure the work is done, execute the GL commands, followed by glFlush & glFinish commands to ensure the GPU is done with that work, and then resume your Vulkan commands. This is more of a brute force approach and will likely produce poorer performance than doing the proper synchronization.
NVIDIA has created an OpenGL extension, NV_draw_vulkan_image, which can render a VkImage in OpenGL. It even has some mechanisms for interacting with Vulkan semaphores and the like.
However, according to the documentation, you must bypass all Vulkan layers, since layers can modify non-dispatchable handles and the OpenGL extension doesn't know about said modifications. And their recommended means of doing so is to use the glGetVkProcAddrNV for all of your Vulkan functions.
Which also means that you can't get access to any debugging that relies on Vulkan layers.
There is some more information in this more recent slide deck from SIGGRAPH 2016. Slides 63-65 describe how to blit a Vulkan image to an OpenGL backbuffer. My opinion is that it may have been pretty easy for NVIDIA to support this since the Vulkan driver is contained in libGL.so (on Linux). So it may not have been that hard to give the Vulkan image handle to the GL side of the driver and have it be useful.
As another answer pointed out, there are still no official registered multi-vendor interop extensions. This approach just works on NVIDIA.

VTK OpenGL objects (3D texture) access from CUDA‏

Is there any proper way to access the low level OpenGL objects of VTK in order to modify them from a CUDA/OpenCL kernel using the openGL-CUDA/OpenCL interoperability feature?
Specifically, I would want to get the GLuint (or unsigned int) member from vtkOpenGLGPUVolumeRayCastMapper that points to the Opengl 3D Texture object where the dataset is stored, in order to bind it to a CUDA Surface to be able to access and modify its values from a CUDA kernel implemented by me.
For further information, the process that I need to follow is explained here:
http://rauwendaal.net/2011/12/02/writing-to-3d-opengl-textures-in-cuda-4-1-with-3d-surface-writes/
where the texID object used there (in Steps 1 and 2) is the equivalent to what I want to retrieve from VTK.
At a first look at the vtkOpenGLGPUVolumeRayCastMapper functions, I don't find an easy way to do this, rather than maybe creating a vtkGPUVolumeRayCastMapper subclass, but even in that case I am not sure what should I modify exactly, since I guess that some other members depend on the 3D Texture values, and should be also updated after modifying it.
So, do you know some way to do this?
Lots of thanks.
Subclassing might work, but you could probably avoid it if you wanted. The important thing is that you get the order of the GL/CUDA API calls in the right order.
First, you have to register the texture with CUDA. This is done using:
cudaGraphicsGLRegisterImage(&cuda_graphics_resource, texture_handle,
GL_TEXTURE_3D, cudaGraphicsRegisterFlagsSurfaceLoadStore);
with the stipulation that texture_handle is a GLuint written to by a call to glGenTextures(...)
Once you have registered the texture with CUDA, you can create the surface which can be read or written to in your kernel.
The only thing you have to worry about from here is that vtk does not use the texture in between a call to cudaGraphicsMapResources(...) and cudaGraphicsUnmapResources(...). Everything else should just be standard CUDA.
Also once you map the texture to CUDA and write to it within a kernel, there is no additional work besides unmapping the texture. GL will get the modified texture the next time it is used.

What is the use case of cudaGraphicsRegisterFlagsWriteDiscard in cudaGraphicsGLRegisterImage?

I'm fairly new to CUDA, but I've managed to display something generated by a kernel on the screen using OpenGL. I've tried several approach :
Using a PBO and an OpenGL texture (old style);
Using a OpenGL texture as a CUDA surface and rendering on a quad (new style);
Using a renderbuffer as a CUDA surface and rendering using glBlitFramebuffer.
All of them worked, but, while implementing #2, I erroneously set the hint as cudaGraphicsRegisterFlagsWriteDiscard. Since all of the data will be generated by CUDA, I thought this was the correct option. However, later I realized that I needed a CUDA surface to write to an OpenGL texture, and when you use a surface, you are requested to use the LoadStore flag.
So basically my question is this : Since I absolutely need a CUDA surface to write to an OpenGL texture in CUDA, what is the use case of cudaGraphicsRegisterFlagsWriteDiscard in cudaGraphicsGLRegisterImage?
The documentation description seems pretty straightforward. It is for one-way delivery of data from CUDA to OpenGL.
This online book excerpt provides a similar explanation:
Applications where CUDA is the producer and OpenGL is the consumer should register the objects with a write-discard flag...
If you want to see an example, take a look at the postProcessGL cuda sample. In that case, OpenGL is rendering an image, and it's being post-processed (blur added) by cuda, before display. In this case, there are two separate pathways for data flow. In the OpenGL->CUDA case, the data is handled by the createTextureSrc function, and the flag specified is read-only. For the CUDA->OpenGL case (delivery of the post-processed frame) the function is handled in createTextureDst, where a call is made to cudaGraphicsGLRegisterImage with the cudaGraphicsMapFlagsWriteDiscard flag specified, since on this path, CUDA is producing and OpenGL is consuming.
To understand how the textures are handled (populated with data from the cuda operations via a cudaArray) you probably want to study the sequence of operations in processImage().