NVIDIA GPU returns -1000 for clEnqueueWriteImage - OpenGL

Trying to figure out what the issue (and error code) is for this call. To preface: this works just fine on AMD, it only has issues on NVIDIA.
unsigned char *buffer;
...
cl_int status;
cl::size_t<3> origin;
cl::size_t<3> region;
origin[0]=0;
origin[1]=0;
origin[2]=0;
region[0]=m_width;
region[1]=m_height;
region[2]=1;
status=clEnqueueWriteImage(m_commandQueue, m_image, CL_FALSE, origin, region, 0, 0, buffer, 0, NULL, NULL);
status comes back as -1000, which is not a standard OpenCL error code. All of the other calls related to opening the device, context, and command queue succeed. The context is interop'ed with OpenGL, and again, all of this is completely functional on AMD.

For future reference: it seems the error happens if the image is interop'ed with an OpenGL texture and the call is made before the image has been acquired with clEnqueueAcquireGLObjects. I had the acquire later, where the images were used, but not right before the image was written. AMD's driver does not appear to care about this little detail.
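A minimal sketch of that ordering, reusing the queue, image, and buffer names from the snippet above (and assuming m_image was created with clCreateFromGLTexture and GL is otherwise idle); it uses plain size_t arrays with the C API:

glFinish();  // make sure GL is done with the texture before CL touches it

cl_int status = clEnqueueAcquireGLObjects(m_commandQueue, 1, &m_image, 0, NULL, NULL);

size_t origin[3] = {0, 0, 0};
size_t region[3] = {m_width, m_height, 1};
status = clEnqueueWriteImage(m_commandQueue, m_image, CL_FALSE,
                             origin, region, 0, 0, buffer, 0, NULL, NULL);

// ... enqueue any kernels that use m_image ...

status = clEnqueueReleaseGLObjects(m_commandQueue, 1, &m_image, 0, NULL, NULL);
clFinish(m_commandQueue);  // release back to GL before rendering with the texture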

Related

Encoding Desktop Duplication API ID3D11Texture2D frame using NvPipe

I'm trying to capture desktop frames using the Desktop Duplication API and encode them right away with NvPipe, without going through CPU access to the pixels.
Is there any way to use the ID3D11Texture2D data as input for NvPipe, or some other efficient way of doing this? I'm working on a VR solution that requires latency as low as possible, so even 1 ms saved is a big deal.
Edit: After following the recommendations from Soonts, I've ended up with this code, which doesn't seem to work:
cudaArray *array;
m_DeviceContext->CopySubresourceRegion(CopyBuffer, 0, 0, 0, 0, m_SharedSurf, 0, Box);
cudaError_t err = cudaGraphicsD3D11RegisterResource(&_cudaResource, CopyBuffer, cudaGraphicsRegisterFlagsNone);
err = cudaGraphicsResourceSetMapFlags(_cudaResource, cudaGraphicsMapFlagsReadOnly);
cudaStream_t cuda_stream;
cudaStreamCreate(&cuda_stream);
err = cudaGraphicsMapResources(1, &_cudaResource, cuda_stream);
err = cudaGraphicsSubResourceGetMappedArray(&array, _cudaResource, 0, 0);
uint64_t compressedSize = NvPipe_Encode(encoder, array, dataPitch, buffer.data(), buffer.size(), width, height, false);
The NvPipe_Encode call results in a memory access violation and does nothing. I don't know which step I'm messing up, as I can't seem to find any documentation about these functions or structures online, and putting watches on the variables shows nothing useful other than their addresses in memory.
I have not tried it, but I think it should be doable with CUDA interop; a rough sketch follows the steps below.
Call cudaGraphicsD3D11RegisterResource to register a texture for CUDA interop. I'm not sure you can register the DWM-owned texture you get from Desktop Duplication. If you can't, make another one (in the default pool), register that one, and update it each frame with ID3D11DeviceContext::CopyResource.
Call cudaGraphicsResourceSetMapFlags to specify that you want read-only access from the CUDA side of the interop.
Call cudaGraphicsMapResources to allow CUDA to access the texture.
Call cudaGraphicsSubResourceGetMappedArray. Now you have a device pointer with the frame data you can give to NvPipe.
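A rough, untested sketch of those steps; copyTex, desktopTex, and d3dContext are placeholder names I'm assuming, and error checking is omitted:

#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

cudaGraphicsResource* cudaRes = nullptr;

// One-time setup: register a texture CUDA is allowed to see (not the DWM-owned one).
cudaGraphicsD3D11RegisterResource(&cudaRes, copyTex, cudaGraphicsRegisterFlagsNone);
cudaGraphicsResourceSetMapFlags(cudaRes, cudaGraphicsMapFlagsReadOnly);

// Per frame:
d3dContext->CopyResource(copyTex, desktopTex);       // refresh the registered copy

cudaGraphicsMapResources(1, &cudaRes, 0);            // give CUDA access
cudaArray_t frame = nullptr;
cudaGraphicsSubResourceGetMappedArray(&frame, cudaRes, 0, 0);

// Note: this yields a cudaArray. If the encoder's device-memory path wants
// linear device memory instead, one option (my assumption, not a verified
// NvPipe requirement) is to stage it into a linear buffer first:
// cudaMemcpy2DFromArray(devPtr, pitchBytes, frame, 0, 0, widthBytes, height,
//                       cudaMemcpyDeviceToDevice);

cudaGraphicsUnmapResources(1, &cudaRes, 0);          // hand the texture back to D3D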
P.S. Another option is using Media Foundation instead of NvPipe. It will work on all GPUs, not just NVIDIA, and on most systems MF also uses hardware encoders. I'm not sure about latency; I've never used MF for anything too latency-sensitive, and I've never used NvPipe at all, so I have no idea how they compare.

Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering

I am uploading image data into a GL texture asynchronously.
In debug output I am getting these warnings during the rendering:
Source: OpenGL, type: Other, id: 131185, severity: Notification
Message: Buffer detailed info: Buffer object 1 (bound to GL_PIXEL_UNPACK_BUFFER_ARB, usage hint is GL_DYNAMIC_DRAW) has been mapped WRITE_ONLY in SYSTEM HEAP memory (fast).

Source: OpenGL, type: Performance, id: 131154, severity: Medium
Message: Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering.
I can't see any wrong usage of PBOs in my case, or any errors. So the question is whether these warnings are safe to ignore, or whether I am actually doing something wrong.
My code for that part:
// start copying pixels into the PBO from RAM:
mPBOs[mCurrentPBO].Bind(GL_PIXEL_UNPACK_BUFFER);
const uint32_t buffSize = pipe->GetBufferSize();
GLubyte* ptr = (GLubyte*)mPBOs[mCurrentPBO].MapRange(0, buffSize, GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
if (ptr)
{
    memcpy(ptr, pipe->GetBuffer(), buffSize);
    mPBOs[mCurrentPBO].Unmap();
}
// copy pixels into the texture from the other, already filled PBO (except on the first frame)
mPBOs[1 - mCurrentPBO].Bind(GL_PIXEL_UNPACK_BUFFER);
//mCopyTex is bound to mCopyFBO as attachment
glTextureSubImage2D(mCopyTex->GetHandle(), 0, 0, 0, mClientSize.x, mClientSize.y,
GL_RGBA, GL_UNSIGNED_BYTE, 0);
mCurrentPBO = 1 - mCurrentPBO;
Then I just blit the result to the default framebuffer. There is no rendering of geometry or anything like that.
glBlitNamedFramebuffer(
    mCopyFBO,
    0, // default FBO id
    0, 0, mViewportSize.x, mViewportSize.y,
    0, 0, mViewportSize.x, mViewportSize.y,
    GL_COLOR_BUFFER_BIT,
    GL_LINEAR);
Running on an NVIDIA GTX 960 card.
This performance warning is NVIDIA-specific and is intended as a hint that you're not going to use a separate hardware transfer queue, which is no wonder, since you use a single-thread, single-GL-context model in which both the rendering (at least your blit) and the transfer are carried out.
See this NVIDIA presentation for some details about how NVIDIA handles this; page 22 also explains this specific warning. Note that this warning does not mean that your transfer is not asynchronous. It is still fully asynchronous with respect to the CPU thread; it will just be processed synchronously on the GPU with respect to the render commands in the same command queue, and you're not using the asynchronous copy engine, which could do these copies in a separate command queue, independent of the rendering commands.
I can't see any wrong usage of PBOs in my case, or any errors. So the question is whether these warnings are safe to ignore, or whether I am actually doing something wrong.
There is nothing wrong with your PBO usage.
It is not clear whether your specific application would even benefit from a more elaborate scheme with a separate transfer queue.
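For completeness, a rough sketch of what such a scheme tends to look like (my own illustration, not tested against this code): the upload lives in a second thread with a second GL context that shares objects with the render context, and a fence hands the finished texture over. The function and parameter names are placeholders.

#include <cstring>
// GL headers/loader (e.g. GLAD or GLEW) are assumed to be set up for this
// thread's context, which shares tex/pbo with the render context. The
// context creation and sharing (wglShareLists / GLX / EGL) is omitted.
void UploadThreadFrame(GLuint pbo, GLuint tex, int w, int h, const void* pixels)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    void* ptr = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, (size_t)w * h * 4,
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    memcpy(ptr, pixels, (size_t)w * h * 4);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

    // Source is the bound unpack PBO (offset 0), so this is a GPU-side copy.
    glTextureSubImage2D(tex, 0, 0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    // The fence lets the render thread glWaitSync before sampling/blitting tex.
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();
    // ... publish "fence" to the render thread ...
}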

Write to DX11 backbuffer from C++AMP

According to this MS blog post:
http://blogs.msdn.com/b/nativeconcurrency/archive/2012/07/02/interop-with-direct3d-textures-in-c-amp.aspx
you can write directly to the backbuffer from C++ AMP.
Using Interop, you can get the texture object of the back buffer associated with the window using the IDXGISwapChain and update it directly in the C++ AMP kernel.
I created an AMP accelerator from the DX device and got a pointer to the backbuffer, then tried to make an AMP texture from it, but I found that the backbuffer's bind flags were only D3D11_BIND_RENDER_TARGET, while I needed at least D3D11_BIND_UNORDERED_ACCESS or D3D11_BIND_SHADER_RESOURCE for Concurrency::graphics::direct3d::make_texture to work.
I can easily create any other D3D texture and connect it to AMP if I set the bind flags myself, but with the flags the backbuffer is created with, I cannot connect it.
Then I found this post:
http://social.msdn.microsoft.com/Forums/vstudio/en-US/15aa1186-210b-4ba7-89b0-b74f742d6830/c-amp-and-direct2d
which has the following marked as an answer by a Microsoft community contributor:
I was trying to write to back buffer of the swap chain directly. As far as I understood, this can't be done, because usage flags that can be used when creating a back buffer texture are incompatible with ones that are needed by C++ AMP to manipulate the texture.
So, on the one hand, writing to the backbuffer from C++ AMP is used as an example of interop, and on the other hand it is explained to be impossible...?
My current requirement is just to generate a raytraced image in C++ AMP and show it on a D3D display without copying data back from the graphics card every frame. I realize that I could just generate my own texture and render a quad with it, but it would be simpler to write directly to the backbuffer, and if that can be done, that is what I would like to do.
Perhaps someone here can explain if it can be done and what steps are required to accomplish this, or alternatively explain that no, this truly cannot be done.
Thanks in advance for any help on this topic.
[EDIT]
I now found this info
https://software.intel.com/en-us/articles/microsoft-directcompute-on-intel-ivy-bridge-processor-graphics
// this qualifies the back buffer for being the target of compute shader writes
sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT | DXGI_USAGE_UNORDERED_ACCESS | DXGI_USAGE_SHADER_INPUT;
I actually did try that previously, but the call to CreateSwapChainForCoreWindow fails with
First-chance exception at 0x75251D4D in TestDxAmp.exe: Microsoft C++ exception: Platform::InvalidArgumentException ^ at memory location 0x0328E484. HRESULT:0x80070057 The parameter is incorrect.
which is not very informative.
I think the original forum post is perhaps misleading. For both texture and buffer interop, the unordered access binding is required for AMP interop. AMP is built on top of DX/DirectCompute, so this applies in both cases, as noted in the Intel link.
Your program can create an array associated with an existing Direct3D buffer using the make_array() function.

template<typename T, int N>
array<T,N> make_array(const extent& ext, IUnknown* buffer);

The Direct3D buffer must implement the ID3D11Buffer interface. It must support raw views (D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS) and allow SHADER_RESOURCE and UNORDERED_ACCESS binding. The buffer itself must be of the correct size: the size of the extent multiplied by the size of the buffer type. The following code uses make_array to create an array using the accelerator_view, dxView, which was created in the previous section:

HRESULT hr = S_OK;
-- C++ AMP Book
I'm not a DX expert, but from the following post it looks like you can configure the swap chain to support UAVs:
Sobel Filter Compute Shader
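A hedged sketch of what that swap-chain setup might look like for a desktop (HWND) swap chain, following the Intel snippet quoted above. I have not verified this against CreateSwapChainForCoreWindow, and hwnd, swapChain, ampView, and the make_texture line are placeholders/assumptions:

#include <d3d11.h>
#include <dxgi.h>

DXGI_SWAP_CHAIN_DESC sd = {};
sd.BufferCount       = 2;
sd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
sd.BufferUsage       = DXGI_USAGE_RENDER_TARGET_OUTPUT |
                       DXGI_USAGE_UNORDERED_ACCESS |     // qualifies the backbuffer for compute writes
                       DXGI_USAGE_SHADER_INPUT;
sd.OutputWindow      = hwnd;
sd.SampleDesc.Count  = 1;
sd.Windowed          = TRUE;
sd.SwapEffect        = DXGI_SWAP_EFFECT_DISCARD;         // bitblt model; the flip model may reject UAV usage

// ... D3D11CreateDeviceAndSwapChain(..., &sd, &swapChain, &device, ...) ...

ID3D11Texture2D* backBuffer = nullptr;
swapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), (void**)&backBuffer);

// With UNORDERED_ACCESS/SHADER_INPUT usage present, make_texture should no
// longer reject the backbuffer (my assumption, untested):
// auto tex = concurrency::graphics::direct3d::make_texture<unorm_4, 2>(ampView, backBuffer);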

GL/CL interoperability: Shared Texture [closed]

I intend to do graphics calculations with OpenCL, such as ray casting, ray marching, and so on, and I want to use OpenGL to display the results of these calculations (pixel images). I use a texture attached to a framebuffer object: OpenCL writes the result into the texture, and then I use glBlitFramebuffer to copy the texture data to the application window's framebuffer.
I ran into a CL/GL interop problem while implementing this, and I wrote a simple example to show it. The example shows the framebuffer object and texture object initialization, their attachment, and the creation of the OpenCL buffer from the GL texture. At the end, the main render loop is shown: it writes new data into the texture each frame, attaches the texture to the framebuffer, and blits that framebuffer.
Texture Initialization:
for (int i = 0; i < data.Length; i +=4) {
data [i] = 255;
}
GL.BindTexture (TextureTarget.Texture2D, tboID [0]);
GL.TexImage2D<byte>(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba8, w, h, 0,
PixelFormat.Rgba, PixelType.UnsignedByte, data);
GL.BindTexture (TextureTarget.Texture2D, 0);
TBO+FBO Initialization:
GL.BindFramebuffer (FramebufferTarget.FramebufferExt, fboID [0]);
GL.FramebufferTexture2D (FramebufferTarget.FramebufferExt, FramebufferAttachment.ColorAttachment0,
TextureTarget.Texture2D, tboID [0], 0);
GL.BindFramebuffer (FramebufferTarget.FramebufferExt, 0);
CL/GL Initialization:
bufferID = CL.CreateFromGLTexture2D (context, memInfo, textureTarget, mipLevel, glBufferObject, out errorCode);
Render Loop:
for (int i = 0; i < data.Length; i += 4) {
data [i] = tt;
}
tt++;
GL.BindTexture (TextureTarget.Texture2D, tboID [0]);
GL.TexImage2D<byte> (TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba8, w, h, 0,
PixelFormat.Rgba, PixelType.UnsignedByte, data);
GL.BindTexture (TextureTarget.Texture2D, 0);
GL.BindFramebuffer (FramebufferTarget.FramebufferExt, fboID [0]);
GL.FramebufferTexture2D (FramebufferTarget.FramebufferExt, FramebufferAttachment.ColorAttachment0,
TextureTarget.Texture2D, tboID [0], 0);
GL.BindFramebuffer (FramebufferTarget.FramebufferExt, 0);
GL.BindFramebuffer (FramebufferTarget.ReadFramebuffer, fboID [0]);
GL.ReadBuffer (ReadBufferMode.ColorAttachment0);
GL.DrawBuffer (DrawBufferMode.Back);
GL.BlitFramebuffer (0, 0, w, h, 0, 0, w, h, ClearBufferMask.ColorBufferBit, BlitFramebufferFilter.Nearest);
GL.BindFramebuffer (FramebufferTarget.ReadFramebuffer, 0);
At first glance this code looks weird, but it fully demonstrates my problem: CL does no work at all here. In this application, only the OpenCL context creation and the OpenCL buffer initialization take place.
What this should do is simple: the screen color should gradually change from black to red. But it does not work that way; the color never changes from the initial red set during texture initialization.
However, it works normally when I comment out the CL/GL initialization (the creation of the CL buffer from the GL texture).
Why is this? Why does the behavior of the GL texture change depending on the CL attachment? How can I fix it and make it work?
EDIT 2:
Then you need to check why you are getting an InvalidImageFormatDescriptor. Check whether the parameter order is okay and whether the image descriptor of your texture is something OpenCL cannot map (internal image structure; see the OpenCL specification). From the spec:
CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if the OpenGL texture internal format does not map to a supported OpenCL image format.
EDIT:
As I understand it, the OpenCL functionality in OpenTK is provided by a separate project named Cloo. For ComputeImage2D, their documentation states:
CreateFromGLTexture2D (ComputeContext context, ComputeMemoryFlags flags, int textureTarget, int mipLevel, int textureId)
Compared to yours:
CreateFromGLTexture2D (context, MemFlags.MemReadWrite, TextureTarget.Texture2D, ((uint[])tboID.Clone()) [0], 0);
Looking at that, you have the mip level and the texture ID in the wrong order. A wrong initialization like that might lead to some unknown behaviour.
It's hard to tell what the issue might be from the code you're providing. I'm looking at the interop myself right now, just getting into it. The first thing I would try is to put a try/catch block around the call and get a clue from any error code it raises.
Have you verified the obvious: Is the cl_khr_gl_sharing extension available on your device?
Another guess, since your sample code only shows the texture/image initialization of the actual OpenCL/OpenGL interop: did you acquire the memory object?
cl_int clEnqueueAcquireGLObjects (cl_command_queue command_queue,
                                  cl_uint num_objects,
                                  const cl_mem *mem_objects,
                                  cl_uint num_events_in_wait_list,
                                  const cl_event *event_wait_list,
                                  cl_event *event)
The OpenCL 1.1 specification states:
The function clEnqueueAcquireGLObjects is used to acquire OpenCL memory objects that have been created from OpenGL objects. These objects need to be acquired before they can be used by any OpenCL commands queued to a command-queue. The OpenGL objects are acquired by the OpenCL context associated with command_queue and can therefore be used by all command-queues associated with the OpenCL context.
So the issue might be that the memory object was never acquired on a command queue.
Also, you are responsible for issuing a glFinish() to ensure all the work on the OpenGL side is done before the memory object is acquired.
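As an illustration of that ordering, here is a sketch of a typical per-frame CL/GL interop loop (plain C OpenCL API for brevity; queue, sharedImage, and the kernel step are placeholders, since the question's code is OpenTK/Cloo):

glFinish();                                              // GL finished before CL acquires

clEnqueueAcquireGLObjects(queue, 1, &sharedImage, 0, NULL, NULL);

// ... clEnqueueNDRangeKernel / clEnqueueWriteImage touching sharedImage ...

clEnqueueReleaseGLObjects(queue, 1, &sharedImage, 0, NULL, NULL);
clFinish(queue);                                         // CL finished before GL renders/blits the texture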
Finally, we were able to run Itun's code on my system (Windows 7 / AMD Radeon HD 5870). Recall that Itun's code gradually changes the texture color from black to red by means of GL after activating GL/CL interop on that texture.
The results are strange, to say the least. On my system it works as intended. However, on Itun's system (Windows 7 / NVIDIA GeForce) the same code does not work at all and does not produce any exceptions or error codes. In addition, I would like to mention that CL works with this texture properly on both systems, so something is wrong with GL in this case.
We have no idea what is going on; it could be either Itun's outdated GPU hardware or buggy NVIDIA drivers.

Error when calling glGetTexImage (atioglxx.dll)

I'm experiencing a difficult problem on certain ATI cards (Radeon X1650, X1550, and others).
The message is: "Access violation at address 6959DD46 in module 'atioglxx.dll'. Read of address 00000000"
It happens on this line:
glGetTexImage(GL_TEXTURE_2D,0,GL_RGBA,GL_FLOAT,P);
Note:
Latest graphics drivers are installed.
It works perfectly on other cards.
Here is what I've tried so far (with assertions in the code):
That the pointer P is valid and allocated enough memory to hold the image
Texturing is enabled: glIsEnabled(GL_TEXTURE_2D)
Test that the currently bound texture is the one I expect: glGetIntegerv(GL_TEXTURE_BINDING_2D)
Test that the currently bound texture has the dimensions I expect: glGetTexLevelParameteriv( GL_TEXTURE_WIDTH / HEIGHT )
Test that no errors have been reported: glGetError
It passes all those tests and then still fails with the message.
I feel I've tried everything and have no more ideas. I really hope some GL-guru here can help!
EDIT:
After concluding that it is probably a driver bug, I posted about it here too: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=295137#Post295137
I also tried GL_PACK_ALIGNMENT and it didn't help.
Through some more investigation I found that it only happens with textures that I have previously filled with pixels using a call to glCopyTexSubImage2D. So I could produce a workaround by replacing the glCopyTexSubImage2D call with calls to glReadPixels and then glTexImage2D instead.
Here is my updated code:
{
  glCopyTexSubImage2D cannot be used here because the combination of calling
  glCopyTexSubImage2D and then later glGetTexImage on the same texture causes
  a crash in atioglxx.dll on ATI Radeon X1650 and X1550.
  Instead we copy to main memory first and then update the texture.
}
// glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, PixelWidth, PixelHeight); //**
GetMem(P, PixelWidth * PixelHeight * 4);
glReadPixels(0, 0, PixelWidth, PixelHeight, GL_RGBA, GL_UNSIGNED_BYTE, P);
SetMemory(P,GL_RGBA,GL_UNSIGNED_BYTE);
You might want to take care of GL_PACK_ALIGNMENT. This parameter tells GL how each row of pixel data is aligned (padded) in memory. For example, for an image 645 pixels wide with one byte per pixel:
With GL_PACK_ALIGNMENT of 4 (the default value), each row is padded to 648 bytes.
With GL_PACK_ALIGNMENT of 1, each row is exactly 645 bytes.
So ensure that the pack value is right by calling
glPixelStorei(GL_PACK_ALIGNMENT, 1)
before your glGetTexImage(), or align the rows of your destination buffer to the current GL_PACK_ALIGNMENT.
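For illustration, a small sketch of a readback where the alignment actually matters (3-byte pixels, odd width); texId and the sizes are placeholders:

int w = 645, h = 480;

/* With the default GL_PACK_ALIGNMENT of 4, each 645*3 = 1935-byte row would be
   padded to 1936 bytes; with alignment 1 a tightly packed buffer is enough. */
unsigned char *pixels = (unsigned char*)malloc((size_t)w * h * 3);

glPixelStorei(GL_PACK_ALIGNMENT, 1);
glBindTexture(GL_TEXTURE_2D, texId);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGB, GL_UNSIGNED_BYTE, pixels);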
This is most likely a driver bug. Having written 3D APIs myself, it is easy to see how: you are doing something weird and rare enough that it is unlikely to be covered by tests, namely converting between float data and an 8-bit internal format during the transfer. Nobody is going to optimize that path, so the generic CPU conversion function probably kicks in, and somebody messed up a table that drives the allocation of temporary buffers for it.
You should reconsider what you are doing in the first place. Mixing an external float format with an internal 8-bit format is the kind of conversion in the GL API that usually points to a programming error. If your data is float and you want to keep it as such, you should use a float texture and not 8-bit RGBA. If you want 8 bit, why is your input float?
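If the data really should stay float end to end, a minimal sketch of that suggestion (assuming the hardware/driver exposes float textures, e.g. via ARB_texture_float; texId, w, h, data, and readback are placeholders):

/* Upload as float into a float internal format, so no 8-bit conversion happens. */
glBindTexture(GL_TEXTURE_2D, texId);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, w, h, 0, GL_RGBA, GL_FLOAT, data);

/* Reading it back as float is then a straight copy, not a conversion. */
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, readback);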