Bind CUDA output array/surface to GL texture in ManagedCUDA - opengl

I'm currently attempting to connect some form of output from a CUDA program to a GL_TEXTURE_2D for use in rendering. I'm not that worried about the output type from CUDA (whether it'd be an array or surface, I can adapt the program to that).
So the question is, how would I do that? (my current code copies the output array to system memory, and uploads it to the GPU again with GL.TexImage2D, which is obviously highly inefficient - when I disable those two pieces of code, it goes from approximately 300 kernel executions per second to a whopping 400)
I already have a little bit of test code, to at least bind a GL texture to CUDA, but I'm not even able to get the device pointer from it...
ctx = CudaContext.CreateOpenGLContext(CudaContext.GetMaxGflopsDeviceId(), CUCtxFlags.SchedAuto);
uint textureID = (uint)GL.GenTexture(); //create a texture in GL
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, width, height, 0, OpenTK.Graphics.OpenGL.PixelFormat.Rgba, PixelType.UnsignedByte, null); //allocate memory for the texture in GL
CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource(textureID, CUGraphicsRegisterFlags.WriteDiscard, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_2D, CUGraphicsMapResourceFlags.WriteDiscard); //using writediscard because the CUDA kernel will only write to this texture
//then, as far as I understood the ManagedCuda example, I have to do the following when I call my kernel
//(done without a CudaGraphicsInteropResourceCollection because I only have one item)
resultImage.Map();
var ptr = resultImage.GetMappedPointer(); //this crashes
kernelSample.Run(ptr); //pass the pointer to the kernel so it knows where to write
resultImage.UnMap();
The following exception is thrown when attempting to get the pointer:
ErrorNotMappedAsPointer: This indicates that a mapped resource is not available for access as a pointer.
What do I need to do to fix this?
And even if this exception can be resolved, how would I solve the other part of my question; that is, how do I work with the acquired pointer in my kernel? Can I use a surface for that? Access it as an arbitrary array (pointer arithmetic)?
Edit:
Looking at this example, apparently I don't even need to map the resource every time I call the kernel and the render function. But how would this translate to ManagedCUDA?

Thanks to the example I found, I was able to translate that to ManagedCUDA (after browsing the source code and fiddling around), and I'm happy to announce that this really does improve my samples per second from about 300 to 400 :)
Apparently a 3D array is required (I haven't seen any overloads in ManagedCUDA that use 2D arrays), but that doesn't really matter: I just use a 3D array/texture that is exactly 1 deep.
id = GL.GenTexture();
GL.BindTexture(TextureTarget.Texture3D, id);
GL.TexParameter(TextureTarget.Texture3D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture3D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage3D(TextureTarget.Texture3D, 0, PixelInternalFormat.Rgba, width, height, 1, 0, OpenTK.Graphics.OpenGL.PixelFormat.Bgra, PixelType.UnsignedByte, IntPtr.Zero); //allocate memory for the texture but do not upload anything
CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource((uint)id, CUGraphicsRegisterFlags.SurfaceLDST, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_3D, CUGraphicsMapResourceFlags.WriteDiscard);
resultImage.Map();
CudaArray3D mappedArray = resultImage.GetMappedArray3D(0, 0);
resultImage.UnMap();
CudaSurface surfaceResult = new CudaSurface(kernelSample, "outputSurface", CUSurfRefSetFlags.None, mappedArray); //nothing needs to be done anymore - this call connects the 3D array from the GL texture to a surface reference in the kernel
Kernel code:
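surface<void, cudaSurfaceType3D> outputSurface;
__global__ void Sample() {
...
// surf3Dwrite takes the x coordinate in bytes, so pixelX must already be scaled by the element size
surf3Dwrite(output, outputSurface, pixelX, pixelY, 0);
}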

Related

Having a difficult time with DirectX 11 dynamic texture Map/Unmap

I have been trying to upload a dynamic texture with Map/Unmap, but no luck so far.
Here's the code I'm working with:
D3D11_MAPPED_SUBRESOURCE subResource = {};
ImmediateContext->Map(dx11Texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &subResource);
Memory::copy(subResource.pData, (const void*)desc.DataSet[0], texture->get_width() * texture->get_height() * GraphicsFormatUtils::get_format_size(texture->get_format()));
subResource.RowPitch = texture->get_width() * GraphicsFormatUtils::get_format_size(texture->get_format());
subResource.DepthPitch = 0;
ImmediateContext->Unmap(dx11Texture, 0);
I have created the texture as immutable and supplied the data up front, and that worked well. But when I create it with the dynamic flag and upload the same data through Map/Unmap, the texture shows up as noise.
This is the texture with immutable creation flags, with the data supplied during texture creation:
(image: Immutable texture)
This is the texture with dynamic creation flags, with the data uploaded after creation via the Map/Unmap methods:
(image: Dynamic texture)
Any input would be appreciated.
When using Map, the RowPitch returned in subResource is the pitch you are expected to use when performing the copy. Note that you never send the structure back to the device context, so it is read-only; assigning to RowPitch or DepthPitch after the copy has no effect.
The returned pitch is frequently rounded up for memory alignment (often to a power of two), so it can be larger than width * format size.
When you provide initial data for a texture (immutable or otherwise), this copy still happens, but behind the scenes: you describe your own pitch and the runtime handles the difference. With Map, you need to perform the pitch test yourself.
The process of copying into a dynamic texture is as follows:
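// width, height, formatSize and srcData are placeholders for your own texture data
UINT myDataRowPitch = width * formatSize; // bytes per source row (if you don't pad)
D3D11_MAPPED_SUBRESOURCE subResource = {};
HRESULT hr = ImmediateContext->Map(dx11Texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &subResource);
if (SUCCEEDED(hr))
{
    if (myDataRowPitch == subResource.RowPitch)
    {
        // pitches match: a single memcpy covers the whole texture
        memcpy(subResource.pData, srcData, myDataRowPitch * height);
    }
    else
    {
        // pitches differ: copy row by row, stepping the destination by RowPitch
        const BYTE* src = (const BYTE*)srcData;
        BYTE* dst = (BYTE*)subResource.pData;
        for (UINT y = 0; y < height; ++y)
            memcpy(dst + y * subResource.RowPitch, src + y * myDataRowPitch, myDataRowPitch);
    }
    ImmediateContext->Unmap(dx11Texture, 0);
}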
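To make the pitch handling easier to try out, here is a minimal sketch of the same row-by-row copy with no Direct3D dependency. The function name copyRows and all of its parameters are my own, not part of any API; in the D3D11 case, dst would be subResource.pData and dstPitch would be subResource.RowPitch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies `rows` rows of `rowBytes` bytes each from `src` to `dst`.
// Consecutive rows start `srcPitch` bytes apart in the source and
// `dstPitch` bytes apart in the destination (hypothetical helper).
void copyRows(std::uint8_t* dst, std::size_t dstPitch,
              const std::uint8_t* src, std::size_t srcPitch,
              std::size_t rowBytes, std::size_t rows)
{
    // A pitch smaller than the row itself would make rows overlap.
    assert(rowBytes <= dstPitch && rowBytes <= srcPitch);

    if (dstPitch == rowBytes && srcPitch == rowBytes) {
        // Both sides are tightly packed: one copy covers everything.
        std::memcpy(dst, src, rowBytes * rows);
        return;
    }
    for (std::size_t y = 0; y < rows; ++y) {
        // Padding bytes at the end of each destination row are left untouched.
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
    }
}
```

For a 3x2 RGBA image (12 bytes per row) copied into a destination with a 16-byte pitch, the second row lands at offset 16 rather than 12. Copying the whole image with one memcpy in that situation shifts every row after the first, which is a plausible source of the noisy picture in the question.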

Exception "Texture cannot be null" Direct X

I am coding a 2D game using DirectX 11 and DirectXTK.
I wrote a Framework class that initializes both the game window and DirectX. These initializations work correctly. Then I started drawing some backgrounds and so on in the window, but after a while the program exits on an exception. I wrapped the code in a try { ... } catch () { } block, which tells me that "Texture cannot be null". However, I could not find which texture it is talking about, even by debugging and checking all the values.
I decided to separate the different elements I was drawing in the window, to see where the problem might come from. So now I have 3 draw methods:
Draw(DWORD &elapsedTime);
DrawBackground(DWORD &elapsedTime);
DrawCharacter(DWORD &elapsedTime);
The Draw(DWORD &elapsedTime) method calls both the DrawBackground() and DrawCharacter() methods.
Here is my Draw method:
void Framework::Draw(DWORD * elapsedTime)
{
// Clearing the Back Buffer
immediateContext->ClearRenderTargetView(renderTargetView, Colors::Aquamarine);
//Clearing the depth buffer to max depth (1.0)
immediateContext->ClearDepthStencilView(depthStencilView, D3D11_CLEAR_DEPTH, 1.0f, 0); //immediateContext is a ID3D11DeviceContext*
CommonStates states(d3dDevice); //d3dDevice is a ID3D11Device*
sprites.reset(new SpriteBatch(immediateContext));
sprites->Begin(SpriteSortMode_Deferred, states.NonPremultiplied());
DrawBackground1(elapsedTime);
DrawCharacter(elapsedTime);
sprites->End();
//Presenting the back buffer to the front buffer
swapChain->Present(0, 0);
}
By debugging I am almost sure that the exception comes from both DrawBackground() and DrawCharacter(). Indeed, when I comment those out in the Draw method, I get no error, but as soon as I put one back, the exception is thrown after the window has displayed what I want for a few seconds.
Here is the DrawBackground() method, for example:
void Framework::DrawBackground1(DWORD * elpasedTime)
{
RECT *try1 = new RECT();
try1->bottom = 0; try1->left = 0; try1->right = (int)WIDTH; try1->bottom = (int)HEIGHT;
ID3D11ShaderResourceView * texture2 = nullptr;
ID3D11ShaderResourceView * textureRV = nullptr;
CreateDDSTextureFromFile(d3dDevice, L"../Images/backgrounds/set2_background.dds", nullptr, &textureRV);
CreateDDSTextureFromFile(d3dDevice, L"../Images/backgrounds/set3_tiles.dds", nullptr, &texture2);
sprites->Draw(textureRV, XMFLOAT2(0, 0), try1, Colors::White);
sprites->Draw(texture2, XMFLOAT2(0, 0), try1, Colors::CornflowerBlue);
}
So as soon as I uncomment this method (or DrawCharacter(), which follows the same steps), the window displays what I expect for a few seconds, but then I get the exception "Texture cannot be null". I also noticed that with DrawCharacter() the window keeps displaying what I want for longer than with DrawBackground(), whose texture is much bigger than the character's.
I'm not sure if this information is useful, but I think this might be linked to the size of the texture?
Do you notice anything that I did wrong in this code? Why would a texture be considered null when it is displayed correctly for a while? I've been looking for answers for a few hours now, so some help would be amazing!
Thank you
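I noticed that you create two new ID3D11ShaderResourceView objects on every call without releasing the old ones, so each frame leaks two textures (the RECT allocated with new is never deleted either). Once memory runs out, CreateDDSTextureFromFile presumably fails and leaves the pointer null, which would explain why the larger background fails sooner than the character. You could create the shader resource views only once and store them (as members or globals), or call ->Release() on them after the sprites->Draw(...) calls.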
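The "create once and reuse" idea can be sketched like this, with no DirectX dependency. FakeTexture and TextureCache are hypothetical names I made up for illustration; in real code the stored type would be something like a ComPtr to ID3D11ShaderResourceView, and the loader would wrap CreateDDSTextureFromFile.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a GPU resource such as a shader resource view.
struct FakeTexture {
    std::string path;
};

// Loads each texture the first time it is requested and reuses it afterwards.
// Everything it owns is freed automatically when the cache is destroyed.
class TextureCache {
public:
    using Loader = std::function<std::unique_ptr<FakeTexture>(const std::string&)>;

    explicit TextureCache(Loader loader) : loader_(std::move(loader)) {}

    // Returns nullptr if the loader fails; failures are not cached.
    FakeTexture* get(const std::string& path) {
        auto it = textures_.find(path);
        if (it != textures_.end()) return it->second.get();

        std::unique_ptr<FakeTexture> tex = loader_(path);
        if (!tex) return nullptr;

        FakeTexture* raw = tex.get();
        textures_.emplace(path, std::move(tex));
        return raw;
    }

    std::size_t size() const { return textures_.size(); }

private:
    Loader loader_;
    std::map<std::string, std::unique_ptr<FakeTexture>> textures_;
};
```

With something like this, the draw method only asks the cache for a texture by path each frame, so the file is read and the GPU resource created a single time. Checking the returned pointer for null before drawing also tells you which texture failed to load.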

SDL2 - Why does SDL_CreateTextureFromSurface() need a renderer*?

This is the syntax of the SDL_CreateTextureFromSurface function:
SDL_Texture* SDL_CreateTextureFromSurface(SDL_Renderer* renderer, SDL_Surface* surface)
However, I'm confused why we need to pass a renderer*? I thought we need a renderer* only when drawing the texture?
You need SDL_Renderer to get information about the applicable constraints:
maximum supported size
pixel format
And probably something more..
In addition to the answer by plaes..
Under the hood, SDL_CreateTextureFromSurface calls SDL_CreateTexture, which itself also needs a Renderer, to create a new texture with the same size as the passed in surface.
Then the SDL_UpdateTexture function is called on the newly created texture to copy the pixel data from the surface you passed in to SDL_CreateTextureFromSurface. If the surface's format differs from what the renderer supports, extra logic runs to ensure correct behavior.
The Renderer itself is needed for SDL_CreateTexture because it's the GPU that handles and stores textures (most of the time), and the Renderer is supposed to be an abstraction over the GPU.
A surface never needs a Renderer since it's loaded in RAM and handled by the CPU.
You can find out more about how these calls work if you look at SDL_render.c from the SDL2 source code.
Here is some code inside SDL_CreateTextureFromSurface:
texture = SDL_CreateTexture(renderer, format, SDL_TEXTUREACCESS_STATIC,
surface->w, surface->h);
if (!texture) {
return NULL;
}
if (format == surface->format->format) {
if (SDL_MUSTLOCK(surface)) {
SDL_LockSurface(surface);
SDL_UpdateTexture(texture, NULL, surface->pixels, surface->pitch);
SDL_UnlockSurface(surface);
} else {
SDL_UpdateTexture(texture, NULL, surface->pixels, surface->pitch);
}
}

Strange Direct3D memory behavior in Present()

So I am working on my own rendering engine and encountered some strange memory behavior.
Whenever I call
IDXGISwapChain::Present(0, 0)
it increases my program's memory usage by the size of the vertices I rendered that frame.
I create a vertex buffer using this code:
ID3D10Buffer *pVertexBuffer;
D3D10_BUFFER_DESC desc;
desc.Usage = D3D10_USAGE_DYNAMIC;
desc.ByteWidth = stride * nNumVertices;
desc.BindFlags = D3D10_BIND_VERTEX_BUFFER;
desc.CPUAccessFlags = D3D10_CPU_ACCESS_WRITE;
desc.MiscFlags = 0;
D3D10_SUBRESOURCE_DATA vData;
vData.pSysMem = Vertices;
HRESULT hr = m_pDevice->CreateBuffer(&desc, &vData, &pVertexBuffer);
Draw it using
m_pDevice->IASetPrimitiveTopology(D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
m_pDevice->IASetVertexBuffers(0, 1, &pVertexBuffer, &stride, &offset);
m_pDevice->Draw(nNumVertices, 0);
And then Release it
pVertexBuffer->Release();
This produces the following debug-layer message:
D3D10 INFO: ID3D10Device::IASetVertexBuffers: A currently bound VertexBuffer is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #31: IASETVERTEXBUFFERS_UNBINDDELETINGOBJECT]
But according to MSDN and other questions, this shouldn't be a problem.
Has anyone else experienced this before, or could give me a helping hand?
Edit1:
This happens a certain number of times, around 500, after which it stops using more memory.
If I then unload my mesh and reload it (so it has a different pointer), Present() starts allocating more memory again, the size of my mesh each time, for around 500 calls. This goes on until my program runs out of memory.
One thing to add: the first time I call Present(), it increases the used memory by the size of any texture I have bound, and this includes the back buffer!
After a lot of experimenting I have found what is happening here.
Each time an ID3D10Buffer is created using CreateBuffer(), IDXGISwapChain::Present() allocates 4 KB. For me, this happens 64 times per identical buffer, and then it either stops or releases the previously allocated memory.
I hope this helps someone else who runs into this undocumented 'feature'.

QGLBuffer::map returns NULL?

I'm trying to use QGLBuffer to display an image.
The sequence is something like:
initializeGL() {
glbuffer= QGLBuffer(QGLBuffer::PixelUnpackBuffer);
glbuffer.create();
glbuffer.bind();
glbuffer.allocate(image_width*image_height*4); // RGBA
glbuffer.release();
}
// Attempting to write an image directly to graphics memory.
// map() should map the texture into the address space and give me an address
// to write directly to, but it always returns NULL
unsigned char* dest = glbuffer.map(QGLBuffer::WriteOnly); // FAILS: returns NULL
MyGetImageFunction( dest );
glbuffer.unmap();
paint() {
glbuffer.bind();
glBegin(GL_QUADS);
glTexCoord2i(0,0); glVertex2i(0,height());
glTexCoord2i(0,1); glVertex2i(0,0);
glTexCoord2i(1,1); glVertex2i(width(),0);
glTexCoord2i(1,0); glVertex2i(width(),height());
glEnd();
glbuffer.release();
}
There aren't any examples of using QGLBuffer in this way; it's pretty new.
Edit: for anyone searching, here is the working solution:
// Where glbuffer is defined as
glbuffer= QGLBuffer(QGLBuffer::PixelUnpackBuffer);
// sequence to get a pointer into a PBO, write data to it and copy it to a texture
glbuffer.bind(); // bind before doing anything
unsigned char *dest = (unsigned char*)glbuffer.map(QGLBuffer::WriteOnly);
MyGetImageFunction(dest);
glbuffer.unmap(); // need to unmap before the rest of OpenGL can access the PBO
glBindTexture(GL_TEXTURE_2D,texture);
// Note 'NULL' because memory is now onboard the card
glTexSubImage2D(GL_TEXTURE_2D, 0, 0,0, image_width, image_height, glFormatExt, glType, NULL);
glbuffer.release(); // but don't release until finished the copy
// PaintGL function
glBindTexture(GL_TEXTURE_2D,texture);
glBegin(GL_QUADS);
glTexCoord2i(0,0); glVertex2i(0,height());
glTexCoord2i(0,1); glVertex2i(0,0);
glTexCoord2i(1,1); glVertex2i(width(),0);
glTexCoord2i(1,0); glVertex2i(width(),height());
glEnd();
You should bind the buffer before mapping it!
In the documentation for QGLBuffer::map:
It is assumed that create() has been called on this buffer and that it has been bound to the current context.
In addition to VJovic's comments, I think you are missing a few points about PBOs:
A pixel unpack buffer does not give you a pointer to the graphics texture. It is a separate piece of memory allocated on the graphics card, which you can write to directly from the CPU.
The buffer can be copied into a texture by a glTexSubImage2D(....., 0) call, with the texture being bound as well, which you do not do. (0 is the offset into the pixel buffer). The copy is needed partly because textures have a different layout than linear pixel buffers.
See this page for a good explanation of PBO usages (I used it a few weeks ago to do async texture upload).
create will return false if the GL implementation does not support buffers, or there is no current QGLContext.
bind returns false if binding was not possible, usually because type() is not supported on this GL implementation.
You are not checking if these two functions passed.
I had the same problem: map returned NULL. It was solved when I made the calls in the following order:
bool success = mPixelBuffer->create();
mPixelBuffer->setUsagePattern(QGLBuffer::DynamicDraw);
success = mPixelBuffer->bind();
mPixelBuffer->allocate(sizeof(imageData));
void* ptr =mPixelBuffer->map(QGLBuffer::ReadOnly);