Direct write to D3D texture from kernel - c++

I am playing around with the NVDEC H.264 decoder from the NVIDIA CUDA samples. One thing I've found out is that once a frame is decoded, it is converted from NV12 to a BGRA buffer allocated on the CUDA side, and this buffer is then copied to a D3D BGRA texture.
I find this not very efficient in terms of memory usage, and I want to convert the NV12 frame directly into the D3D texture with this kernel:
void Nv12ToBgra32(uint8_t *dpNv12, int nNv12Pitch, uint8_t *dpBgra, int nBgraPitch, int nWidth, int nHeight, int iMatrix)
So, create the D3D texture (BGRA, D3D11_USAGE_DEFAULT, D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS, D3D11_CPU_ACCESS_WRITE, 1 mip level),
then register it and write to it on the CUDA side:
//Register
ck(cuGraphicsD3D11RegisterResource(&cuTexResource, textureResource, CU_GRAPHICS_REGISTER_FLAGS_NONE));
...
//Write output:
CUarray retArray;
ck(cuGraphicsMapResources(1, &cuTexResource, 0));
ck(cuGraphicsSubResourceGetMappedArray(&retArray, cuTexResource, 0, 0));
/*
yuvFramePtr (NV12) is uint8_t* from decoded frame,
it's stored within CUDA memory I believe
*/
Nv12ToBgra32(yuvFramePtr, w, (uint8_t*)retArray, 4 * w, w, h);
ck(cuGraphicsUnmapResources(1, &cuTexResource, 0));
Once the kernel is called, I get a crash. It may be because I am misusing the CUarray. Can anybody please clarify how to use the output of cuGraphicsSubResourceGetMappedArray to write texture memory from a CUDA kernel? (Since only raw memory writes are needed, there is no need to handle clamping, filtering and value scaling.)

OK, for anyone who is struggling with the question "How to write a D3D11 texture from a CUDA kernel", here is how:
Create the D3D texture with D3D11_BIND_UNORDERED_ACCESS.
Then register the resource:
//ID3D11Texture2D *textureResource from D3D texture
CUgraphicsResource cuTexResource;
ck(cuGraphicsD3D11RegisterResource(&cuTexResource, textureResource, CU_GRAPHICS_REGISTER_FLAGS_NONE));
//Optional: add write-discard if the texture will be fully overwritten by the kernel
ck(cuGraphicsResourceSetMapFlags(cuTexResource, CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD));
Once the texture is created and registered, we can use it as a write surface.
ck(cuGraphicsMapResources(1, &cuTexResource, 0));
//Get array for first mip-map
CUarray retArray;
ck(cuGraphicsSubResourceGetMappedArray(&retArray, cuTexResource, 0, 0));
//Create surface from texture
CUsurfObject surf;
CUDA_RESOURCE_DESC surfDesc{};
surfDesc.res.array.hArray = retArray;
surfDesc.resType = CU_RESOURCE_TYPE_ARRAY;
ck(cuSurfObjectCreate(&surf, &surfDesc));
/*
Kernel declaration is:
void Nv12ToBgra32Surf(uint8_t* dpNv12, int nNv12Pitch, cudaSurfaceObject_t surf, int nBgraPitch, int nWidth, int nHeight, int iMatrix)
Surface write:
surf2Dwrite<uint>(VALUE, surf, x * sizeof(uint), y);
For a BGRA surface we write one uint per pixel; the X offset is in bytes,
so multiply the pixel X coordinate by the byte size of the type.
Run kernel:
*/
Nv12ToBgra32Surf(yuvFramePtr, w, /*out*/surf, 4 * w, w, h);
ck(cuGraphicsUnmapResources(1, &cuTexResource, 0));
ck(cuSurfObjectDestroy(surf));
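For reference, here is a rough idea of what the body of such a surface-writing kernel can look like. This is a minimal sketch, not the actual kernel from the NVDEC samples: it assumes one thread per pixel, hard-codes a BT.601 limited-range conversion instead of honoring iMatrix, assumes the interleaved UV plane starts right after nHeight rows of luma, and drops the nBgraPitch parameter (a surface write does not need a pitch).
#include <cstdint>
#include <cuda_runtime.h>

//Launched e.g. with grid dim3((nWidth+15)/16, (nHeight+15)/16) and block dim3(16, 16)
__global__ void Nv12ToBgra32Surf(const uint8_t* dpNv12, int nNv12Pitch,
                                 cudaSurfaceObject_t surf,
                                 int nWidth, int nHeight)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nWidth || y >= nHeight)
        return;

    //NV12 layout: luma plane first, then the interleaved UV plane below it
    uint8_t Y = dpNv12[y * nNv12Pitch + x];
    const uint8_t* uv = dpNv12 + nNv12Pitch * nHeight;
    uint8_t U = uv[(y / 2) * nNv12Pitch + (x & ~1)];
    uint8_t V = uv[(y / 2) * nNv12Pitch + (x & ~1) + 1];

    //Simplified BT.601 limited-range YUV -> RGB
    float fy = Y - 16.0f, fu = U - 128.0f, fv = V - 128.0f;
    int r = min(max((int)(1.164f * fy + 1.596f * fv), 0), 255);
    int g = min(max((int)(1.164f * fy - 0.392f * fu - 0.813f * fv), 0), 255);
    int b = min(max((int)(1.164f * fy + 2.017f * fu), 0), 255);

    //BGRA byte order in memory equals B | G<<8 | R<<16 | A<<24 as a little-endian uint
    unsigned int bgra = (255u << 24) | ((unsigned int)r << 16) |
                        ((unsigned int)g << 8) | (unsigned int)b;
    //X offset is in bytes
    surf2Dwrite(bgra, surf, x * (int)sizeof(unsigned int), y);
}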

Related

How to go from raw Bitmap data to SDL Surface or Texture?

I'm using a library called Awesomium and it has the following function:
void Awesomium::BitmapSurface::CopyTo ( unsigned char * dest_buffer, // output
int dest_row_span, // input that I can select
int dest_depth, // input that I can select
bool convert_to_rgba, // input that I can select
bool flip_y // input that I can select
) const
Copy this bitmap to a certain destination. Will also set the dirty bit to False.
Parameters
dest_buffer A pointer to the destination pixel buffer.
dest_row_span The number of bytes per-row of the destination.
dest_depth The depth (number of bytes per pixel, is usually 4 for BGRA surfaces and 3 for BGR surfaces).
convert_to_rgba Whether or not we should convert BGRA to RGBA.
flip_y Whether or not we should invert the bitmap vertically.
This is great because it gives me an unsigned char * dest_buffer which contains raw bitmap data. I've been trying for several hours to convert this raw bitmap data into some sort of usable format that I can use in SDL, but I'm having trouble. Is there any way I can load it into an SDL texture or surface? It would be ideal to have examples for both, but if I only get one example (either texture or surface), that is sufficient and I will be very grateful. :) I tried to use SDL_LoadBMP_RW but that crashed. I'm not even sure if I should be using that method.
SDL_LoadBMP_RW is for loading an image in the BMP file format, and it expects an SDL_RWops*, which is a file stream, not a pixel buffer. The function you want is SDL_CreateRGBSurfaceFrom. I believe this call should work for your purposes:
SDL_Surface* surface =
SDL_CreateRGBSurfaceFrom(
pixels, // dest_buffer from CopyTo
width, // in pixels
height, // in pixels
depth, // in bits, so should be dest_depth * 8
pitch, // dest_row_span from CopyTo
Rmask, // RGBA masks, see docs
Gmask,
Bmask,
Amask
);
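Since you asked for both variants, here is a minimal sketch with plausible values, assuming the buffer was copied with convert_to_rgba = false (i.e. BGRA order) and dest_depth = 4, on a little-endian machine. The extra SDL_CreateTextureFromSurface step gives you a texture if that is what you need; renderer, dest_buffer, width, height and dest_row_span are assumed to exist already.
//Hypothetical masks for a 4-byte BGRA buffer on a little-endian machine
SDL_Surface* surface = SDL_CreateRGBSurfaceFrom(
    dest_buffer,   // pixels from BitmapSurface::CopyTo
    width,         // in pixels
    height,        // in pixels
    32,            // depth in bits (dest_depth * 8)
    dest_row_span, // pitch in bytes
    0x00FF0000,    // Rmask
    0x0000FF00,    // Gmask
    0x000000FF,    // Bmask
    0xFF000000);   // Amask

//Note: SDL_CreateRGBSurfaceFrom does not copy the pixels, so dest_buffer
//must stay valid for as long as the surface is used.

//If you want a texture instead, convert the surface once:
SDL_Texture* texture = SDL_CreateTextureFromSurface(renderer, surface);
SDL_FreeSurface(surface); // the texture holds its own copy of the pixels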

How to create one bitmap from parts of many textures (C++, SDL 2)?

I have *.png files and I want to take different 8x8 px parts from those textures and place them on a bitmap (an SDL_Surface, I guess, but maybe not), something like this:
Right now I'm rendering without a bitmap, i.e. I call each texture and draw each part directly on screen every frame, and it's too slow. I guess I need to load each *.png into a separate bitmap and use them without going through video memory, then draw just one big bitmap, but maybe I'm wrong. I need the fastest way of doing this, and I need code for it (SDL 2, not SDL 1.3).
Also, maybe I need to use plain OpenGL here?
Update:
Or maybe I need to load the *.png's into int arrays somehow, treat them just like ordinary numbers, place them into one big int array, and then convert that to an SDL_Surface/SDL_Texture? It seems like this is the best way, but how do I write it?
Update 2:
The colors of the pixels in each block are not the same as presented in the picture, and they can also be transparent. The picture is just an example.
Assuming you already have your bitmaps loaded up as SDL_Texture(s), composing them into a different texture is done via SDL_SetRenderTarget.
SDL_SetRenderTarget(renderer, target_texture);
SDL_RenderCopy(renderer, texture1, ...);
SDL_RenderCopy(renderer, texture2, ...);
...
SDL_SetRenderTarget(renderer, NULL);
Every render operation you perform between setting your render target and resetting it (by calling SDL_SetRenderTarget with a NULL texture parameter) will be rendered to the designated texture. You can then use this texture as you would use any other.
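One detail worth spelling out: the destination texture must be created with SDL_TEXTUREACCESS_TARGET, and the renderer must support render targets, otherwise SDL_SetRenderTarget fails. A minimal sketch, with placeholder names for the atlas size and the source/destination rectangles:
//Renderer should support render targets (check SDL_RenderTargetSupported(renderer))
SDL_Texture* target_texture = SDL_CreateTexture(
    renderer,
    SDL_PIXELFORMAT_RGBA8888,
    SDL_TEXTUREACCESS_TARGET,   // required for use as a render target
    atlas_width, atlas_height);

SDL_SetRenderTarget(renderer, target_texture);
SDL_RenderClear(renderer);

//Copy an 8x8 region of a source texture into an 8x8 cell of the atlas
SDL_Rect src = { src_x, src_y, 8, 8 };
SDL_Rect dst = { cell_x * 8, cell_y * 8, 8, 8 };
SDL_RenderCopy(renderer, texture1, &src, &dst);

SDL_SetRenderTarget(renderer, NULL); // back to the default target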
Ok so, when I asked about "solid colour", I meant: "in that 8x8 pixel area of the .png that you are copying from, do all 64 pixels have the same identical RGB value?" It looks that way in your diagram, so how about this:
How about creating an SDL_Surface, and directly painting 8x8 pixel areas of the memory pointed to by that SDL_Surface's pixels member with the values read from the original .png?
And then when you're done, convert that surface to an SDL_Texture and render that?
You would avoid all the SDL_UpdateTexture() calls.
Anyway here is some example code. Let's say that you create a class called EightByEight.
class EightByEight
{
public:
EightByEight( SDL_Surface * pDest, Uint8 r, Uint8 g, Uint8 b):
m_pSurface(pDest),
m_red(r),
m_green(g),
m_blue(b){}
void BlitToSurface( int column, int row );
private:
SDL_Surface * m_pSurface;
Uint8 m_red;
Uint8 m_green;
Uint8 m_blue;
};
You construct an object of type EightByEight by passing it a pointer to an SDL_Surface and some values for red, green and blue. This RGB value is the one taken from the particular 8x8 pixel area of the .png you are currently reading from. You will paint a particular 8x8 pixel area of the SDL_Surface's pixels with this RGB value.
So now, when you want to paint an area of the SDL_Surface, you call BlitToSurface() and pass in a column and row value. For example, if you divide the SDL_Surface into 8x8 pixel squares, BlitToSurface(3, 5) means: paint the square in the 4th column and 6th row (the indices are zero-based) with the RGB value set at construction.
The BlitToSurface() looks like this:
void EightByEight::BlitToSurface(int column, int row)
{
    // pitch is in bytes; convert it to a per-row count of 32-bit pixels
    const int pixelsPerRow = m_pSurface->pitch / 4;
    Uint32 * pixel = (Uint32*)m_pSurface->pixels + (row * 8) * pixelsPerRow + column * 8;
    // now pixel is pointing to the first pixel in the correct 8x8 pixel square
    // of the Surface's pixel memory. Paint 8 rows of 8 pixels, but be careful:
    // the pointer arithmetic is in 32-bit pixels, not bytes, so advance by
    // pixelsPerRow - 8 after each row, not by pitch - 8.
    for(int y = 0; y < 8; y++)
    {
        // paint a row
        for(int i = 0; i < 8; i++)
        {
            *pixel++ = SDL_MapRGB(m_pSurface->format, m_red, m_green, m_blue);
        }
        // advance pixel pointer to the next row of this square
        pixel += pixelsPerRow - 8;
    }
}
I'm sure you could speed things up further by pre-calculating the mapped colour value on construction. Or, if you're reading a pixel from the texture, you could probably dispense with the SDL_MapRGB() (it's just there in case the Surface has a different pixel format from the .png).
A memcpy is probably faster than 8 individual assignments of the RGB value, but I just wanted to demonstrate the technique. You could experiment.
So all the EightByEight objects you create point to the same SDL_Surface.
And then, when you're done, you just convert that SDL_Surface to an SDL_Texture and blit that.
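A short usage sketch of the idea, assuming a 32-bit destination surface and that the block colours have already been read from the .png (the surface size, colours and renderer are placeholders):
//Destination surface all the painter objects share
SDL_Surface* dest = SDL_CreateRGBSurface(0, 128, 128, 32, 0, 0, 0, 0);

EightByEight redBlock(dest, 255, 0, 0);
EightByEight blueBlock(dest, 0, 0, 255);

redBlock.BlitToSurface(0, 0);  // paint the block at column 0, row 0
blueBlock.BlitToSurface(3, 5); // paint the block at column 3, row 5

//When composition is done, turn the surface into a texture and draw that
SDL_Texture* atlas = SDL_CreateTextureFromSurface(renderer, dest);
SDL_FreeSurface(dest);
SDL_RenderCopy(renderer, atlas, NULL, NULL);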
Thanks to everyone who took part, but my friends and I solved it ourselves. So here is an example (the full source code is too big and unnecessary here, I'll just describe the main idea):
int pitch, *pixels;
SDL_Texture *texture;
...
if (!SDL_LockTexture(texture, 0, (void **)&pixels, &pitch))
{
for (/*Conditions*/)
memcpy(/*Params*/);
SDL_UnlockTexture(texture);
}
SDL_RenderCopy(renderer, texture, 0, 0);
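To make that idea a little more concrete, here is a hedged sketch of what the lock-and-copy loop might look like for one 8x8 tile of a 32-bit streaming texture; atlasWidth, atlasHeight, tileX, tileY and srcPixels (the tile pixels read from the .png) are placeholders.
//Streaming texture the tiles are assembled into
SDL_Texture* texture = SDL_CreateTexture(renderer,
    SDL_PIXELFORMAT_ARGB8888, SDL_TEXTUREACCESS_STREAMING,
    atlasWidth, atlasHeight);

void* pixels;
int pitch; // bytes per row of the locked texture
if (SDL_LockTexture(texture, NULL, &pixels, &pitch) == 0)
{
    for (int row = 0; row < 8; ++row)
    {
        //Copy one 8-pixel row of the tile into the right cell of the atlas
        Uint32* dst = (Uint32*)((Uint8*)pixels + (tileY * 8 + row) * pitch) + tileX * 8;
        memcpy(dst, srcPixels + row * 8, 8 * sizeof(Uint32));
    }
    SDL_UnlockTexture(texture);
}
SDL_RenderCopy(renderer, texture, 0, 0);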

How do I write PNG files from an openGL screen?

So I have this script which reads the display data into a character array pixels:
typedef unsigned char uchar;
// we will store the image data here
uchar *pixels;
// the thingy we use to write files
FILE * shot;
// we get the width/height of the screen into this array
int screenStats[4];
// get the width/height of the window
glGetIntegerv(GL_VIEWPORT, screenStats);
// generate an array large enough to hold the pixel data
// (width*height*bytesPerPixel)
pixels = new unsigned char[screenStats[2]*screenStats[3]*3];
// read in the pixel data, TGA's pixels are BGR aligned
glReadPixels(0, 0, screenStats[2], screenStats[3], 0x80E0,
GL_UNSIGNED_BYTE, pixels);
Normally I save this to a TGA file, but since these get monstrously large, I was hoping to use PNG instead, as I quickly run out of hard drive space doing it this way (my images are highly monotonous and easily compressible, so the potential gain is huge). So I'm looking at PNGwriter, but I'm open to other suggestions. The usage example given on their website is this:
#include <pngwriter.h>
int main()
{
pngwriter image(200, 300, 1.0, "out.png");
image.plot(30, 40, 1.0, 0.0, 0.0); // print a red dot
image.close();
return 0;
}
As I'm somewhat new to image processing, I'm a little confused about the form of my pixels array and how I would convert it into a form representable in the above format. As a reference, I've been using the following function to convert my frames to TGA:
//////////////////////////////////////////////////
// Grab the OpenGL screen and save it as a .tga //
// Copyright (C) Marius Andra 2001 //
// http://cone3d.gz.ee EMAIL: cone3d#hot.ee //
//////////////////////////////////////////////////
// (modified by me a little)
int screenShot(int const num)
{
typedef unsigned char uchar;
// we will store the image data here
uchar *pixels;
// the thingy we use to write files
FILE * shot;
// we get the width/height of the screen into this array
int screenStats[4];
// get the width/height of the window
glGetIntegerv(GL_VIEWPORT, screenStats);
// generate an array large enough to hold the pixel data
// (width*height*bytesPerPixel)
pixels = new unsigned char[screenStats[2]*screenStats[3]*3];
// read in the pixel data, TGA's pixels are BGR aligned
glReadPixels(0, 0, screenStats[2], screenStats[3], 0x80E0,
GL_UNSIGNED_BYTE, pixels);
// open the file for writing. If unsuccessful, return 1
std::string filename = kScreenShotFileNamePrefix + Function::Num2Str(num) + ".tga";
shot=fopen(filename.c_str(), "wb");
if (shot == NULL)
return 1;
// this is the tga header it must be in the beginning of
// every (uncompressed) .tga
uchar TGAheader[12]={0,0,2,0,0,0,0,0,0,0,0,0};
// the header that is used to get the dimensions of the .tga
// header[1]*256+header[0] - width
// header[3]*256+header[2] - height
// header[4] - bits per pixel
// header[5] - ?
uchar header[6]={((int)(screenStats[2]%256)),
((int)(screenStats[2]/256)),
((int)(screenStats[3]%256)),
((int)(screenStats[3]/256)),24,0};
// write out the TGA header
fwrite(TGAheader, sizeof(uchar), 12, shot);
// write out the header
fwrite(header, sizeof(uchar), 6, shot);
// write the pixels
fwrite(pixels, sizeof(uchar),
screenStats[2]*screenStats[3]*3, shot);
// close the file
fclose(shot);
// free the memory
delete [] pixels;
// return success
return 0;
}
I don't normally like to just dump and bail on these forums, but in this instance I'm simply stuck. I'm sure the conversion is close to trivial, I just don't understand enough about image processing to get it done. If someone could provide a simple example of how to feed the pixels array into image.plot() with the PNGwriter library, or provide a way of achieving this using a different library, that would be great! Thanks.
Your current implementation does almost all the work. All you have to do is write the pixel colors returned by OpenGL into the PNG file. Since there is no method in PNGwriter to pass a whole array of colors, you will have to write the pixels one by one.
Your call to glReadPixels() hides the requested color format behind a magic number. You should use one of the predefined constants (see the format argument) instead of 0x80E0; that value is GL_BGR, so the components in your buffer are actually blue/green/red.
Thus, your pixel-to-png code may look like this:
const std::size_t image_width( screenStats[2] );
const std::size_t image_height( screenStats[3] );
pngwriter image( image_width, image_height, /*…*/ );
for ( std::size_t y(0); y != image_height; ++y )
  for ( std::size_t x(0); x != image_width; ++x )
  {
    // 3 bytes per pixel, in B, G, R order (see the GL_BGR note above)
    const unsigned char* bgr( pixels + 3 * (y * image_width + x) );
    // pngwriter's double overload expects components in [0.0, 1.0]
    // and 1-based coordinates
    image.plot( x + 1, y + 1,
                bgr[2] / 255.0, bgr[1] / 255.0, bgr[0] / 255.0 );
  }
image.close();
As an alternative to PNGwriter, you may have a look at libclaw or use libpng as is.
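If you go the plain libpng route instead, a minimal sketch could look roughly like this. It assumes a tightly packed 3-bytes-per-pixel buffer (glPixelStorei(GL_PACK_ALIGNMENT, 1)) in RGB order; if the readback really uses GL_BGR as in the snippet above, swap the channels first or use libpng's png_set_bgr() transform. The rows are written in reverse because glReadPixels returns the image bottom-up.
#include <png.h>
#include <cstdio>

//Write a tightly packed 8-bit RGB buffer (bottom-up) to a PNG file.
//Returns 0 on success, 1 on failure.
int writePng(const char* filename, const unsigned char* pixels,
             int width, int height)
{
    FILE* fp = fopen(filename, "wb");
    if (!fp) return 1;

    png_structp png = png_create_write_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL);
    if (!png) { fclose(fp); return 1; }
    png_infop info = png_create_info_struct(png);
    if (!info || setjmp(png_jmpbuf(png))) {
        png_destroy_write_struct(&png, &info);
        fclose(fp);
        return 1;
    }

    png_init_io(png, fp);
    png_set_IHDR(png, info, width, height, 8, PNG_COLOR_TYPE_RGB,
                 PNG_INTERLACE_NONE, PNG_COMPRESSION_TYPE_DEFAULT,
                 PNG_FILTER_TYPE_DEFAULT);
    png_write_info(png, info);

    //OpenGL's first row is the bottom of the image, PNG's is the top
    for (int y = height - 1; y >= 0; --y)
        png_write_row(png, const_cast<unsigned char*>(pixels) + y * width * 3);

    png_write_end(png, NULL);
    png_destroy_write_struct(&png, &info);
    fclose(fp);
    return 0;
}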

Faster encoding of realtime 3d graphics with opengl and x264

I am working on a system that sends compressed video to a client, generated from 3D graphics that are rendered on the server, as soon as they are rendered.
I already have the code working, but I feel it could be much faster (and it is already a bottleneck in the system).
Here is what I am doing:
First I grab the framebuffer
glReadBuffer( GL_FRONT );
glReadPixels( 0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, buffer );
Then I flip the framebuffer, because there is a weird issue with sws_scale (which I am using for colorspace conversion) that flips the image vertically during conversion. I am flipping in advance, nothing fancy.
void VerticalFlip(int width, int height, byte* pixelData, int bitsPerPixel)
{
byte* temp = new byte[width*bitsPerPixel];
height--; //remember height array ends at height-1
for (int y = 0; y < (height+1)/2; y++)
{
memcpy(temp,&pixelData[y*width*bitsPerPixel],width*bitsPerPixel);
memcpy(&pixelData[y*width*bitsPerPixel],&pixelData[(height-y)*width*bitsPerPixel],width*bitsPerPixel);
memcpy(&pixelData[(height-y)*width*bitsPerPixel],temp,width*bitsPerPixel);
}
delete[] temp;
}
Then I convert it to YUV420p
convertCtx = sws_getContext(width, height, PIX_FMT_RGB24, width, height, PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL);
uint8_t *src[3]= {buffer, NULL, NULL};
sws_scale(convertCtx, src, &srcstride, 0, height, pic_in.img.plane, pic_in.img.i_stride);
Then I pretty much just call the x264 encoder. I am already using the zerolatency preset.
int frame_size = x264_encoder_encode(_encoder, &nals, &i_nals, _inputPicture, &pic_out);
My guess is that there should be a faster way to capture the frame and convert it to YUV420p. It would be nice to do the conversion to YUV420p on the GPU and only copy the result to system memory afterwards, and hopefully there is a way to do the color conversion without the need to flip.
If there is no better way, at least this question may help someone trying to do the same thing to do it the way I did.
First, use asynchronous readback with PBOs. The usual approach uses two PBOs working in a ping-pong fashion, so the read does not stall the pipeline the way a direct glReadPixels call does. In my app I got an 80% performance boost when I switched to PBOs.
Additionally, on some GPUs glGetTexImage() works faster than glReadPixels(), so try it out.
But if you really want to take the video encoding to the next level, you can do it via CUDA using the NVIDIA codec library. I recently asked the same question, so that may be helpful.
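For reference, a rough sketch of the two-PBO ping-pong readback mentioned above (plain OpenGL through an extension loader such as GLEW, error handling omitted): while the GPU copies frame N into one PBO, the CPU maps the other PBO and processes frame N-1, so glReadPixels returns immediately instead of stalling.
//Setup: two pixel-pack buffers, each large enough for one RGB frame
GLuint pbo[2];
glGenBuffers(2, pbo);
for (int i = 0; i < 2; ++i) {
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 3, NULL, GL_STREAM_READ);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

//Per frame: start an async read into one PBO, map and process the other
static int index = 0;
int nextIndex = (index + 1) % 2;

glReadBuffer(GL_FRONT);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[index]);
glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, 0); // returns immediately

glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[nextIndex]);
void* data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
if (data) {
    //data holds the previous frame; hand it to sws_scale / x264 here
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

index = nextIndex;
And on the vertical flip: you may not need the extra pass at all. libswscale accepts negative strides, so (as a sketch, assuming a tightly packed RGB24 buffer) pointing the source pointer at the last row and passing a negative stride makes sws_scale read the image bottom-up for you:
int srcStride = width * 3; // bytes per RGB24 row
uint8_t* src[3] = { buffer + (height - 1) * srcStride, NULL, NULL };
int srcStrides[3] = { -srcStride, 0, 0 };
sws_scale(convertCtx, src, srcStrides, 0, height,
          pic_in.img.plane, pic_in.img.i_stride);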

opengl video freeze

I have an IDS uEye camera and feed the captured frames via PBO to OpenGL (OpenTK). On my development PC it works great, but on slower machines the video freezes after some time.
Here is the code for allocating memory via OpenGL and registering it with uEye, so that the camera stores captured images directly in it:
// Generate PBO and save id
GL.GenBuffers(1, out this.frameBuffer[i].BufferID);
// Define the type of the buffer.
GL.BindBuffer(BufferTarget.PixelUnpackBuffer, this.frameBuffer[i].BufferID);
// Define buffer size.
GL.BufferData(BufferTarget.PixelUnpackBuffer, new IntPtr(width * height * depth), IntPtr.Zero, BufferUsageHint.StreamDraw);
// Get pointer to by openGL allocated buffer and
// lock global with uEye.
this.frameBuffer[i].PointerToNormalMemory = GL.MapBuffer(BufferTarget.PixelUnpackBuffer, BufferAccess.WriteOnly);
this.frameBuffer[i].PointerToLockedMemory = uEye.GlobalLock(this.frameBuffer[i].PointerToNormalMemory);
// Unmap PBO after use.
GL.UnmapBuffer(BufferTarget.PixelUnpackBuffer);
// Set selected PBO to none.
GL.BindBuffer(BufferTarget.PixelUnpackBuffer, 0);
// Register buffer to uEye
this.Succeeded("SetAllocatedImageMem", this.cam.SetAllocatedImageMem(width, height, depth, this.frameBuffer[i].PointerToLockedMemory, ref this.frameBuffer[i].MemId));
// Add buffer to uEye-Ringbuffer
this.Succeeded("AddToSequence", this.cam.AddToSequence(this.frameBuffer[i].PointerToLockedMemory, this.frameBuffer[i].MemId));
To copy the image from the PBO to a texture (the texture is created and valid):
// Select PBO with new video image
GL.BindBuffer(BufferTarget.PixelUnpackBuffer, nextBufferId);
// Select videotexture as current
GL.BindTexture(TextureTarget.Texture2D, this.videoTextureId);
// Copy PBO to texture
GL.TexSubImage2D(
TextureTarget.Texture2D,
0,
0,
0,
nextBufferSize.Width,
nextBufferSize.Height,
OpenTK.Graphics.OpenGL.PixelFormat.Bgr,
PixelType.UnsignedByte,
IntPtr.Zero);
// Release Texture
GL.BindTexture(TextureTarget.Texture2D, 0);
// Release PBO
GL.BindBuffer(BufferTarget.PixelUnpackBuffer, 0);
Maybe someone can see the mistake... After about 6 seconds the uEye events stop delivering images. When I remove the TexSubImage2D call it works fine, but of course no image appears.
Could there be a lock or something on the OpenGL side?
Thanks in advance - Thomas
It seems like a shared-buffer problem. You may try to implement a simple queue mechanism to get around it.
Sample code (not meant to be working as-is):
queue< vector<BYTE> > frames;
...
frames.push(vector<BYTE>(frameBuffer, frameBuffer + frameSize));
...
// use frame here at GL.TexSubImage2D using frames.front()
frames.pop();
I found the problem myself. Just replace StreamDraw with StreamRead in the code above:
GL.BufferData(BufferTarget.PixelUnpackBuffer, new IntPtr(width * height * depth), IntPtr.Zero, BufferUsageHint.StreamRead);