SDL2 C++ Capturing Video of Renderer Animation/Sprite

I have an animation/sprite created using SDL2. The animation works fine when it is being rendered to a screen. But now I also want it to be recorded into a video file (locally stored). For this, I am planning on using FFmpeg APIs, to which I'll be sending a raw RGB pixel data array.
My problem is with fetching the data from SDL2 APIs.
What I've tried is:
// From http://stackoverflow.com/questions/30157164/sdl-saving-window-as-bmp
SDL_Surface *sshot = SDL_CreateRGBSurface(0, 750, 750, 32, 0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000);
SDL_RenderReadPixels(gRenderer, NULL, SDL_PIXELFORMAT_ARGB8888, sshot->pixels, sshot->pitch);
// From https://wiki.libsdl.org/SDL_RWFromMem
char fName[50];
sprintf(fName, "/tmp/a/ss%03d.bmp", fileCnt);
char bitmap[310000];
SDL_RWops *rw = SDL_RWFromMem(bitmap, sizeof(bitmap));
SDL_SaveBMP_RW(sshot, rw, 1);
The above does not work. But dumping a single frame into a file with the following code works:
SDL_SaveBMP(sshot, "/tmp/alok1/ss.bmp")
This obviously is not an acceptable solution - writing thousands of BMPs and then using FFmpeg from the command line to create a video.
What am I doing wrong? How do you extract data from SDL_RWops? Is the use of SDL_RWFromMem the right approach to my problem statement?

Your buffer is too small to fit the specified image, so it cannot be saved there. Increase the buffer size to at least the actual image size plus the BMP header (width*height*bpp + 54), and remember that row padding needs to be counted too (what SDL_Surface refers to as pitch).
Note that taking ~3 MB from the stack may get you dangerously close to an overflow (it could still be fine, depending on what happened in the functions prior to the one in question). Chain-calling several functions that each take a big chunk of stack can deplete it very quickly. It is likely you don't really need any extra buffer or BMP conversion at all - e.g. allocate an FFmpeg frame and copy the pixels into it directly from the SDL_Surface.
Also, in terms of performance this kind of readback is not great (but the compression itself is probably much heavier anyway).
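For example, a minimal sketch of that direct copy, assuming the encoder is set up elsewhere to accept BGRA input (SurfaceToFrame is just an illustrative helper name):
#include <SDL.h>
#include <cstring>
extern "C" {
#include <libavutil/frame.h>
}

AVFrame *SurfaceToFrame(SDL_Surface *sshot)
{
    AVFrame *frame = av_frame_alloc();
    frame->format = AV_PIX_FMT_BGRA;   // byte order of SDL_PIXELFORMAT_ARGB8888 on little-endian machines
    frame->width  = sshot->w;
    frame->height = sshot->h;
    av_frame_get_buffer(frame, 0);

    // Copy row by row so the SDL pitch (row padding) is respected.
    const uint8_t *src = static_cast<const uint8_t *>(sshot->pixels);
    for (int y = 0; y < sshot->h; ++y)
        std::memcpy(frame->data[0] + y * frame->linesize[0],
                    src + y * sshot->pitch,
                    sshot->w * 4);
    return frame;
}
The resulting frame can then be fed to the encoder (typically after a pixel-format conversion to whatever the codec expects), with no BMP or SDL_RWops step in between.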

Related

Why is glReadPixels so slow and are there any alternatives?

I need to take screenshots at every frame and I need very high performance (I'm using freeGLUT). What I figured out is that it can be done like this inside glutIdleFunc(thisCallbackFunction):
GLubyte *data = (GLubyte *)malloc(3 * m_screenWidth * m_screenHeight);
glReadPixels(0, 0, m_screenWidth, m_screenHeight, GL_RGB, GL_UNSIGNED_BYTE, data);
// and I can access pixel values like this: data[3*(x*512 + y) + color] or whatever
It does work indeed, but I have a huge issue with it: it's really slow. When my window is 512x512 it runs no faster than 90 frames per second when only a cube is being rendered; without these two lines it runs at 6500 FPS! If we compare it to the irrlicht graphics engine, there I can do this:
// irrlicht code
video::IImage *screenShot = driver->createScreenShot();
const uint8_t *data = (uint8_t*)screenShot->lock();
// I can access pixel values from data in a similar manner here
and a 512x512 window runs at 400 FPS even with a huge mesh (a Quake 3 map) loaded! Take into account that I'm using OpenGL as the driver inside irrlicht. To my inexperienced eye it seems like glReadPixels is copying every pixel from one place to another, while (uint8_t*)screenShot->lock() is just returning a pointer to an already existing array. Can I do something similar to the latter using freeGLUT? I expect it to be faster than irrlicht.
Note that irrlicht uses OpenGL too (it offers DirectX and other options as well, but in the example above I used OpenGL, and it was the fastest compared to the other options).
OpenGL methods are used to manage the rendering pipeline. By its nature, while the graphics card is showing an image to the viewer, computations for the next frame are being done. When you call glReadPixels, the graphics card waits for the current frame to be done, reads the pixels, and only then starts computing the next frame. The pipeline therefore stalls and becomes sequential.
If you can hold two buffers and tell the graphics card to read data into these buffers, alternating each frame, then you can read back from your buffer one frame late but without stalling the pipeline. This is called double buffering. You can also do triple buffering with a two-frame-late read-back, and so on.
There is a relatively old web page describing the phenomenon and implementation here: http://www.songho.ca/opengl/gl_pbo.html
Also there are a lot of tutorials about framebuffers and rendering into a texture on the web. One of them is here: http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-14-render-to-texture/
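A minimal sketch of that double-buffered read-back with pixel buffer objects, assuming a GL 2.1+ context with the buffer-object entry points loaded (e.g. via GLEW); names are illustrative:
#include <GL/glew.h>

static GLuint pbo[2];
static int writeIndex = 0;

void InitPbos(int w, int h)
{
    glGenBuffers(2, pbo);
    for (int i = 0; i < 2; ++i) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, w * h * 3, NULL, GL_STREAM_READ);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

// Call once per frame: the pixels you get back are one frame old,
// but glReadPixels returns immediately instead of stalling the pipeline.
void AsyncReadback(int w, int h)
{
    int readIndex = (writeIndex + 1) % 2;

    // Start this frame's transfer into pbo[writeIndex]...
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[writeIndex]);
    glReadPixels(0, 0, w, h, GL_RGB, GL_UNSIGNED_BYTE, 0);

    // ...and map the buffer that was filled during the previous frame.
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[readIndex]);
    void *data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (data) {
        // consume `data` here (copy it, hand it to an encoder, ...)
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

    writeIndex = readIndex;
}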

Real time drawing in GDI

I'm currently writing a 3D renderer (for fun and research), so I need a way to draw my framebuffer to a window. Since I'm doing all of my calculations on CPU, the drawing needs to be as fast as possible.
One of my goals is to use no existing graphics library (OpenGL/DirectX) so the drawing to the screen is pure Win32. In my research I've found a couple of ways to create and draw bitmaps and now I'm looking for the best one.
My current implementation uses a bitmap created with CreateDIBSection(), which is drawn to my window DC using BitBlt().
CreateDIBSection() gives me a pointer to my bitmap's bytes, so I can manipulate it without copying. Using this method I achieve an update rate of about 260 FPS (without any rendering done).
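For reference, that kind of setup boils down to something like this sketch (names are illustrative, error handling omitted):
#include <windows.h>

void *bits = NULL;   // raw pointer into the DIB's pixel memory

HBITMAP CreateFramebuffer(HDC hdc, int width, int height)
{
    BITMAPINFO bmi = {};
    bmi.bmiHeader.biSize        = sizeof(bmi.bmiHeader);
    bmi.bmiHeader.biWidth       = width;
    bmi.bmiHeader.biHeight      = -height;   // negative height = top-down rows
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;        // BGRX, one DWORD per pixel
    bmi.bmiHeader.biCompression = BI_RGB;
    return CreateDIBSection(hdc, &bmi, DIB_RGB_COLORS, &bits, NULL, 0);
}

void Present(HWND hwnd, HBITMAP dib, int width, int height)
{
    HDC windowDC = GetDC(hwnd);
    HDC memDC    = CreateCompatibleDC(windowDC);
    HGDIOBJ old  = SelectObject(memDC, dib);

    // The software renderer writes into `bits`; this copies it to the window.
    BitBlt(windowDC, 0, 0, width, height, memDC, 0, 0, SRCCOPY);

    SelectObject(memDC, old);
    DeleteDC(memDC);
    ReleaseDC(hwnd, windowDC);
}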
This seems a bit slow, so I'm looking for optimizations.
I've read that if you don't create a bitmap with the same palette as the system palette, some slow color conversions are performed.
How can I make sure my DIB bitmap and window are compatible?
Are there methods of drawing a bitmap which are faster than my current implementation?
I've also read something about DrawDibDraw(), can anyone confirm that this is faster?
I've read that if you don't create a bitmap with the same palette as the system palette, some slow color conversions are performed.
Very few systems run in a palette mode any more, so it seems unlikely this is an issue for you.
Aside from palettes, some GDI functions also cause a color matching conversion to be applied if the source bitmap and the destination have different gamuts. BitBlt, however, does not do this type of color matching, so you're not paying a price for that.
How can I make sure my DIB bitmap and window are compatible?
You don't. You can use DIBs (which are Device-Independent Bitmaps) or compatible (device-dependent) bitmaps. It's possible that your DIB bitmap matches the current mode of your device. For example, if you're using a 32 bpp DIB, and your display is in that same mode, then no conversion is necessary. If you want a bitmap that's guaranteed to be in the same mode as your device, then you can't use a DIB and all the nice properties it provides for predictable pixel layout and format.
Are there methods of drawing a bitmap which are faster than my current implementation?
The limitation is most likely in getting the data from system memory to graphics adapter memory. To get around that limitation, you need a faster graphics bus, or you need to render directly into graphics memory, which means you'd need to do your computation on the GPU rather than the CPU.
If you're rendering a 1920 x 1080 pixel image at 24 bits per pixel, that's close to 6 MB for your frame buffer. That's an awful lot of data. If you're doing that 260 times per second, that's actually pretty impressive.
I've also read something about DrawDibDraw(), can anyone confirm that this is faster?
It's conceivable, but the only way to know would be to measure it. And the results might vary from machine to machine because of differences in the graphics adapter (and which bus they use).

Whole screen capture and render in DirectX [PERFORMANCE]

I need some way to get screen data and pass it to a DX9 surface/texture in my application, and render it at at least 25 fps at 1600x900 resolution; 30 would be better.
I tried BitBlt-ing, but even with that alone I am at 20 fps, and after loading the data into a texture and rendering it I am at 11 fps, which is far behind what I need.
GetFrontBufferData is out of the question.
Here is something about using the Windows Media API, but I am not familiar with it. The sample saves the data straight into a file; maybe it can be set up to give you individual frames, but I haven't found good enough documentation to try it on my own.
My code:
m_memDC.BitBlt(0, 0, m_Rect.Width(),m_Rect.Height(), //m_Rect is area to be captured
&m_dc, m_Rect.left, m_Rect.top, SRCCOPY);
//at 20-25fps after this if I comment out the rest
//DC,HBITMAP setup and memory alloc is done once at the begining
GetDIBits( m_hDc, (HBITMAP)m_hBmp.GetSafeHandle(),
0L, // Start scan line
(DWORD)m_Rect.Height(), // # of scan lines
m_lpData, // LPBYTE
(LPBITMAPINFO)m_bi, // address of bitmapinfo
(DWORD)DIB_RGB_COLORS); // Use RGB for color table
//at 17-20fps
IDirect3DSurface9 *tmp;
m_pImageBuffer[0]->GetSurfaceLevel(0,&tmp); //m_pImageBuffer is Texture of same
//size as bitmap to prevent stretching
hr= D3DXLoadSurfaceFromMemory(tmp,NULL,NULL,
(LPVOID)m_lpData,
D3DFMT_X8R8G8B8,
m_Rect.Width()*4,
NULL,
&r, //SetRect(&r,0,0,m_Rect.Width(),m_Rect.Height());
D3DX_DEFAULT,0);
//12-14fps
IDirect3DSurface9 *frameS;
hr=m_pFrameTexture->GetSurfaceLevel(0,&frameS); // the texture that actually gets rendered
pd3dDevice->StretchRect(tmp,NULL,frameS,NULL,D3DTEXF_NONE);
//11fps
I found out that for a 512x512 square it runs at 30 fps (e.g. 490x450 at 20-25), so I tried dividing the screen, but that didn't seem to work well.
If there is something missing in the code, please say so rather than voting down. Thanks
Starting with Windows 8, there is a new desktop duplication API that can be used to capture the screen in video memory, including mouse cursor changes and which parts of the screen actually changed or moved. This is far more performant than any of the GDI or D3D9 approaches out there and is really well-suited to doing things like encoding the desktop to a video stream, since you never have to pull the texture out of GPU memory. The new API is available by enumerating DXGI outputs and calling DuplicateOutput on the screen you want to capture. Then you can enter a loop that waits for the screen to update and acquires each frame in turn.
To encode the frames to a video, I'd recommend taking a look at Media Foundation. Take a look specifically at the Sink Writer for the simplest method of encoding the video frames. Basically, you just have to wrap the D3D textures you get for each video frame into IMFSample objects. These can be passed directly into the sink writer. See the MFCreateDXGISurfaceBuffer and MFCreateVideoSampleFromSurface functions for more information. For the best performance, typically you'll want to use a codec like H.264 that has good hardware encoding support (on most machines).
For full disclosure, I work on the team that owns the desktop duplication API at Microsoft, and I've personally written apps that capture the desktop (and video, games, etc.) to a video file at 60fps using this technique, as well as a lot of other scenarios. This is also used to do screen streaming, remote assistance, and lots more within Microsoft.
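A minimal sketch of that duplication loop, assuming an already-created ID3D11Device and a hypothetical EncodeFrame() consumer (error handling trimmed):
#include <d3d11.h>
#include <dxgi1_2.h>

void CaptureLoop(ID3D11Device *device)
{
    IDXGIDevice  *dxgiDevice = NULL;
    IDXGIAdapter *adapter    = NULL;
    IDXGIOutput  *output     = NULL;
    IDXGIOutput1 *output1    = NULL;
    IDXGIOutputDuplication *dupl = NULL;

    device->QueryInterface(__uuidof(IDXGIDevice), (void **)&dxgiDevice);
    dxgiDevice->GetAdapter(&adapter);
    adapter->EnumOutputs(0, &output);                 // primary monitor
    output->QueryInterface(__uuidof(IDXGIOutput1), (void **)&output1);
    output1->DuplicateOutput(device, &dupl);          // requires Windows 8+

    for (;;) {
        DXGI_OUTDUPL_FRAME_INFO info;
        IDXGIResource *resource = NULL;
        HRESULT hr = dupl->AcquireNextFrame(500, &info, &resource);
        if (hr == DXGI_ERROR_WAIT_TIMEOUT)
            continue;                                 // nothing changed on screen
        if (FAILED(hr))
            break;

        ID3D11Texture2D *frame = NULL;
        resource->QueryInterface(__uuidof(ID3D11Texture2D), (void **)&frame);
        // EncodeFrame(frame);  // e.g. wrap it in an IMFSample for the Sink Writer
        frame->Release();
        resource->Release();
        dupl->ReleaseFrame();
    }
    // Release dupl/output1/output/adapter/dxgiDevice here.
}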
If you don't like the FrontBuffer, try the BackBuffer:
LPDIRECT3DSURFACE9 surface;
surface = GetBackBufferImageSurface(&fmt);
To save it to a file, use:
D3DXSaveSurfaceToFile(filename, D3DXIFF_JPG, surface, NULL, NULL);
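If GetBackBufferImageSurface is a local helper, the underlying stock D3D9 call is presumably something along these lines (pd3dDevice and filename reuse the names above):
IDirect3DSurface9 *backBuffer = NULL;
// Grab the first back buffer of the implicit swap chain.
HRESULT hr = pd3dDevice->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &backBuffer);
if (SUCCEEDED(hr)) {
    D3DXSaveSurfaceToFile(filename, D3DXIFF_JPG, backBuffer, NULL, NULL);
    backBuffer->Release();
}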

Fast Updating of QPixmap from byte array

I'm working on a vision application, and I need to have a "Live View" from the camera displayed on the screen using a QPixmap object. We will be updating the screen at 30 frames/second on a continuous basis.
My problem is that this application has to run on some 3-5 year old computers that, by today's standards, are slow. So what I would like to do is write directly to the display byte array inside the QPixmap. After going through the code, it seems almost every option for changing the contents of a QPixmap results in a new QPixmap being created. This is the overhead I'm trying to get rid of.
Additionally, I would like to prevent all the new/deletes from occurring just to keep memory fragmentation under control.
Any suggestions?
First of all, the most important piece of information regarding the "picture" classes in Qt:
QImage is designed and optimized for I/O, and for direct pixel access and manipulation, while QPixmap is designed and optimized for showing images on screen.
What this means is that QPixmap is a generic representation of your platform's native image format: Pixmap on Unix, HBITMAP on Windows, CGImageRef on the Mac. QImage is a "pixel array with operations" type of class.
I'm assuming the following:
You are reading raw camera frames in a specific pixel format
You really are having memory fragmentation issues (as opposed to emotionally having them)
My advice is to use QImage instead of QPixmap. Specifically, there is a constructor that accepts a raw byte array and uses it directly as the pixel buffer:
QImage::QImage(uchar *data, int width, int height, int bytesPerLine, Format format)
Having constructed a QImage, use a QPainter to draw it to a widget at the desired frequency. Be warned however that:
If you are reading raw camera frames, format conversion may still be necessary. Twice, in the worst case: camera ➔ QImage ➔ platform bitmap.
You cannot avoid memory allocation from the free store when using QPixmap and QImage: they are implicitly shared classes and necessarily allocate memory from the free store. (On the other hand, that means you should not new/delete them explicitly.)
Our team managed to display fullscreen compressed video smoothly on Atom-powered computers using only Qt (albeit at a lower framerate). If this does not solve your problem, however, I'd bypass Qt and use the native drawing API. If you absolutely need platform independence, then OpenGL or SDL may be good solutions.
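As a minimal sketch of that approach, assuming the camera delivers tightly packed RGB888 frames (the widget and buffer names are illustrative):
#include <QImage>
#include <QPainter>
#include <QPaintEvent>
#include <QWidget>

class LiveView : public QWidget
{
public:
    LiveView(uchar *frameBuffer, int w, int h, QWidget *parent = nullptr)
        : QWidget(parent),
          // QImage wraps the buffer without copying; the buffer must outlive the image.
          image_(frameBuffer, w, h, w * 3, QImage::Format_RGB888)
    {}

protected:
    void paintEvent(QPaintEvent *) override
    {
        QPainter p(this);
        p.drawImage(0, 0, image_);   // conversion to the native format happens here
    }

private:
    QImage image_;
};

// After the camera fills the buffer with a new frame, call widget->update()
// to schedule a repaint.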
I have found that QImages are faster for direct I/O operations.
Could you provide more detail as to what you are getting and trying to do with the QPixmap?

Magick++ animation generation via SDL pixel data

I'm trying to generate ImageMagick images from SDL pixel data. Here's what the GIF looks like so far. (This GIF is slower than the one below on purpose.)
http://www.starlon.net/images/combo.gif
Here's what it's supposed to look like. Notice that in the above image the pixels seem to be overlaid on top of other pixels.
http://www.starlon.net/images/combo2.gif
Here's where the GIF is actually created.
void DrvSDL::WriteGif() {
    std::list<Magick::Image> gif;
    for (std::list<Magick::Blob>::iterator it = image_.begin(); it != image_.end(); it++) {
        Magick::Geometry geo(cols_ * pixels.x, rows_ * pixels.y);
        Magick::Image image(*it, geo, 32, "RGB");
        gif.push_back(image);
        LCDError("image");
    }
    for_each(gif.begin(), gif.end(), Magick::animationDelayImage(ani_speed_));
    Magick::writeImages(gif.begin(), gif.end(), gif_file_);
}
And here's where the Blob is packed.
image_.push_back(Magick::Blob(surface_->pixels, rows_ * pixels.y * surface_->pitch));
And here's how I initialize the SDL surface.
surface_ = SDL_SetVideoMode(cols_ * pixels.x, rows_ * pixels.y, 32, SDL_SWSURFACE);
The top image is typically caused by a misaligned buffer. The SDL buffer is probably not DWORD-aligned, and the ImageMagick routines expect the buffer to be DWORD-aligned. This is very common in bitmap processing; the popular image processing library LEADTOOLS, for example, commonly requires DWORD-aligned data. This is mostly the case with monochrome and 32-bit color, but it can apply to any color depth.
What you need to do is write out a DWORD-aligned bitmap from your SDL buffer, or at least create a buffer that is DWORD-aligned.
The ImageMagick API documentation may be able to help clarify this further.
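One way to get a tightly packed, aligned buffer is to copy the surface row by row, dropping any per-row padding, before wrapping it in a Blob. A sketch, passing in the question's surface_/image_ members (the PushPackedFrame helper is hypothetical):
#include <list>
#include <vector>
#include <cstring>
#include <SDL.h>
#include <Magick++.h>

void PushPackedFrame(SDL_Surface *surface, int width, int height,
                     std::list<Magick::Blob> &frames)
{
    std::vector<unsigned char> packed(static_cast<size_t>(width) * height * 4);
    const unsigned char *src = static_cast<const unsigned char *>(surface->pixels);

    for (int y = 0; y < height; ++y)
        std::memcpy(&packed[y * width * 4],      // tight destination rows
                    src + y * surface->pitch,    // possibly padded source rows
                    static_cast<size_t>(width) * 4);

    frames.push_back(Magick::Blob(packed.data(), packed.size()));
}

// Called as: PushPackedFrame(surface_, cols_ * pixels.x, rows_ * pixels.y, image_);
With a tight 32 bpp buffer the Blob size becomes width * height * 4, so the size and the map/format passed to Magick::Image need to agree with that layout.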
Another thing you might want to try is to clear the buffers to make sure there isn't any data already there. I don't really know ImageMagick's API, but pixels overlaid on top of other pixels usually indicate a dirty buffer.