How to paint in Qt using hardware acceleration? - c++

I need to do some quite intensive drawing in my application - I'm recording a camera and a desktop. I'm using ffmpeg to capture and encode/decode my videos. Those processes are quite fast and they work in a separate thread, but when it comes to rendering, performance drops drastically. I'm using QLabel to display captured frames via the setPixmap() method, and I can see that Qt is not using any GPU power, even though a matrix operation such as scaling an image to fit the window is clearly implied. Is there any way to paint images in Qt using hardware acceleration? Or are there other ways to speed up the painting process?

Software rendering is not the only possible performance killer.
I also used to use QWidget for similar kinds of jobs, and it seems to be OK.
I assume the result of FFmpeg processing is an uncompressed byte array with YUV or RGB color scheme.
You already have some memory allocated for your FFmpeg image frame.
I believe you're creating your pixmaps with QPixmap::fromImage(...), which implies copying.
But first, you need to construct a QImage, which may also imply copying.
So we have one or two full copies every frame.
Try one of the QImage constructors that use an existing memory buffer (see the Qt docs).
Or, ideally, allocate a QImage once and have FFmpeg use its memory buffer directly (FFmpeg writes straight into the QImage).
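As a sketch of that idea (swsCtx, frame, width and height are placeholder names, not from the question), libswscale can convert each decoded frame straight into a preallocated QImage:

// Allocate once; libswscale then writes every converted frame directly
// into the QImage's own buffer, avoiding per-frame copies.
QImage image(width, height, QImage::Format_RGB32);
uint8_t *dstData[4]     = { image.bits(), nullptr, nullptr, nullptr };
int      dstLinesize[4] = { int(image.bytesPerLine()), 0, 0, 0 };
sws_scale(swsCtx, frame->data, frame->linesize, 0, frame->height,
          dstData, dstLinesize);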
Subclass QWidget, reimplement paintEvent() and paint your QImage (not a QPixmap) there.
Ideally, it should be the same QImage from the previous step.
// Sketch (class and function names are illustrative):
// FFmpeg / decoder thread
void Decoder::onFrameDecoded()
{
    QMutexLocker locker(&mutex);   // guards the shared QImage
    writeFrameInto(image);         // e.g. sws_scale() into image.bits()
}

// GUI thread
void VideoWidget::paintEvent(QPaintEvent *)
{
    QMutexLocker locker(&mutex);
    QPainter painter(this);
    painter.drawImage(rect(), image);
}
You can definitely use QOpenGLWidget for GPU drawing, though in my opinion it wouldn't help you much.
QOpenGLWidget uses buffered rendering (i.e. copying), and it will take some cycles to upload your image from the CPU side to the GPU side.
Transform operations will become faster, though.
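For completeness, a minimal sketch of that route (class and member names are illustrative):

#include <QOpenGLWidget>
#include <QPainter>
#include <QImage>

class GLVideoWidget : public QOpenGLWidget
{
protected:
    void paintGL() override
    {
        // QPainter on a QOpenGLWidget uses the OpenGL paint engine: the frame
        // is uploaded as a texture and the scaling then runs on the GPU.
        QPainter painter(this);
        painter.drawImage(rect(), image);
    }
private:
    QImage image;   // filled by the decoder, as described above
};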

Related

QImage vs OpenGL Performance

I'm porting an old Qt 4.8 application to 5.2.1, and back then I used QImage to render some raw data on the screen, in a QLabel.
I am grabbing images from a camera, so I want to display those images in real time. Until now, with QImage, I achieve over 20 FPS (the camera is able to grab 30 FPS).
I'm wondering if rendering this data with OpenGL (maybe in a new QML (Qt Quick) or Qt Widgets application) would be faster than the currently implemented method?
With the following assumptions in mind:
your implementation in OpenGL is using HW acceleration
your implementation is using optimal texture parameters to display the image (i.e. the driver is not doing any conversion)
you may achieve better performance using OpenGL. QImage still has to hold the data both in main memory and on the GPU, meaning at least one additional copy is needed whenever the QImage is updated. With OpenGL, you can copy the data directly to GPU memory and do not need to keep another copy in main memory.
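As an illustration of the second assumption, this is the kind of upload path that commonly avoids a driver-side conversion on desktop GPUs (textureId, width, height and frameData are placeholders):

// GL_BGRA + GL_UNSIGNED_BYTE matches the native layout of many desktop
// drivers, so the upload is a plain memory copy with no pixel conversion.
glBindTexture(GL_TEXTURE_2D, textureId);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_BGRA, GL_UNSIGNED_BYTE, frameData);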
However, what is optimal on one GPU doesn't have to be optimal on another. So, if you are implementing something that needs to run on a variety of hardware, I would advise going with QImage.
But as said, the only way to know is to implement and measure.

C++ GUI Development - Bitmap vs. Vector Graphics CPU Usage

I'm currently in the process of designing and developing GUIs for some audio applications made in C++ (using the Juce framework).
So far I've been playing with using bitmap graphics to create custom sliders and dials, using 'film strip' style images to animate the components (meaning that when the user interacts with a slider, it triggers a method that changes the offset of a film-strip image to change the component's appearance). Depending on the size of the original image and the number of 'frames', the CPU usage level changes quite dramatically.
Firstly, what would be the most efficient bitmap file format to use in terms of CPU consumption? At the moment I'm using PNG images.
Secondly, would it be more efficient to use vector graphics for these kinds of graphical components? I understand the main differences between bitmap and vector graphics, but I haven't found any information regarding their CPU usage levels with regard to GUI interaction.
Or would CPU usage be down to the particular methods/functions/libraries/frameworks being used?
Thanks!
Or would CPU consumption be down to the particular methods/functions/libraries/frameworks being used?
Any of these things could influence it.
Pixel-based images might take a while to read off of disk, the bigger they are. Compressed types might take more time to uncompress. Vector graphics might take more time to render when they are loaded.
That being said, I would definitely not expect your choice of image type to have any impact on performance. Since you didn't provide a code example, it is hard to speculate beyond that.
In general, you would expect the run-time cost of an image to be incurred when it is loaded, i.e. whenever you create an image object. If you create images all over the place, then maybe it's expensive. It is possible that your film strip is recreating the images instead of loading them once and caching them.
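A minimal sketch of the load-once-and-cache idea (the Image type and loadImageFromResource() are placeholders for whatever your framework provides, e.g. a juce::Image loaded from binary data):

// Decode the film-strip PNG a single time; every repaint reuses the cached copy.
const Image &filmStrip()
{
    static const Image cached = loadImageFromResource("slider_strip.png");
    return cached;
}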
Before choosing bitmap vs. vector graphics, investigate whether your graphics processor supports vector or bitmap graphics. Some things take a long time to draw as vectors.
Have you tried double-buffering?
This is where you write to a buffer in memory while the display (graphics processor) is loading another.
Load your bitmaps from the resource once. Store them as memory snapshots to avoid the additional cost of translating them from a file format each time.
Does your graphic processor support "blitting"?
Blitting is where the graphics processor can copy a rectangular area of memory (a bitmap) to the display, optionally applying operations before displaying (such as XOR with the existing bits).
Summary:
To improve your rendering speed, only convert images from the file into a bitmap form once. Store this somewhere. Refer to this converted bitmap as needed. Next, investigate and implement double buffering. Lastly, investigate and use bit-blitting or blitting.
Other optimization rules apply here too, such as reviewing the design, removing requirements, loop unrolling, passing images by pointer instead of copying them, and reducing "if" statements by using Boolean logic and Karnaugh maps.
In general, calculations for rendering vector graphics are going to take longer than blitting a rectangular region of a bitmap to the screen. But for basic UI stuff, neither should be particularly intensive.
You probably should do some profiling. Perhaps you're redrawing much more frequently than necessary. Or perhaps the PNG is being decoded each time you try to draw from it. (I'm not familiar with Juce.)
For a straight Windows app, I'd probably render vector graphics into a device-dependent bitmap once on startup and then just blit from the bitmap to the screen. Using vector gives you DPI independence, and blitting from a device-dependent bitmap is about the fastest way to paint a block of pixels. I believe the color matching is done when you render to the device-dependent bitmap, so you don't even have the ICM overhead on the screen drawing.
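A rough Win32/GDI sketch of that approach (handle and size names are placeholders):

// At startup: render the vector artwork once into a device-dependent bitmap.
HDC     screenDc = GetDC(hwnd);
HDC     cacheDc  = CreateCompatibleDC(screenDc);
HBITMAP cacheBmp = CreateCompatibleBitmap(screenDc, width, height);
SelectObject(cacheDc, cacheBmp);
// ... draw the vector graphics into cacheDc here ...
ReleaseDC(hwnd, screenDc);

// In WM_PAINT: a single fast blit from the cached bitmap to the window.
BitBlt(paintDc, 0, 0, width, height, cacheDc, 0, 0, SRCCOPY);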
Vector graphics were ditched long ago - bitmap graphics are more performant. The thing is that you can send a bitmap to the GPU once and then render it forever more with a simple copy.
Secondly, the GPU uses its own texture compression. DirectX uses DXT5, I believe, but when the GPU sees the texture, it doesn't care what you loaded it from.
However, a modern CPU even with a crappy integrated GPU should have absolutely no problem with simple GUI rendering. If you're struggling, then it's time to look again at the technique you're using. Perhaps your framework is slow or your use of it is suboptimal.

GPU memory allocation for video

Is it possible to allocate some memory on the GPU without CUDA?
I'm adding some more details...
I need to get the video frames decoded by VLC and run some compositing functions on the video; I'm doing so using the new SDL rendering capabilities.
Everything works fine until I have to send the decoded data to the SDL texture... that part of the code is handled by a standard malloc, which is slow for video operations.
Right now I'm not even sure that using the GPU for video will actually help me.
Let's be clear: are you trying to accomplish real-time video processing? Since your latest update changed the problem considerably, I'm adding another answer.
The "slowness" you are experiencing could be due to several reasons. To get the "real-time" effect (in the perceptual sense), you must be able to process the frame and display it within 33 ms (approximately, for a 30 fps video). This means you must decode the frame, run the compositing functions (as you call them) on it, and display it on the screen within this time frame.
If the compositing functions are too CPU-intensive, then you might consider writing a GPU program to speed up this task. But the first thing you should do is determine where exactly the bottleneck of your application is. You could strip your application momentarily to let it decode the frames and display them on the screen (without executing the compositing functions), just to see how it goes. If it's slow, then the decoding process could be using too much CPU/RAM resources (maybe a bug on your side?).
I have used FFmpeg and SDL for a similar project once and I was very happy with the result. This tutorial shows how to build a basic video player using both libraries. Basically, it opens a video file, decodes the frames and renders them on a surface for display.
You can do this via Direct3D 11 Compute Shaders or OpenCL. These are similar in spirit to CUDA.
Yes, it is. You can allocate memory in the GPU through OpenGL textures.
Only indirectly through a graphics framework.
You can use OpenGL which is supported by virtually every computer.
You could use a vertex buffer to store your data. Vertex buffers are usually used to store points for rendering, but you can easily use it to store an array of any kind. Unlike textures, their capacity is only limited by the amount of graphics memory available.
http://www.songho.ca/opengl/gl_vbo.html has a good tutorial on how to read and write data to vertex buffers, you can ignore everything about drawing the vertex buffer.
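A minimal sketch of allocating and filling such a buffer (frameSizeInBytes and frameData are placeholders):

// Reserve a block of GPU memory via a vertex buffer object and upload raw bytes.
GLuint vbo = 0;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, frameSizeInBytes, nullptr, GL_DYNAMIC_DRAW); // allocate
glBufferSubData(GL_ARRAY_BUFFER, 0, frameSizeInBytes, frameData);          // upload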

Fast Updating of QPixmap from byte array

I'm working on a vision application and I need to have a "Live View" from the camera displayed on the screen using a QPixmap object. We will be updating the screen at 30 frames/second on a continuous basis.
My problem is that this application has to run on some 3-5 year old computers that, by today's standards, are slow. So what I would like to do is directly write to the display byte array inside of the QPixmap. After going through the program code, almost every option for changing the contents of a QPixmap results in a new QPixmap being created. This is the overhead I'm trying to get rid of.
Additionally, I would like to prevent all the new/deletes from occurring just to keep memory fragmentation under control.
Any suggestions?
First of all, the most important piece of information regarding the "picture" classes in Qt:
QImage is designed and optimized for I/O, and for direct pixel access and manipulation, while QPixmap is designed and optimized for showing images on screen.
What this means is that QPixmap is a generic representation of your platform's native image format: Pixmap on Unix, HBITMAP on Windows, CGImageRef on the Mac. QImage is a "pixel array with operations" type of class.
I'm assuming the following:
You are reading raw camera frames in a specific pixel format
You really are having memory fragmentation issues (as opposed to emotionally having them)
My advice is to use QImage instead of QPixmap. Specifically, there is a constructor that accepts a raw byte array and uses it directly as the pixel buffer:
QImage::QImage(uchar *data, int width, int height, int bytesPerLine, Format format)
Having constructed a QImage, use a QPainter to draw it to a widget at the desired frequency. Be warned however that:
If you are reading raw camera frames, format conversion may still be necessary. Twice, in the worst case: Camera ➔ QImage ➔ Platform Bitmap.
You cannot avoid memory allocation from the free store when using QPixmap and QImage: they are implicitly shared classes and necessarily allocate memory from the free store. (On the other hand, that means you should not new/delete them explicitly.)
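A minimal sketch of that pattern, assuming the camera delivers 32-bit frames into a buffer you own (frameBuffer, width, height, bytesPerLine and the LiveView class are illustrative names):

// Wrap the camera's buffer once; this constructor does not copy the pixel data.
QImage image(frameBuffer, width, height, bytesPerLine, QImage::Format_RGB32);

void LiveView::paintEvent(QPaintEvent *)
{
    QPainter painter(this);
    painter.drawImage(rect(), image);   // draw straight from the wrapped buffer
}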
Our team managed to display fullscreen compressed video smoothly on Atom-powered computers using only Qt (albeit at a lower framerate). If this does not solve your problem, however, I'd bypass Qt and use the native drawing API. If you absolutely need platform independence, then OpenGL or SDL may be good solutions.
I have found that QImages are faster for direct I/O operations.
Could you provide more detail as to what you are getting and trying to do with the QPixmap?

How does Photoshop (Or drawing programs) blit?

I'm getting ready to make a drawing application in Windows. I'm just wondering, do drawing programs have a memory bitmap which they lock, then set each pixel, then blit?
I don't understand how Photoshop can move entire layers without lag or flicker without using hardware acceleration. Also in a program like Expression Design, I could have 200 shapes and move them around all at once with no lag. I'm really wondering how this can be done without GPU help.
Also, I don't think super-efficient algorithms alone could account for that, could they?
Look at this question:
Reduce flicker with GDI+ and C++
All you can do about DC drawing without GPU is to reduce flickering. Anything else depends on the speed of filling your memory bitmap. And here you can use efficient algorithms, multithreading and whatever you need.
Certainly modern Photoshop uses GPU acceleration if available. Another possible tool is DMA. You may also find it helpful to read the source code of existing programs like GIMP.
Double (or more) buffering is the way it's done in games, where we're drawing a ton of crap into a "back" buffer while the "front" buffer is being displayed. Then when the draw is done, the buffers are swapped (a pointer swap, not copies!) and the process continues in the new front and back buffers.
Triple buffering offers another bonus, in that you can start drawing two-frames-from-now when next-frame is done, but without forcing a buffer swap in the middle of the screen refresh. Many games do the buffer swap in the middle of the refresh, but you can sometimes see it as visible artifacts (tearing) on the screen.
Anyway - for an app drawing bitmaps into a window, if you've got some "slow" operation, do it into a non-displayed buffer while presenting the already-rendered version to the drawing API, e.g. GDI. Let the system software handle all of the fancy updating.
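A rough sketch of that pattern (function and buffer names are placeholders):

// Render the slow stuff off-screen, then make it current with a pointer swap.
renderExpensiveStuff(backBuffer);       // may take a while; the screen is untouched
std::swap(frontBuffer, backBuffer);     // swap pointers, not pixels
presentToWindow(frontBuffer);           // e.g. a single BitBlt from the front buffer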