Does OpenGL display image faster than OpenCV? - c++

I am using OpenCV to show images on a projector. But cv::imshow does not seem fast enough, or maybe the data transfer from my CPU to the GPU and then to the projector is slow, so I wonder whether there is a faster way to display images than OpenCV.
I considered OpenGL: since OpenGL uses the GPU directly, its commands may be faster than those issued from the CPU, which OpenCV uses. Correct me if I am wrong.

OpenCV already supports OpenGL for image output by itself. No need to write this yourself!
See the documentation:
Create the window first with namedWindow, where you can pass the WINDOW_OPENGL flag.
Then you can even use OpenGL buffers or GPU matrices as input to imshow (the data never leaves the GPU). But it will also use OpenGL to show regular matrix data.
Please note:
To enable OpenGL support, configure OpenCV using CMake with
WITH_OPENGL=ON . Currently OpenGL is supported only with WIN32, GTK
and Qt backends on Windows and Linux (MacOS and Android are not
supported). For GTK backend gtkglext-1.0 library is required.
Note that this is OpenCV 2.4.8 and this functionality has changed quite recently. I know there was OpenGL support in earlier versions in conjunction with the Qt backend, but I don't remember when it was introduced.
About the performance: It is a quite popular optimization in the CV community to output images using OpenGL, especially when outputting video sequences.

OpenGL is optimised for rendering images, so it's likely faster. It really depends on whether the OpenCV implementation uses any GPU acceleration AND whether the bottleneck is on the rendering side of things.
Have you tried GPU-accelerated OpenCV?
How big is the image you are displaying? How long does it take to display the image using cv::imshow now?
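To answer the timing question with a number rather than an impression, a small helper like the one below could be used. This is only a sketch: `average_ms` is a hypothetical name, not part of OpenCV, and it measures wall-clock time around whatever callable it is given.

```cpp
#include <chrono>
#include <functional>

// Average wall-clock time of `call`, in milliseconds, over `runs` invocations.
// `average_ms` is a hypothetical helper name, not an OpenCV function.
double average_ms(const std::function<void()>& call, int runs) {
    if (runs <= 0) return 0.0;  // nothing to measure
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < runs; ++i) {
        call();
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / runs;
}
```

One way to use it would be to pass a lambda that calls cv::imshow followed by cv::waitKey(1), since the window is only redrawn when the event loop runs, and compare the result for the monitor and for the projector.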

I know it's an old question, but I happened to have exactly the same problem. And from my observations I've concluded that the root of the problem is the projector's own latency, especially if one is using an older model.
How have I concluded it?
I displayed the same video sequence with cv::imshow() on the laptop monitor and on the projector. Then I waved my hand. It was obvious that the projector introduces significant latency.
To double-check, I opened a webcam video, waved my hand in front of it and observed the difference between the monitor and the projector. The webcam view does no processing and no OpenCV operations, so in my understanding the only thing that would explain the latency is the projector itself.


QOpenGLWidget video rendering performance in multiple processes

My problem may seem vague without code, but it actually isn't.
So, there I've got an almost properly-working widget, which renders video frames.
Qt 5.10 and QOpenGLWidget subclassing worked fine, I didn't make any sophisticated optimizations -- there are two textures and a couple of shaders, converting YUV pixel format to RGB -- glTexImage2D() + shaders, no buffers.
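For reference, the YUV-to-RGB step such shaders perform is a small per-pixel linear transform. The sketch below reproduces it on the CPU, assuming full-range BT.601 coefficients; the post does not say which colour matrix or range the shaders actually use, and `yuv_to_rgb` and `to_byte` are names made up for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Clamp a value to [0, 255] and round it to the nearest 8-bit integer.
static std::uint8_t to_byte(double v) {
    return static_cast<std::uint8_t>(std::lround(std::min(255.0, std::max(0.0, v))));
}

// Convert one YUV sample to RGB.
// Assumption: full-range BT.601 coefficients (not confirmed by the post).
std::array<std::uint8_t, 3> yuv_to_rgb(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    const double luma = y;
    const double cb = u - 128.0;  // blue-difference chroma, centred on zero
    const double cr = v - 128.0;  // red-difference chroma, centred on zero
    return {to_byte(luma + 1.402 * cr),
            to_byte(luma - 0.344136 * cb - 0.714136 * cr),
            to_byte(luma + 1.772 * cb)};
}
```

A neutral chroma pair (u = v = 128) leaves the luma value unchanged in all three channels, which is a quick sanity check for any shader implementing the same transform.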
Video frames are obtained from FFmpeg, which shows great performance due to hardware acceleration... when there is only one video window.
The piece of software is a "video wall" -- multiple independent video windows on the same screen. Of course, multi-threading would be the preferred solution, but legacy holds for now, I can't change it.
So, 1 window with Full HD video consumes ~2% CPU & 8-10% GPU regardless of the size of the window. But 7-10 similar windows, launched from the same executable at the same time consume almost all the CPU. My math says that 2 x 8 != 100...
My best guesses are:
This is an FFmpeg decoder issue: hardware acceleration is still not magic, and some hardware pipeline stalls.
7 to 9 independent OpenGL contexts cost a lot more than N times the cost of one.
I'm not using PBOs (pixel unpack buffers) or other more complex techniques to improve OpenGL rendering. It still explains nothing, but at least it is a guess.
The behavior is the same on Ubuntu, where decoding uses a different codec (I mean that using GPU-accelerated or CPU-accelerated codecs makes no difference!), so it seems more probable that I'm missing something about OpenGL... or not, because launching 6-7 Qt examples with dynamic textures shows normal growth of CPU usage: it is approximately the sum over the number of windows.
Anyway, this case is quite tricky for me to profile, so I hope someone has solved a similar problem before and can share their experience with me. I'd appreciate any ideas on how to deal with the described riddle.
I can add any pieces of code if that helps.

Encode OpenGL rendered video without leaving the GPU memory

I am doing some preliminary work to make a rendering pipeline and I am investigating whether OpenGL is a good option for my use case: from a markup language I need to generate a video, ideally using OpenGL, which already implements most of the primitives I need.
Is there a way, instead of (or in addition to) updating a framebuffer, to make an mp4 video file using nvenc without copying data back and forth between the GPU's memory and main memory?
The nvenc SDK page[1] on the NVidia website suggests that it can, as the current header graphic is of a game being streamed. (Even if it's a Direct3D game, same chip underneath.) A quick search for "nvenc share buffer with OpenGL" turned up a number of people apparently combining the two.
Runs on Linux and MS Windows only, so no joy if you have a Mac.
Hope this helps.

Drawing a videostream using wxWidgets

I have a relatively simple application which currently utilizes OpenCV to grab an image from a camera using cv::VideoCapture and view the resulting image in a window using imshow() running on OS X El Capitan.
In between I'm doing some basic image modification but this is not crucial to my problem.
Since the GUI implemented by OpenCV is pretty basic I decided to redo it using wxWidgets. I got it basically running similarly to the implementation linked in the tutorial section of wxWidgets. (Updated it to C++11 etc. but the idea is pretty much identical. Code is located on github.)
Now here's my problem: at best I get half the frame rate I get with the OpenCV-only solution. OpenCV uses Qt underneath. But when I look into the stack trace it comes down to similar function calls using CoreGraphics.
So my question boils down to: What is the best way to draw an image to a window with a frame rate > 20fps using wxWidgets on OS X? Currently I use the DrawBitmap() function.
Bonus question: when I have the window on my MacBook's internal Retina screen the frame rate gets even worse. Is there maybe any preprocessing/scaling of the picture I should do to take load off the GUI process?
The fastest is probably to use OpenGL (although I'm less sure about this under OS X which is not very OpenGL-friendly AFAIK), but I'm not really sure if the bottleneck is really DrawBitmap(), it could be the code doing the conversion to wxBitmap in the first place: if you don't use raw bitmap access, it could be quite slow.
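As an illustration of keeping that conversion cheap, the sketch below does the channel reordering in a single pass over a flat buffer instead of per-pixel setter calls. It assumes the camera frame is tightly packed 8-bit BGR (OpenCV's usual layout for colour frames) and that the target wants RGB byte order, as wxImage does; `bgr_to_rgb` is a hypothetical helper, not a wxWidgets or OpenCV function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy tightly packed 8-bit BGR pixels into a new RGB buffer in one pass.
// Assumption: no row padding; any trailing bytes that do not form a full
// pixel are left as zero in the output.
std::vector<std::uint8_t> bgr_to_rgb(const std::vector<std::uint8_t>& bgr) {
    std::vector<std::uint8_t> rgb(bgr.size());
    for (std::size_t i = 0; i + 2 < bgr.size(); i += 3) {
        rgb[i]     = bgr[i + 2];  // R comes from the third byte
        rgb[i + 1] = bgr[i + 1];  // G stays in the middle
        rgb[i + 2] = bgr[i];      // B comes from the first byte
    }
    return rgb;
}
```

In a real application the output buffer would probably be allocated once and reused for every frame rather than returned by value; timing this step separately from DrawBitmap() would show which of the two is the actual bottleneck.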

What are the actual SDL2 hardware requirements?

I just can't find them anywhere. The most important part for me is the hardware acceleration, and I have no idea if there is a performance or openGL version compatibility requirement that the video card has to follow.
The minimum system requirements will depend a lot more on the application that you are writing than on what SDL2 does.
If you just create a standard window and render, SDL will use what it can find and what it thinks is best: OpenGL, OpenGL ES, Direct3D, or the old-style software rendering for machines that can't do any of the others. So if a computer can support an OS that SDL runs on, you will almost always (I say almost since there can possibly be exceptions) be able to run these types of apps. A video card is not a requirement, but having one will greatly increase the program's drawing speed.
You can also create an OpenGL application directly, and then what the video card has to support depends on the type of context you create.
You can find most of the information here:
under the Video section. It's actually about how to port from 1.2 to 2.0, but it explains the new video pipeline pretty well.
Hope that's what you were looking for.

GPU memory allocation for video

Is it possible to allocate some memory on the GPU without CUDA?
I'm adding some more details...
I need to get the video frames decoded by VLC and run some compositing functions on the video; I'm doing so using the new SDL rendering capabilities.
All works fine until I have to send the decoded data to the SDL texture... that part of the code is handled by standard malloc, which is slow for video operations.
Right now I'm not even sure that using GPU video will actually help me.
Let's be clear: are you trying to accomplish real-time video processing? Since your latest update changed the problem considerably, I'm adding another answer.
The "slowness" you are experiencing could be due to several reasons. In order to get the "real-time" effect (in the perceptual sense), you must be able to process the frame and display it within 33ms (approximately, for a 30fps video). This means you must decode the frame, run the compositing functions (as you call them) on it, and display it on the screen within this time frame.
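To put numbers on that budget, here is a minimal sketch. The function names and the three-stage split are illustrative only; the stage times would have to come from your own measurements.

```cpp
// Time available per frame, in milliseconds, at a given frame rate.
// Returns 0 for a non-positive frame rate.
double frame_budget_ms(double fps) {
    return fps > 0.0 ? 1000.0 / fps : 0.0;
}

// True if decoding, compositing and displaying together fit in one frame period.
bool fits_budget(double decode_ms, double composite_ms, double display_ms, double fps) {
    return decode_ms + composite_ms + display_ms <= frame_budget_ms(fps);
}
```

At 30fps the budget is 1000 / 30, roughly 33.3ms, so stage times of 10, 15 and 5ms fit, while 10, 20 and 5ms (35ms in total) do not.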
If the compositing functions are too CPU intensive, then you might consider writing a GPU program to speed up this task. But the first thing you should do is determine exactly where the bottleneck of your application is. You could temporarily strip your application down so that it only decodes the frames and displays them on the screen (without executing the compositing functions), just to see how it goes. If it's slow, then the decoding process could be using too much CPU/RAM (maybe a bug on your side?).
I have used FFmpeg and SDL for a similar project once and I was very happy with the result. This tutorial shows how to build a basic video player using both libraries. Basically, it opens a video file, decodes the frames and renders them on a surface for displaying.
You can do this via Direct3D 11 Compute Shaders or OpenCL. These are similar in spirit to CUDA.
Yes, it is. You can allocate memory in the GPU through OpenGL textures.
Only indirectly through a graphics framework.
You can use OpenGL which is supported by virtually every computer.
You could use a vertex buffer to store your data. Vertex buffers are usually used to store points for rendering, but you can easily use it to store an array of any kind. Unlike textures, their capacity is only limited by the amount of graphics memory available. has a good tutorial on how to read and write data to vertex buffers, you can ignore everything about drawing the vertex buffer.