Too High CPU Footprint of OpenCV Text Overlay on FHD Video Stream - c++

I want to display a FHD live-stream (25 fps) and overlay some (changing) text. For this I essentially use the code below.
Basically it is
Load frame
(cv::putText skipped here)
Display frame if the frame counter is a multiple of delay
but the code is very slow compared to e.g. mpv and consumes way too much CPU time (cv::useOptimized() == true).
So far, delay is my inconvenient fiddle parameter to somehow make it feasible.
delay == 1 results in 180 % CPU usage (full frame-rate)
delay == 5 results in 80 % CPU usage
But delay == 5, i.e. 5 fps, is really sluggish and still too much CPU load.
How can I make this code faster or otherwise better or otherwise solve the task (I'm not bound to opencv)?
P.S.: Without cv::imshow, the CPU usage is less than 30 %, regardless of delay.
#include <opencv2/opencv.hpp>
#include <X11/Xlib.h>

// process every delay'th frame
#define delay 5

Display* disp = XOpenDisplay(NULL);
Screen* scrn = DefaultScreenOfDisplay(disp);
int screen_height = scrn->height;
int screen_width = scrn->width;

int main(int argc, char** argv)
{
    cv::VideoCapture cap("rtsp://url");
    cv::Mat frame;

    if (cap.isOpened())
        cap.read(frame);

    cv::namedWindow( "PREVIEW", cv::WINDOW_NORMAL );
    cv::resizeWindow( "PREVIEW", screen_width, screen_height );

    int framecounter = 0;
    while (true) {
        if (cap.isOpened()) {
            cap.read(frame);
            framecounter += 1;

            // Display only every delay'th frame
            if (framecounter % delay == 0) {
                /*
                 * cv::putText
                 */
                framecounter = 0;
                cv::imshow("PREVIEW", frame);
            }
        }
        cv::waitKey(1);
    }
}

I have now found out about valgrind (repository) and gprof2dot (pip3 install --user gprof2dot):
valgrind --tool=callgrind /path/to/my/binary # Produced file callgrind.out.157532
gprof2dot --format=callgrind --output=out.dot callgrind.out.157532
dot -Tpdf out.dot -o graph.pdf
That produced a wonderful graph showing that over 60 % of the time evaporates in cvResize.
And indeed, when I comment out cv::resizeWindow, the CPU usage drops from 180 % to ~60 %.
Since the screen has a resolution of 1920 x 1200 and the stream is 1920 x 1080, the resize essentially did nothing but burn CPU cycles.
So far, this is still fragile. As soon as I switch the window to full-screen mode and back, the CPU load goes back to 180 %.
To fix this, it turned out that I can either disable resizing completely with cv::WINDOW_AUTOSIZE ...
cv::namedWindow( "PREVIEW", cv::WINDOW_AUTOSIZE );
... or -- as Micka suggested -- on OpenCV versions compiled with OpenGL support (-DWITH_OPENGL=ON, my Debian repository version was not), use ...
cv::namedWindow( "PREVIEW", cv::WINDOW_OPENGL );
... to offload the rendering to the GPU, which turns out to be even faster together with resizing (55 % CPU compared to 65 % for me).
It just does not seem to work together with cv::WINDOW_KEEPRATIO.*
Furthermore, it turns out that cv::UMat can be used as a drop-in replacement for cv::Mat, which additionally boosts the performance (as seen with ps -e -o pcpu,args).
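A minimal sketch of the drop-in (assuming an OpenCV build where the OpenCL-backed T-API is available; the stream URL and overlay text are placeholders):

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap("rtsp://url");              // placeholder URL, as above
    cv::namedWindow("PREVIEW", cv::WINDOW_AUTOSIZE);

    cv::UMat frame;                                  // UMat instead of Mat; may live in OpenCL memory
    while (cap.read(frame)) {                        // read() accepts a UMat directly
        cv::putText(frame, "overlay", {50, 50},
                    cv::FONT_HERSHEY_SIMPLEX, 1.0, {255, 255, 255}, 2);
        cv::imshow("PREVIEW", frame);                // imshow also accepts a UMat
        if (cv::waitKey(1) == 27) break;             // ESC quits
    }
}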
Appendix
[*] So we have to scale the window manually and take care of the aspect ratio ourselves (image_width and image_height are the stream dimensions, e.g. frame.cols and frame.rows):
float screen_aspratio = (float) screen_width / screen_height;
float image_aspratio  = (float) image_width  / image_height;

if ( image_aspratio >= screen_aspratio ) { // width limited, center window vertically
    int scaled_height = screen_width / image_aspratio;
    cv::resizeWindow( "PREVIEW", screen_width, scaled_height );
    cv::moveWindow( "PREVIEW", 0, (screen_height - scaled_height) / 2 );
}
else { // height limited, center window horizontally
    int scaled_width = screen_height * image_aspratio;
    cv::resizeWindow( "PREVIEW", scaled_width, screen_height );
    cv::moveWindow( "PREVIEW", (screen_width - scaled_width) / 2, 0 );
}

One thing that pops out is that you're creating a new window and resizing it every time you want to display something.
move these lines
cv::namedWindow( "PREVIEW", cv::WINDOW_NORMAL );
cv::resizeWindow( "PREVIEW", screen_width, screen_height );
to before your while (true) and see if that solves it.
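A minimal skeleton of that structure (the stream URL and window size are placeholders), so that window creation and sizing happen exactly once and only the per-frame work stays inside the loop:

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap("rtsp://url");             // placeholder URL

    // create and size the window once, before the loop
    cv::namedWindow("PREVIEW", cv::WINDOW_NORMAL);
    cv::resizeWindow("PREVIEW", 1920, 1080);        // or the queried screen size

    cv::Mat frame;
    while (cap.read(frame)) {
        // cv::putText(...) goes here
        cv::imshow("PREVIEW", frame);               // only display work inside the loop
        if (cv::waitKey(1) == 27) break;            // ESC quits
    }
}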

Related

The correct way to implement an animation loop in CImg

I am trying to use CImg to visualize my program. I don't want to use OpenGL because I am trying to write a small rendering engine (just for my own interest, not an assignment!).
I want to create an animation loop in CImg.
This is my loop.
while (!disp.is_closed() && !disp.is_keyQ() && !disp.is_keyESC()) {
    img.fill(0); // Set pixel values to 0 (color: black)
    img.draw_text(t % WIDTH, 30, "Hello World", purple); // Draw a purple "Hello World" at (t % WIDTH, 30)
    img.draw_text(10, 10, ("Time used for rendering :: " + std::to_string(1.0 / (duration.count() / cnt / 1000000.0))).c_str(), purple);
    disp.render(img);
    disp.paint();
    if (t % (int)cnt == 0) {
        tmp = std::chrono::high_resolution_clock::now();
        duration = std::chrono::duration_cast<std::chrono::microseconds>(tmp - start);
        start = std::chrono::high_resolution_clock::now();
    }
    t++;
}
It works OK, but the framerate is highly unstable (ranging from 100 fps to 380 fps). Is this normal? Or is this the right way to construct an animation loop in CImg?
I read the documentation of CImg and it said
Should not be used for common CImgDisplay uses, since display() is more useful.
But when I put it like this,
while (!disp.is_closed() && !disp.is_keyQ() && !disp.is_keyESC()) {
    img.fill(0); // Set pixel values to 0 (color: black)
    img.draw_text(t % WIDTH, 30, "Hello World", purple); // Draw a purple "Hello World" at (t % WIDTH, 30)
    img.draw_text(10, 10, ("Time used for rendering :: " + std::to_string(1.0 / (duration.count() / cnt / 1000000.0))).c_str(), purple);
    disp.display(img);
    if (t % (int)cnt == 0) {
        tmp = std::chrono::high_resolution_clock::now();
        duration = std::chrono::duration_cast<std::chrono::microseconds>(tmp - start);
        start = std::chrono::high_resolution_clock::now();
    }
    t++;
}
The FPS drops to 11. So, what has gone wrong?
Thanks in advance.
~~~~~~~~~~~~~~~~~~~~~~ I am a line ~~~~~~~~~~~~~~~~~~~~~~~
Edit 1:
//
//  main.cpp
//  Render Engine
//
//  Created by Ip Daniel on 6/4/18.
//  Copyright © 2018 Ip Daniel. All rights reserved.
//

#include "CImg.h"
#include <chrono>
#include <string>
#include <iostream>

using namespace cimg_library;

#define WIDTH 640
#define HEIGHT 480

time_t start_time;
time_t tmp;
double time_passed = 0;

int main() {
    CImg<unsigned char> img(WIDTH, HEIGHT, 1, 3); // Define a 640x480 color image with 8 bits per color component.
    CImgDisplay disp(img, "My Hello World Loop");

    unsigned int t = 0;
    double cnt = 30;
    auto start = std::chrono::high_resolution_clock::now();
    auto tmp = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(tmp - start);
    unsigned char purple[] = { 255, 0, 255 }; // Define a purple color

    while (!disp.is_closed() && !disp.is_keyQ() && !disp.is_keyESC()) {
        img.fill(0); // Set pixel values to 0 (color: black)
        img.draw_text(t % WIDTH, 30, "Hello World", purple); // Draw a purple "Hello World" at (t % WIDTH, 30)
        img.draw_text(10, 10, ("Time used for rendering :: " + std::to_string(1.0 / (duration.count() / cnt / 1000000.0))).c_str(), purple);
        // disp.display(img);
        disp.render(img);
        disp.paint();
        if (t % (int)cnt == 0) {
            tmp = std::chrono::high_resolution_clock::now();
            duration = std::chrono::duration_cast<std::chrono::microseconds>(tmp - start);
            start = std::chrono::high_resolution_clock::now();
        }
        t++;
    }
    return 0;
}
The command to compile is
g++ -o main main.cpp -O2 -lm -lpthread -I/usr/X11R6/include -L/usr/X11R6/lib -lm -lpthread -lX11
Thank you for adding your code and environment. I have been experimenting with your code and I am not sure I really understand what the issue is - whether it is the variation in render times or something else. Anyway, it is a bit much for a comment so I have made an answer and folks can maybe then discuss it.
Basically, I don't really think it is reasonable to expect a program that interacts with an X11 server on a multi-user operating system to be that deterministic, and secondly, I am also not sure it matters.
I changed your code a little to measure the time for 100 frames, and on my machine it varies between 20,000 microseconds and 150,000 microseconds with the average being fairly consistently around 80,000 microseconds. So that is an average of 80ms for 100 frames, or 0.8ms per frame, and a range of 0.2-1.5ms per frame.
For a smooth animation, you are going to want 25-30 fps which is 33-40ms per frame. So your render time is around 0.8ms incurred once every 30ms, or around 2-3% of your frame time, and only 5% worst case (1.5ms on a 30ms frame).
So, I think your code is likely to have the frame it needs and be able to wait/sleep around 24 ms on average before needing to render it, so you will have plenty of time in hand, and a 0.8-1.5 ms variation in a 30 ms frame will not be noticeable.
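For reference, a rough sketch of that kind of measurement (reconstructed here, not my exact code), timing batches of 100 frames; the batch size and the 640x480 canvas are arbitrary:

#include "CImg.h"
#include <chrono>
#include <cstdio>
using namespace cimg_library;

int main() {
    CImg<unsigned char> img(640, 480, 1, 3);
    CImgDisplay disp(img, "Timing test");
    unsigned char purple[] = { 255, 0, 255 };

    unsigned int t = 0;
    auto batch_start = std::chrono::high_resolution_clock::now();

    while (!disp.is_closed() && !disp.is_keyESC()) {
        img.fill(0);
        img.draw_text(t % 640, 30, "Hello World", purple);
        disp.render(img);
        disp.paint();

        if (++t % 100 == 0) { // every 100 frames, report the batch time
            auto now = std::chrono::high_resolution_clock::now();
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(now - batch_start).count();
            std::printf("100 frames took %lld us (%.2f ms/frame)\n", (long long)us, us / 100000.0);
            batch_start = now;
        }
    }
    return 0;
}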

SDL image disappears after 15 seconds

I'm learning SDL and I have a frustrating problem. Code is below.
Even though there is a loop that keeps the program alive, when I load an image and change the x value of the source rect to animate, the image that was loaded disappears after exactly 15 seconds. This does not happen with static images, only with animations. I'm sure there is a simple thing I'm missing, but I can't see it.
void update() {
    rect1.x = 62 * int((SDL_GetTicks() / 100) % 12);
    /* 62 is the width of a frame, 12 is the number of frames */
}

void shark() {
    surface = IMG_Load("s1.png");
    if (surface != 0) {
        texture = SDL_CreateTextureFromSurface(renderer, surface);
        SDL_FreeSurface(surface);
    }
    rect1.y = 0;
    rect1.h = 90;
    rect1.w = 60;
    rect2.x = 0;
    rect2.y = 0;
    rect2.h = rect1.h + 30; // enlarging the image
    rect2.w = rect1.w + 30;
    SDL_RenderCopy(renderer, texture, &rect1, &rect2);
}

void render() {
    SDL_SetRenderDrawColor(renderer, 0, 0, 100, 150);
    SDL_RenderPresent(renderer);
    SDL_RenderClear(renderer);
}
and in main
update();
shark();
render();
The SDL_image header is included and linked, and the DLL exists. Could the DLL be broken?
I left out the rest of the program to keep it simple. If this is not enough, I can post the whole thing.
Every time you call the shark function, it loads another copy of the texture. With that in a loop like you have it, you will run out of video memory quickly (unless you are calling SDL_DestroyTexture after every frame, which you have not indicated). At which point, you will no longer be able to load textures. Apparently this takes about fifteen seconds for you.
If you're going to use the same image over and over, then just load it once, before your main loop.
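A minimal sketch of that restructuring, assuming renderer, rect1, rect2, and a hypothetical running flag are set up elsewhere as in your program:

// Load the texture once, before the main loop
SDL_Surface* surface = IMG_Load("s1.png");
SDL_Texture* texture = NULL;
if (surface != NULL) {
    texture = SDL_CreateTextureFromSurface(renderer, surface);
    SDL_FreeSurface(surface);
}

// Main loop: only update the source rect and draw
while (running) {
    update();                                       // advances rect1.x as before
    SDL_RenderClear(renderer);
    SDL_RenderCopy(renderer, texture, &rect1, &rect2);
    SDL_RenderPresent(renderer);
}

SDL_DestroyTexture(texture);                        // free it once, after the loop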
This line: int((SDL_GetTicks() / 100) % 12);
SDL_GetTicks() returns the number of milliseconds that have elapsed since the library initialized (https://wiki.libsdl.org/SDL_GetTicks). So you're updating with the TOTAL AMOUNT OF TIME since your application started, not the time since the last frame.
You're supposed to keep count of the last time and update the application with how much time has passed since the last update.
Uint32 currentTime = SDL_GetTicks();
int deltaTime = (int)(currentTime - lastTime);
lastTime = currentTime; // lastTime declared previously

update(deltaTime);
shark();
render();
Edit: Benjamin is right, the update line works fine.
Still, using deltaTime is good advice. In a game, for instance, you won't use the total time since the beginning of the application; you'll probably keep your own counter of how much time has passed since an animation started.
But there's nothing wrong with that line for your program anyhow.
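A tiny sketch of such a per-animation counter, fed by the deltaTime from above; the 62 px frame width, the 100 ms frame duration, and the 12-frame count are taken from the question's update(), and rect1 is assumed to be the same global:

// Accumulated time for this animation, advanced by deltaTime each frame
Uint32 animTime = 0;

void startAnimation() {
    animTime = 0;                          // reset when the animation (re)starts
}

void updateAnimation(int deltaTime) {
    animTime += deltaTime;                 // time since this animation started
    int frame = (animTime / 100) % 12;     // 100 ms per frame, 12 frames
    rect1.x = 62 * frame;                  // 62 px is the frame width
}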

Is there a reasonable limit to how many images SDL can render? [duplicate]

I am programming a raycasting game using SDL2.
When drawing the floor, I need to call SDL_RenderCopy pixelwise. This leads to a bottleneck which drops the framerate below 10 fps.
I am looking for performance boosts but can't seem to find any.
Here's a rough overview of the performance drop:
int main() {
    while (true) {
        for (x = 0; x < 800; x++) {
            for (y = 0; y < 600; y++) {
                SDL_Rect src = { 0, 0, 1, 1 };
                SDL_Rect dst = { x, y, 1, 1 };
                SDL_RenderCopy(ren, tx, &src, &dst); // this drops the framerate below 10
            }
        }
        SDL_RenderPresent(ren);
    }
}
You should probably be using texture streaming for this. Basically, you create an SDL_Texture with access type SDL_TEXTUREACCESS_STREAMING, and then each frame you 'lock' the texture, update the pixels you require, and 'unlock' the texture again. The texture is then rendered in a single SDL_RenderCopy call (a sketch follows below the links).
LazyFoo Example -
http://lazyfoo.net/tutorials/SDL/42_texture_streaming/index.php
Exploring Galaxy -
http://slouken.blogspot.co.uk/2011/02/streaming-textures-with-sdl-13.html
Other than that calling SDL_RenderCopy 480,000 times a frame is always going to kill your framerate.
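A rough sketch of that streaming pattern, assuming your existing renderer ren, the 800x600 target from the question, and a hypothetical floorColor(x, y) that returns the raycast colour of a pixel in ARGB8888:

#include <SDL2/SDL.h>

Uint32 floorColor(int x, int y);                      // hypothetical: your raycaster's per-pixel colour

// Create the streaming texture once, outside the render loop.
SDL_Texture* createFloorTexture(SDL_Renderer* ren) {
    return SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888,
                             SDL_TEXTUREACCESS_STREAMING, 800, 600);
}

// Each frame: lock, write all floor pixels, unlock, then a single RenderCopy.
void drawFloor(SDL_Renderer* ren, SDL_Texture* streamTex) {
    void* pixels;
    int pitch;                                        // bytes per row of the locked buffer
    if (SDL_LockTexture(streamTex, NULL, &pixels, &pitch) == 0) {
        for (int y = 0; y < 600; y++) {
            Uint32* row = (Uint32*)((Uint8*)pixels + y * pitch);
            for (int x = 0; x < 800; x++) {
                row[x] = floorColor(x, y);
            }
        }
        SDL_UnlockTexture(streamTex);
    }
    SDL_RenderCopy(ren, streamTex, NULL, NULL);       // one copy per frame instead of 480,000
    SDL_RenderPresent(ren);
}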
You are calling SDL_RenderCopy() 600 * 800 = 480,000 times per frame! It is normal for performance to drop.

High CPU usage with SDL + OpenGL

I have a modern CPU (AMD FX 4170) and a modern GPU (NVidia GTX 660). Yet this simple program manages to fully use one of my CPU's cores. This means it uses one 4.2 GHz core to draw nothing at 60 FPS. What is wrong with this program?
#include <SDL/SDL.h>

int main(int argc, char** argv)
{
    SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO);
    SDL_SetVideoMode(800, 600, 0, SDL_OPENGL | SDL_RESIZABLE);

    while (true)
    {
        Uint32 now = SDL_GetTicks();
        SDL_GL_SwapBuffers();
        int delay = 1000 / 60 - (SDL_GetTicks() - now);
        if (delay > 0) SDL_Delay(delay);
    }
    return 0;
}
It turns out that NVidia's drivers implement waiting for vsync with a busy loop, which causes SDL_GL_SwapBuffers() to use 100 % CPU. Turning off vsync in the NVidia Control Panel removes this problem.
Loops use as much computing power as they can. The main problem may be located in:
int delay = 1000 / 60 - (SDL_GetTicks() - now);
Your delay duration may be less than zero, in which case the loop spins without waiting at all. You need to check the value of the delay variable.
Moreover, in this link it is proposed that
SDL_GL_SetAttribute(SDL_GL_SWAP_CONTROL, 1); can be used to enable vsync so that it will not use all the CPU.
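A minimal sketch of where that call would go in the program above; SDL_GL_SetAttribute has to be issued before SDL_SetVideoMode, and whether it actually lowers CPU still depends on how the driver waits for vsync (see the busy-loop note above):

#include <SDL/SDL.h>

int main(int argc, char** argv)
{
    SDL_Init(SDL_INIT_VIDEO);

    // ask the driver for vsync before the GL window is created
    SDL_GL_SetAttribute(SDL_GL_SWAP_CONTROL, 1);
    SDL_SetVideoMode(800, 600, 0, SDL_OPENGL | SDL_RESIZABLE);

    while (true)
    {
        SDL_GL_SwapBuffers();   // now paced by vsync, driver permitting
    }
    return 0;
}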

How to keep the CPU usage down while running an SDL program?

I've made a very basic window with SDL and want to keep it running until I press the X on the window.
#include "SDL.h"
const int SCREEN_WIDTH = 640;
const int SCREEN_HEIGHT = 480;
int main(int argc, char **argv)
{
SDL_Init( SDL_INIT_VIDEO );
SDL_Surface* screen = SDL_SetVideoMode( SCREEN_WIDTH, SCREEN_HEIGHT, 0,
SDL_HWSURFACE | SDL_DOUBLEBUF );
SDL_WM_SetCaption( "SDL Test", 0 );
SDL_Event event;
bool quit = false;
while (quit != false)
{
if (SDL_PollEvent(&event)) {
if (event.type == SDL_QUIT) {
quit = true;
}
}
SDL_Delay(80);
}
SDL_Quit();
return 0;
}
I tried adding SDL_Delay() at the end of the while-clause and it worked quite well.
However, 80 ms seemed to be the highest value I could use to keep the program running smoothly and even then the CPU usage is about 15-20%.
Is this the best way to do this and do I have to just live with the fact that it eats this much CPU already on this point?
I know this is an older post, but I myself just came across this issue with SDL when starting up a little demo project. Like user 'thebuzzsaw' noted, the best solution is to use SDL_WaitEvent to reduce the CPU usage of your event loop.
Here's how it would look in your example for anyone looking for a quick solution to it in the future. Hope it helps!
#include "SDL.h"
const int SCREEN_WIDTH = 640;
const int SCREEN_HEIGHT = 480;
int main(int argc, char **argv)
{
SDL_Init( SDL_INIT_VIDEO );
SDL_Surface* screen = SDL_SetVideoMode( SCREEN_WIDTH, SCREEN_HEIGHT, 0,
SDL_HWSURFACE | SDL_DOUBLEBUF );
SDL_WM_SetCaption( "SDL Test", 0 );
SDL_Event event;
bool quit = false;
while (quit == false)
{
if (SDL_WaitEvent(&event) != 0) {
switch (event.type) {
case SDL_QUIT:
quit = true;
break;
}
}
}
SDL_Quit();
return 0;
}
I would definitely experiment with fully blocking functions (such as SDL_WaitEvent). I have an OpenGL application in Qt, and I noticed the CPU usage hovers between 0% and 1%. It spikes to maybe 4% during "usage" (moving the camera and/or causing animations).
I am working on my own windowing toolkit. I have noticed I can achieve similar CPU usage when I use blocking event loops. This will complicate any timers you may depend on, but it is not terribly difficult to implement timers with this new approach.
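One common way to keep timers working with a blocking loop is to let an SDL timer callback push a user event that wakes up SDL_WaitEvent. A sketch of that idea, assuming SDL was initialised with SDL_INIT_TIMER; the 100 ms period is arbitrary:

// Timer callback: runs on a separate thread, so just push an event and return
Uint32 tick(Uint32 interval, void* param)
{
    SDL_Event ev;
    ev.type = SDL_USEREVENT;
    ev.user.code = 0;
    ev.user.data1 = NULL;
    ev.user.data2 = NULL;
    SDL_PushEvent(&ev);
    return interval;                                  // keep firing at the same interval
}

// In main, after SDL_Init(SDL_INIT_VIDEO | SDL_INIT_TIMER):
SDL_TimerID timer = SDL_AddTimer(100, tick, NULL);    // wake the loop every ~100 ms

SDL_Event event;
bool quit = false;
while (!quit && SDL_WaitEvent(&event)) {              // blocks until an event arrives
    if (event.type == SDL_QUIT)
        quit = true;
    else if (event.type == SDL_USEREVENT) {
        // periodic work goes here (animation step, redraw, ...)
    }
}
SDL_RemoveTimer(timer);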
I just figured out how to reduce CPU usage in my game from 50% down to < 10%.
Your program is much simpler, and simply using SDL_Delay() should be enough.
What I did was:
Use SDL_DisplayFormat() when loading images, so that blitting would be faster. This brought the CPU usage down to about 30%.
Then I found out that blitting the game's background (a big one-piece .png file) was eating most of my CPU. I searched the Internet for a solution, but all I found was the same answer: just use SDL_Delay(). Finally, I found out that the problem was embarrassingly simple: SDL_DisplayFormat() was converting my 24-bit images to 32-bit. So I set my display BPP to 24, which brought CPU usage down to ~20%. Bringing it down to 16 bits solved the problem for me, and the CPU usage is under 10% now.
Of course this means loss of color detail, but as my game is a simplistic 2D game with not too detailed graphics, this was OK.
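A minimal sketch of that load-time conversion (SDL 1.2), assuming the video mode has already been set, since SDL_DisplayFormat converts to the current display format; the file name is hypothetical:

#include "SDL.h"
#include "SDL_image.h"

// Load an image and convert it to the display's pixel format once, at load time,
// so every later blit skips the per-frame format conversion.
SDL_Surface* loadOptimized(const char* path)
{
    SDL_Surface* raw = IMG_Load(path);                // e.g. "background.png"
    if (raw == NULL)
        return NULL;

    SDL_Surface* optimized = SDL_DisplayFormat(raw);  // match the screen's BPP
    SDL_FreeSurface(raw);
    return optimized;                                 // may be NULL if conversion failed
}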
In order to really understand this, you need to understand threading. In a threaded application, the program runs until it is waiting for something, then it tells the OS that something else can run. In essence, you are doing this with the SDL_Delay command. If there was no delay at all, I suspect your program would be running at near 100% capacity.
The amount of time you put in the delay statement only matters if the other commands are taking a significant amount of time. In general, I would set the delay to roughly the time it takes to run the poll, but not more than, say, 10 ms. What will happen is that the OS will wait at least that length of time, allowing other applications to run in the background.
As to what you can do to improve this, well, it looks like there isn't a whole lot that you can do. However, take note that if there was another process running taking a significant amount of CPU power, your program's share would decrease.