High CPU usage with SDL + OpenGL - c++

I have a modern CPU (AMD FX 4170) and a modern GPU (NVidia GTX 660). Yet this simple program manages to fully use one of my CPU's cores. This means it uses one 4.2 GHz core to draw nothing at 60 FPS. What is wrong with this program?
#include <SDL/SDL.h>

int main(int argc, char** argv)
{
    SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO);
    SDL_SetVideoMode(800, 600, 0, SDL_OPENGL | SDL_RESIZABLE);

    while (true)
    {
        Uint32 now = SDL_GetTicks();
        SDL_GL_SwapBuffers();

        int delay = 1000 / 60 - (SDL_GetTicks() - now);
        if (delay > 0) SDL_Delay(delay);
    }
    return 0;
}

It turns out that NVidia's drivers implement waiting for vsync with a busy loop, which causes SDL_GL_SwapBuffers() to use 100% CPU. Turning off vsync in the NVIDIA Control Panel removes the problem.

Loops use as much computing power as they can get. The main problem is probably located in:
int delay = 1000 / 60 - (SDL_GetTicks() - now);
Your delay duration may be less than zero, in which case the loop spins without waiting at all. You need to check the value of the variable delay before sleeping.
Moreover, in this link it is proposed that
SDL_GL_SetAttribute(SDL_GL_SWAP_CONTROL, 1);
can be used to enable vsync so that it will not use all the CPU.
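Putting both suggestions together, here is a minimal sketch of what the loop could look like (assuming SDL 1.2.10+, where the SDL_GL_SWAP_CONTROL attribute exists; it must be set before SDL_SetVideoMode):
#include <SDL/SDL.h>

int main(int argc, char** argv)
{
    SDL_Init(SDL_INIT_VIDEO);

    // Ask the driver for vsync before the window/context is created.
    SDL_GL_SetAttribute(SDL_GL_SWAP_CONTROL, 1);
    SDL_SetVideoMode(800, 600, 0, SDL_OPENGL | SDL_RESIZABLE);

    while (true)
    {
        Uint32 now = SDL_GetTicks();
        SDL_GL_SwapBuffers();

        // The elapsed time can exceed the frame budget, so clamp the delay at zero.
        int delay = 1000 / 60 - (int)(SDL_GetTicks() - now);
        if (delay > 0)
            SDL_Delay(delay);
    }
    return 0;
}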

Related

Too High CPU Footprint of OpenCV Text Overlay on FHD Video Stream

I want to display an FHD live stream (25 fps) and overlay some (changing) text. For this I essentially use the code below.
Basically it is:
Load frame
(cv::putText skipped here)
Display frame if the frame counter is a multiple of delay
but the code is super slow compared to e.g. mpv and consumes way too much CPU time (cv::useOptimized() == true).
So far, delay is my inconvenient fiddle parameter to somehow make it feasible.
delay == 1 results in 180% CPU usage (full frame rate)
delay == 5 results in 80% CPU usage
But delay == 5, i.e. 5 fps, is really sluggish and actually still too much CPU load.
How can I make this code faster, or otherwise better, or otherwise solve the task (I'm not bound to OpenCV)?
P.S. Without cv::imshow the CPU usage is less than 30%, regardless of delay.
#include <opencv2/opencv.hpp>
#include <X11/Xlib.h>

// process every delay'th frame
#define delay 5

Display* disp = XOpenDisplay(NULL);
Screen* scrn = DefaultScreenOfDisplay(disp);
int screen_height = scrn->height;
int screen_width = scrn->width;

int main(int argc, char** argv){
    cv::VideoCapture cap("rtsp://url");
    cv::Mat frame;

    if (cap.isOpened())
        cap.read(frame);

    cv::namedWindow( "PREVIEW", cv::WINDOW_NORMAL );
    cv::resizeWindow( "PREVIEW", screen_width, screen_height );

    int framecounter = 0;

    while (true){
        if (cap.isOpened()){
            cap.read(frame);
            framecounter += 1;

            // Display only every delay'th frame
            if (framecounter % delay == 0){
                /*
                 * cv::putText
                 */
                framecounter = 0;
                cv::imshow("PREVIEW", frame);
            }
        }
        cv::waitKey(1);
    }
}
I now found out about valgrind (repository) and gprof2dot (pip3 install --user gprof2dot):
valgrind --tool=callgrind /path/to/my/binary # Produced file callgrind.out.157532
gprof2dot --format=callgrind --output=out.dot callgrind.out.157532
dot -Tpdf out.dot -o graph.pdf
That produced a wonderful graph saying that over 60% of the time evaporates in cvResize.
And indeed, when I comment out cv::resizeWindow, the CPU usage drops from 180% to ~60%.
Since the screen has a resolution of 1920 x 1200 and the stream is 1920 x 1080, it essentially did nothing but burn CPU cycles.
So far, this is still fragile. As soon as I switch to full-screen mode and back, the CPU load goes back to 180%.
To fix this, it turned out that I can either disable resizing completely with cv::WINDOW_AUTOSIZE ...
cv::namedWindow( "PREVIEW", cv::WINDOW_AUTOSIZE );
... or -- as Micka suggested -- on OpenCV versions compiled with OpenGL support (-DWITH_OPENGL=ON, my Debian repository version was not), use ...
cv::namedWindow( "PREVIEW", cv::WINDOW_OPENGL );
... to offload the rendering to the GPU, which turns out to be even faster together with resizing (55% CPU compared to 65% for me).
It just does not seem to work together with cv::WINDOW_KEEPRATIO.*
Furthermore, it turns out that cv::UMat can be used as a drop-in replacement for cv::Mat, which additionally boosts the performance (as seen with ps -e -o pcpu,args).
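For reference, a minimal sketch of that UMat variant (untested here and purely illustrative; it assumes an OpenCV build with a working OpenCL runtime, otherwise UMat silently falls back to the CPU path, and cv::WINDOW_OPENGL requires -DWITH_OPENGL=ON):
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap("rtsp://url");
    cv::UMat frame;   // drop-in replacement for cv::Mat, backed by OpenCL where available

    cv::namedWindow("PREVIEW", cv::WINDOW_OPENGL);
    while (cap.isOpened() && cap.read(frame)) {
        cv::putText(frame, "overlay", cv::Point(50, 50),
                    cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(255, 255, 255), 2);
        cv::imshow("PREVIEW", frame);
        if (cv::waitKey(1) == 27) break;   // ESC quits
    }
}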
Appendix
[*] So we have to manually scale it and take care of the aspect ratio.
float screen_aspratio = (float) screen_width / screen_height;
float image_aspratio  = (float) image_width  / image_height;

if ( image_aspratio >= screen_aspratio ) { // width limited, center window vertically
    cv::resizeWindow("PREVIEW", screen_width, screen_width / image_aspratio );
    cv::moveWindow( "PREVIEW", 0, (screen_height - image_height) / 2 );
}
else { // height limited, center window horizontally
    cv::resizeWindow("PREVIEW", screen_height * image_aspratio, screen_height );
    cv::moveWindow( "PREVIEW", (screen_width - image_width) / 2, 0 );
}
One thing that pops out is that you're creating a new window and resizing it every time you want to display something.
Move these lines
cv::namedWindow( "PREVIEW", cv::WINDOW_NORMAL );
cv::resizeWindow( "PREVIEW", screen_width, screen_height );
to before your while (true) and see if that solves it.

Limiting FPS in C++

I'm currently making a game in which I would like to limit the frames per second but I'm having problems with that. Here's what I'm doing:
I'm getting the deltaTime through this method that is executed each frame:
void Time::calc_deltaTime() {
    double currentFrame = glfwGetTime();
    deltaTime = currentFrame - lastFrame;
    lastFrame = currentFrame;
}
deltaTime has the values I would expect (around 0.012 to 0.016).
And then I'm using deltaTime to delay the frame through the Windows Sleep function like this:
void Time::limitToMAXFPS() {
    if (1.0 / MAXFPS > deltaTime)
        Sleep((1.0 / MAXFPS - deltaTime) * 1000.0);
}
MAXFPS is equal to 60 and I'm multiplying by 1000 to convert seconds to milliseconds. Though everything seems correct, I'm still getting more than 60 fps (around 72 fps).
I also tried this method using a while loop:
void Time::limitToMAXFPS() {
    double diff = 1.0 / MAXFPS - deltaTime;
    if (diff > 0) {
        double t = glfwGetTime();
        while (glfwGetTime() - t < diff) { }
    }
}
But I'm still getting more than 60 fps, still around 72... Am I doing something wrong, or is there a better way of doing this?
How important is it that you return cycles back to the CPU? To me, it seems like a bad idea to use sleep at all. Someone please correct me if I am wrong, but I think sleep functions should be avoided.
Why not simply use an infinite loop that executes its work only when more than a certain time interval has passed? Try:
const double maxFPS = 60.0;
const double maxPeriod = 1.0 / maxFPS; // approx ~ 16.666 ms

bool running = true;
double lastTime = 0.0;

while( running ) {
    double time = glfwGetTime();
    double deltaTime = time - lastTime;

    if( deltaTime >= maxPeriod ) {
        lastTime = time;
        // code here gets called with max FPS
    }
}
Last time that I used GLFW, it seemed to self-limit to 60 fps anyway. If you are doing anything high-performance oriented (a game or 3D graphics), avoid anything that sleeps, unless you want to use multithreading.
Sleep can be very inaccurate. A common phenomenon is that the actual time slept has a granularity of 14-15 milliseconds, which gives you a frame rate of ~70.
Is Sleep() inaccurate?
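One common workaround for that granularity (not taken from the answers above, just a sketch of the usual pattern) is to sleep for most of the remaining frame time and busy-wait only the last couple of milliseconds, which keeps CPU usage low without depending on the sleep resolution:
#include <chrono>
#include <thread>

// Sketch: sleep coarsely, then spin the final ~2 ms for precision.
void limitFrame(std::chrono::steady_clock::time_point frameStart, double maxFPS)
{
    using namespace std::chrono;
    const auto frameBudget = duration<double>(1.0 / maxFPS);
    const auto spinMargin  = milliseconds(2);

    auto target = frameStart + duration_cast<steady_clock::duration>(frameBudget);

    // Coarse part: let the OS scheduler do most of the waiting.
    if (steady_clock::now() < target - spinMargin)
        std::this_thread::sleep_until(target - spinMargin);

    // Fine part: burn the last bit precisely.
    while (steady_clock::now() < target) { /* spin */ }
}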
I've given up on trying to limit the fps like this... As you said, Windows is very inconsistent with Sleep. My fps average is always 64 fps and not 60. The problem is that Sleep takes an integer (or long integer) argument, so I was casting it with static_cast, but I would need to pass it a double: sleeping 16 milliseconds each frame is different from sleeping 16.6666... ms. That's probably the cause of the extra 4 fps (I think).
I also tried:
std::this_thread::sleep_for(std::chrono::milliseconds(static_cast<long>((1.0 / MAXFPS - deltaTime) * 1000.0)));
and the same thing happens with sleep_for. Then I tried passing the decimal remainder of the milliseconds to chrono::microseconds and chrono::nanoseconds, using all three together to get better precision, but guess what: I still get the freaking 64 fps.
Another weird thing: sometimes (yes, this is completely random) when I change 1000.0 in the expression (1.0 / MAXFPS - deltaTime) * 1000.0 to a const integer, making it (1.0 / MAXFPS - deltaTime) * 1000, my fps simply jumps to 74 for some reason, even though the two expressions should be equal and nothing should change. Both are double expressions; I don't think any type promotion is happening here.
So I decided to force V-sync through wglSwapIntervalEXT(1) in order to avoid screen tearing, and then use the method of multiplying by deltaTime every value that might vary depending on the speed of the computer running my game. It's going to be a pain because I might forget to multiply some value and not notice it on my own computer, creating inconsistency, but I see no other way... Thank you all for the help though.
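For reference, a minimal sketch of that deltaTime-scaling idea (the Player, speed and position names are just illustrative):
#include <cstdio>

// Frame-rate independent movement: scale every per-frame change by deltaTime.
struct Player {
    double position = 0.0;
    double speed = 120.0;              // units per second, not per frame

    void update(double deltaTime) {    // deltaTime in seconds, e.g. ~0.0166 at 60 fps
        position += speed * deltaTime; // covers the same distance per second at 60 or 144 fps
    }
};

int main() {
    Player p;
    p.update(1.0 / 60.0);              // one simulated frame
    std::printf("%f\n", p.position);
}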
I've recently started using GLFW for a small side project I'm working on, and I've used std::chrono alongside std::this_thread::sleep_until to achieve 60 fps:
auto start = std::chrono::steady_clock::now();

while(!glfwWindowShouldClose(window))
{
    ++frames;
    auto now = std::chrono::steady_clock::now();
    auto diff = now - start;
    auto end = now + std::chrono::milliseconds(16);

    if(diff >= std::chrono::seconds(1))
    {
        start = now;
        std::cout << "FPS: " << frames << std::endl;
        frames = 0;
    }

    glfwPollEvents();

    processTransition(countit);
    render.TickTok();
    render.RenderBackground();
    render.RenderCovers(countit);

    std::this_thread::sleep_until(end);
    glfwSwapBuffers(window);
}
To add, you can easily adjust the FPS preference by adjusting end.
Now, with that said, I know GLFW limits to 60 fps by default, but I had to disable that limit with glfwSwapInterval(0); just before the while loop.
Are you sure your Sleep function accepts floating-point values? If it only accepts int, your sleep may become a Sleep(0), which would explain your issue.
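A tiny illustration of that truncation pitfall (a sketch; the numbers are made up just to show the effect):
#include <windows.h>

int main()
{
    double remaining = 1.0 / 60.0 - 0.016;                     // ~0.0007 s left in the frame
    Sleep(remaining * 1000.0);                                 // 0.7 ms truncates to Sleep(0): no wait at all
    Sleep(static_cast<DWORD>((1.0 / 60.0 - 0.012) * 1000.0));  // ~4.67 ms truncates to Sleep(4)
    return 0;
}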

Simple C++ SFML program high CPU usage

I'm currently working on a platformer and trying to implement a timestep, but for framerate limits greater than 60 the CPU usage goes up from 1% to 25% and more.
I made this minimal program to demonstrate the issue. There are two comments in the code that describe the problem and what I have tested.
Note that the FPS stuff is not relevant to the problem (I think).
I tried to keep the code short and simple:
#include <memory>
#include <sstream>
#include <iomanip>
#include <SFML/Graphics.hpp>

int main() {
    // Window
    std::shared_ptr<sf::RenderWindow> window;
    window = std::make_shared<sf::RenderWindow>(sf::VideoMode(640, 480, 32), "Test", sf::Style::Close);

    /*
    When I use the setFramerateLimit() function below, the CPU usage is only 1% instead of 25%+
    (And only if I set the limit to 60 or less. For example 120 increases CPU usage to 25%+ again.)
    */
    //window->setFramerateLimit(60);

    // FPS text
    sf::Font font;
    font.loadFromFile("font.ttf");
    sf::Text fpsText("", font, 30);
    fpsText.setColor(sf::Color(0, 0, 0));

    // FPS
    float fps;
    sf::Clock fpsTimer;
    sf::Time fpsElapsedTime;

    /*
    When I set framerateLimit to 60 (or anything less than 60)
    instead of 120, CPU usage goes down to 1%.
    When the limit is greater, in this case 120, CPU usage is 25%+
    */
    unsigned int framerateLimit = 120;
    sf::Time fpsStep = sf::milliseconds(1000 / framerateLimit);
    sf::Time fpsSleep;
    fpsTimer.restart();

    while (window->isOpen()) {
        // Update timer
        fpsElapsedTime = fpsTimer.restart();
        fps = 1000.0f / fpsElapsedTime.asMilliseconds();

        // Update FPS text
        std::stringstream ss;
        ss << "FPS: " << std::fixed << std::setprecision(0) << fps;
        fpsText.setString(ss.str());

        // Get events
        sf::Event evt;
        while (window->pollEvent(evt)) {
            switch (evt.type) {
            case sf::Event::Closed:
                window->close();
                break;
            default:
                break;
            }
        }

        // Draw
        window->clear(sf::Color(255, 255, 255));
        window->draw(fpsText);
        window->display();

        // Sleep
        fpsSleep = fpsStep - fpsTimer.getElapsedTime();
        if (fpsSleep.asMilliseconds() > 0) {
            sf::sleep(fpsSleep);
        }
    }

    return 0;
}
I don't want to use SFML's setFramerateLimit(), but my own implementation with the sleep, because I will use the fps data to update my physics and stuff.
Is there a logic error in my code? I fail to see it, given that it works with a framerate limit of, for example, 60 (or less). Is it because I have a 60 Hz monitor?
PS: Using SFML's window->setVerticalSyncEnabled() doesn't change the results.
I answered another similar question with this answer.
The thing is, it's not exactly helping you with CPU usage, but I tried your code and it runs fine under 1% CPU usage at 120 FPS (and much more). When you make a game or interactive media with a game loop, you don't want to lose performance by sleeping; you want to use as much CPU time as the computer can give you. Instead of sleeping, you can process other data (loading assets, pathfinding algorithms, etc.), or just don't put limits on rendering.
I provide some useful links and code; here they are:
Similar question: Movement Without Framerate Limit C++ SFML.
What you really need is a fixed time step. Take a look at the SFML Game Development book source code. Here's the interesting snippet from Application.cpp:
const sf::Time Game::TimePerFrame = sf::seconds(1.f/60.f);
// ...

sf::Clock clock;
sf::Time timeSinceLastUpdate = sf::Time::Zero;
while (mWindow.isOpen())
{
    sf::Time elapsedTime = clock.restart();
    timeSinceLastUpdate += elapsedTime;
    while (timeSinceLastUpdate > TimePerFrame)
    {
        timeSinceLastUpdate -= TimePerFrame;

        processEvents();
        update(TimePerFrame);
    }

    updateStatistics(elapsedTime);
    render();
}
If this is not really what you want, see "Fix your timestep!"
which Laurent Gomila himself linked in the SFML forum.
I suggest using setFramerateLimit(), because it's natively implemented in SFML and will work a lot better.
For getting the elapsed time you must do:
fpsElapsedTime = fpsTimer.getElapsedTime();
If I had to implement something similar, I would do:
/* in the main loop */
fpsElapsedTime = fpsTimer.getElapsedTime();
if (fpsElapsedTime.asMilliseconds() >= (1000 / framerateLimit))
{
    fpsTimer.restart();
    // All your content
}
One other thing: use sf::Color::White or sf::Color::Black instead of sf::Color(255, 255, 255).
Hope this helps :)
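If you do go with setFramerateLimit(), you can still feed the measured frame time into your physics. A minimal sketch (assuming SFML 2.x; update() is a placeholder for your own physics step):
#include <SFML/Graphics.hpp>

int main() {
    sf::RenderWindow window(sf::VideoMode(640, 480), "Test");
    window.setFramerateLimit(120);   // SFML sleeps for you, keeping CPU usage low

    sf::Clock frameClock;
    while (window.isOpen()) {
        sf::Event evt;
        while (window.pollEvent(evt))
            if (evt.type == sf::Event::Closed)
                window.close();

        float dt = frameClock.restart().asSeconds(); // measured frame time for physics
        // update(dt);  // scale movement etc. by dt here

        window.clear(sf::Color::White);
        window.display();
    }
    return 0;
}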

How to keep the CPU usage down while running an SDL program?

I've made a very basic window with SDL and want to keep it running until I press the X on the window.
#include "SDL.h"
const int SCREEN_WIDTH = 640;
const int SCREEN_HEIGHT = 480;
int main(int argc, char **argv)
{
SDL_Init( SDL_INIT_VIDEO );
SDL_Surface* screen = SDL_SetVideoMode( SCREEN_WIDTH, SCREEN_HEIGHT, 0,
SDL_HWSURFACE | SDL_DOUBLEBUF );
SDL_WM_SetCaption( "SDL Test", 0 );
SDL_Event event;
bool quit = false;
while (quit != false)
{
if (SDL_PollEvent(&event)) {
if (event.type == SDL_QUIT) {
quit = true;
}
}
SDL_Delay(80);
}
SDL_Quit();
return 0;
}
I tried adding SDL_Delay() at the end of the while-clause and it worked quite well.
However, 80 ms seemed to be the highest value I could use to keep the program running smoothly and even then the CPU usage is about 15-20%.
Is this the best way to do this, or do I just have to live with the fact that it eats this much CPU already at this point?
I know this is an older post, but I myself just came across this issue with SDL when starting up a little demo project. Like user 'thebuzzsaw' noted, the best solution is to use SDL_WaitEvent to reduce the CPU usage of your event loop.
Here's how it would look in your example for anyone looking for a quick solution to it in the future. Hope it helps!
#include "SDL.h"
const int SCREEN_WIDTH = 640;
const int SCREEN_HEIGHT = 480;
int main(int argc, char **argv)
{
SDL_Init( SDL_INIT_VIDEO );
SDL_Surface* screen = SDL_SetVideoMode( SCREEN_WIDTH, SCREEN_HEIGHT, 0,
SDL_HWSURFACE | SDL_DOUBLEBUF );
SDL_WM_SetCaption( "SDL Test", 0 );
SDL_Event event;
bool quit = false;
while (quit == false)
{
if (SDL_WaitEvent(&event) != 0) {
switch (event.type) {
case SDL_QUIT:
quit = true;
break;
}
}
}
SDL_Quit();
return 0;
}
I would definitely experiment with fully blocking functions (such as SDL_WaitEvent). I have an OpenGL application in Qt, and I noticed the CPU usage hovers between 0% and 1%. It spikes to maybe 4% during "usage" (moving the camera and/or causing animations).
I am working on my own windowing toolkit. I have noticed I can achieve similar CPU usage when I use blocking event loops. This will complicate any timers you may depend on, but it is not terribly difficult to implement timers with this new approach.
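As a sketch of that idea in SDL 1.2 terms (hypothetical and untested; it assumes the timer subsystem, where SDL_AddTimer can push a user event to wake up a blocking SDL_WaitEvent at a fixed interval):
#include "SDL.h"

// Timer callback: push a user event so the blocking SDL_WaitEvent wakes up periodically.
static Uint32 tick(Uint32 interval, void* param)
{
    SDL_Event ev;
    ev.type = SDL_USEREVENT;
    ev.user.code = 0;
    ev.user.data1 = 0;
    ev.user.data2 = 0;
    SDL_PushEvent(&ev);
    return interval;                                 // keep the timer running
}

int main(int argc, char** argv)
{
    SDL_Init(SDL_INIT_VIDEO | SDL_INIT_TIMER);
    SDL_SetVideoMode(640, 480, 0, SDL_SWSURFACE);

    SDL_TimerID timer = SDL_AddTimer(16, tick, 0);   // ~60 wakeups per second

    SDL_Event event;
    bool quit = false;
    while (!quit && SDL_WaitEvent(&event)) {         // sleeps until something happens
        if (event.type == SDL_QUIT)
            quit = true;
        else if (event.type == SDL_USEREVENT) {
            // periodic work (animation step, redraw, ...) goes here
        }
    }

    SDL_RemoveTimer(timer);
    SDL_Quit();
    return 0;
}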
I just figured out how to reduce CPU usage in my game from 50% down to < 10%.
Your program is much simpler, and simply using SDL_Delay() should be enough.
What I did was:
Use SDL_DisplayFormat() when loading images, so the blitting would be faster. This brought the CPU usage down to about 30%.
I found out that blitting the game's background (a big one-piece .png file) was eating most of my CPU. I searched the Internet for a solution, but all I found was the same answer - just use SDL_Delay(). Finally, I found out that the problem was embarrassingly simple - SDL_DisplayFormat() was converting my 24-bit images to 32-bit. So I set my display BPP to 24, which brought CPU usage down to ~20%. Bringing it down to 16 bits solved the problem for me and the CPU usage is under 10% now.
Of course this means loss of color detail, but as my game is a simplistic 2D game with not too detailed graphics, this was OK.
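A sketch of that loading pattern (SDL 1.2; the path argument is just a placeholder): SDL_DisplayFormat() converts a loaded surface to the screen's pixel format once at load time, so later blits avoid a per-frame conversion.
#include "SDL.h"

// Load a BMP and convert it to the display format once, so every blit is cheap.
// Call this after SDL_SetVideoMode so a display format actually exists.
SDL_Surface* loadOptimized(const char* path)
{
    SDL_Surface* raw = SDL_LoadBMP(path);            // SDL_image's IMG_Load works the same way
    if (!raw) return 0;

    SDL_Surface* optimized = SDL_DisplayFormat(raw); // match the screen's BPP and pixel layout
    SDL_FreeSurface(raw);
    return optimized;
}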
In order to really understand this, you need to understand threading. In a threaded application, the program runs until it is waiting for something, then it tells the OS that something else can run. In essence, you are doing this with the SDL_Delay command. If there was no delay at all, I suspect your program would be running at near 100% capacity.
The amount of time that you should put in the delay statement only matters if the other commands are taking a significant amount of time. In general, I would make the delay a similar amount of time to what it takes to handle the poll, but not more than, say, 10 ms. What will happen is that the OS will wait at least that length of time, allowing other applications to run in the background.
As to what you can do to improve this, well, it looks like there isn't a whole lot you can do. However, take note that if there were another process running that takes a significant amount of CPU power, your program's share would decrease.
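A sketch of that suggestion applied to the loop from the question (drain all pending events each iteration, then yield about 10 ms; the window setup is unchanged):
#include "SDL.h"

int main(int argc, char **argv)
{
    SDL_Init( SDL_INIT_VIDEO );
    SDL_SetVideoMode( 640, 480, 0, SDL_HWSURFACE | SDL_DOUBLEBUF );

    SDL_Event event;
    bool quit = false;
    while (!quit)
    {
        // Drain everything that is queued, not just one event per frame.
        while (SDL_PollEvent(&event)) {
            if (event.type == SDL_QUIT)
                quit = true;
        }

        // ... update / draw here ...

        SDL_Delay(10);   // yield to the OS; keeps CPU usage low without feeling sluggish
    }

    SDL_Quit();
    return 0;
}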

Illegal Instruction When Programming C++ on Linux

My program, which does exactly the same thing every time it runs (it moves a point sprite into the distance), will randomly fail with the text 'Illegal Instruction' on the terminal. My googling has found people encountering this when writing assembly, which makes sense, because assembly throws those kinds of errors.
But why would g++ generate an illegal instruction like this? It's not like I'm compiling for Windows and then running on Linux (and even then, as long as both are on x86, that shouldn't AFAIK cause an illegal instruction). I'll post the main file below.
I can't reliably reproduce the error. Although, if I make random changes (add a space here, change a constant there) that force a recompile, I can get a binary which will fail with Illegal Instruction every time it is run, until I try setting a breakpoint, which makes the illegal instruction 'disappear'. :(
#include <stdio.h>
#include <stdlib.h>
#include <GL/gl.h>
#include <GL/glu.h>
#include <SDL/SDL.h>
#include "Screen.h"       // Simple SDL wrapper
#include "Textures.h"     // Simple OpenGL texture wrapper
#include "PointSprites.h" // Simple point sprites wrapper

double counter = 0;

/* Here goes our drawing code */
int drawGLScene()
{
    /* These are to calculate our fps */
    static GLint T0     = 0;
    static GLint Frames = 0;

    /* Reset the view and move into the screen 6.0 units */
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -6);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

    glEnable(GL_POINT_SPRITE_ARB);
    glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE);

    glBegin( GL_POINTS );            /* Drawing points */
      glVertex3d(0.0, 0.0, 0);
      glVertex3d(1.0, 0.0, 0);
      glVertex3d(1.0, 1.0, counter);
      glVertex3d(0.0, 1.0, 0);
    glEnd( );                        /* Finished drawing the points */

    /* Draw it to the screen */
    SDL_GL_SwapBuffers( );

    /* Gather our frames per second */
    Frames++;
    {
        GLint t = SDL_GetTicks();
        if (t - T0 >= 50) {
            GLfloat seconds = (t - T0) / 1000.0;
            GLfloat fps = Frames / seconds;
            printf("%d frames in %g seconds = %g FPS\n", Frames, seconds, fps);
            T0 = t;
            Frames = 0;
            counter -= .1;
        }
    }
    return 1;
}

GLuint objectID;

int main( int argc, char **argv )
{
    Screen screen;
    screen.init();
    screen.resize(800, 600);
    LoadBMP("./dist/Debug/GNU-Linux-x86/particle.bmp");
    InitPointSprites();

    while (true) { drawGLScene(); }
}
The compiler isn't generating illegal instructions, with 99.99% probability. Almost certainly what's happening is that you have a bug in your program which is causing it to either a) overwrite parts of your executable code with garbage data, or b) jump through a function pointer that points into garbage. Try running your program under valgrind to diagnose the problem - http://valgrind.org/.
The Illegal Instruction bug can also be a symptom of a faulty graphics card driver, or one that's mismatched to the hardware. Use lspci | grep VGA to confirm what your hardware actually is. Then try downloading the latest & greatest driver for your hardware model.
There is also a known bug when running code from inside NetBeans 6.8 on a multi-core 64-bit machine. The code crashes stochastically with Illegal Instruction because of race conditions in the profiler. The percentage of crashes varies from 1% or 5% for some code, through 30% or 50%, up to around 95%+, depending on which libraries are being loaded. Graphics and threading code seems to increase it, but you can see it with a trivial Hello World main. If you get a 1% crash rate, you probably haven't noticed it before. Solution: run the executable straight from a terminal, if you can.
Most probably you are using Mesa software rendering or a faulty graphics card driver. Mesa sometimes uses special instruction sets like AVX and AVX2, among others.
Your glEnable(GL_POINT_SPRITE_ARB); call may be activating a code path in Mesa that is not intended for your CPU.
I know the post is old, but this might help others in the future.