SDL: Hardware rendering vs software rendering

A general question about a window with an extensive menu that is updated often, but where only around 10% of the actual screen changes. Much of the text remains unchanged.
SDL2 uses renderers and textures to take advantage of hardware acceleration, but it also allows software rendering.
My question is:
Is it faster to redraw the entire screen/menu each time with a hardware-accelerated renderer from SDL_CreateRenderer(), even though only 10% of the menu actually changes? Or:
Is it faster to draw the entire menu once into RAM using SDL_CreateSoftwareRenderer() and then update only the 10% that actually changes?
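For reference, a minimal sketch of the two paths being compared (SDL2; the window size, flags, and error handling are placeholders):

    #include <SDL.h>

    SDL_Window* window = SDL_CreateWindow("menu", SDL_WINDOWPOS_CENTERED,
                                          SDL_WINDOWPOS_CENTERED, 800, 600, 0);

    // Path A: hardware-accelerated renderer; typically clear and redraw the
    // whole menu every frame, then present.
    SDL_Renderer* hw = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED);

    // Path B: software renderer targeting the window surface in RAM; only the
    // ~10% that changed needs to be redrawn before SDL_UpdateWindowSurface().
    SDL_Surface* surface = SDL_GetWindowSurface(window);
    SDL_Renderer* sw = SDL_CreateSoftwareRenderer(surface);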

Thanks guys.
It is nice to be able to get other opinions before spending a lot of time on an issue. The simplest route was to use hardware rendering and redraw the entire window each time, as it is fast enough.
The main program draws everything (real-time data and GUI, under Linux) as individual pixels to a texture, which is rendered to the screen at a 30 Hz rate. What I found was that once CPU clock rates exceeded 1 GHz, most of the graphics work (small areas / individual pixels) was as fast as or faster than hardware rendering, except when clearing a large area of the screen. It took some work and a lot of reading to get the SDL rendering working in a thread while the pixel data (GUI and data) is being updated in main.
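A rough sketch of that streaming-texture path, shown single-threaded for simplicity (W, H, renderer, and the pixels buffer are assumed to come from the surrounding code; in the threaded variant the shared pixel buffer would need a lock):

    // Created once: a streaming texture the size of the RAM framebuffer.
    SDL_Texture* tex = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_ARGB8888,
                                         SDL_TEXTUREACCESS_STREAMING, W, H);

    // ~30 Hz render loop: copy the RAM pixels into the texture and present.
    SDL_UpdateTexture(tex, nullptr, pixels, W * sizeof(Uint32));
    SDL_RenderClear(renderer);
    SDL_RenderCopy(renderer, tex, nullptr, nullptr);
    SDL_RenderPresent(renderer);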

Related

OSX pushing pixels to screen with minimum latency

I'm trying to develop some very low-latency graphics applications and am getting really frustrated by how long it takes to draw to screen through OpenGL. Every discussion I find about it online addresses optimizing the OpenGL pipeline, but doesn't get anywhere near the results that I need.
Check this out:
https://www.dropbox.com/s/dbz4bq67cxluhs7/MouseLatency.MOV?dl=0
You probably noticed this before: with a C++ OpenGL app, if you drag the mouse around the screen and draw the mouse location in OpenGL, the drawn position lags behind by 3 or 4 frames. Clearly OSX CAN draw [the cursor] to screen with very low latency, but OpenGL is much slower. So let's say I don't need to do any fancy OpenGL rendering. I just want to push pixels to screen somehow. Is there a way for me to bypass OpenGL completely and draw to screen faster? Or is this kind of functionality going to be locked inside the kernel somewhere where I can't reach it?
datenwolf's answer is excellent. I just wanted to add one thing to this discussion regarding triple buffering at the compositor level, since I am very familiar with the Microsoft Windows desktop compositor.
I know you are asking about OS X here, but the implementation details I am going to discuss are the most sensible way of implementing this stuff and I would expect to see other systems work this way too.
Triple buffering as you might enable at the application level adds a third buffer to the swap-chain that is synchronized to refresh. That way of doing triple buffering does add latency, because that third buffer has to be displayed and nothing is allowed to touch it until this happens (this is D3D's mandated behavior -- the behavior and feature itself are undefined in OpenGL); but the way the Desktop Window Manager (Windows) works is slightly different.
The behavior I have seen most drivers implement for desktop composition is frame dropping. In any situation where multiple frames are finished between refreshes, all but one of those frames are discarded. You actually get lower latency using a window rather than fullscreen + triple buffering, because buffer swaps are not blocked when the third buffer (owned by the compositor) already has a finished frame waiting to be displayed.
It creates a whole different set of visual issues if framerate is not reasonably consistent. Technically, pixels belonging to dropped frames have infinite latency, so the benefits from latency reduction done this way might be worthless if you needed every single frame drawn to appear on screen.
I believe you can get this behavior on OS X (if you want it) by disabling VSYNC and drawing in a window. VSYNC basically only serves as a form of frame pacing (trading latency for consistency) in this scenario, and tearing is eliminated by the compositor itself regardless of what rate you draw at.
Regarding mouse cursor latency:
The cursor in any modern window system will always track with minimum latency. There is literally a feature on graphics hardware called a "hardware cursor," where the driver stores the cursor position and then once per-refresh, has the hardware overlay the cursor on top of whatever is sitting in the framebuffer waiting to be scanned-out. So even if your application is drawing at 30 FPS on a 60 Hz display, the cursor is updated every 16 ms when the hardware cursor's used.
This bypasses all graphics APIs altogether, but is quite limited (e.g. it uses the OS-defined cursor).
TL;DR: Latency comes in many forms.
Minimize the length of time before something shows up on screen: if your problem is input latency, you can mitigate it by reducing the number of pre-rendered frames and avoiding triple buffering. I could not begin to tell you how to reduce the number of driver pre-rendered frames on OS X.
Minimize time spent blocking (increase FPS), accepting that some frames will never be displayed: if your problem is the amount of time that passes between executions of your render loop, you would go the other way. Increase pre-rendered frames, draw in a window, and disable VSYNC. You may run into a lot of frames that are drawn but never displayed in this scenario.
Pre-rendered frames are a powerful little feature that you do not get control over at the OpenGL API level. It sets up how deeply the driver is allowed to pipeline everything and depending on the desired task you will trade different types of latency by fiddling with it. Many gamers swear by setting this value to 1 to minimize input latency at the cost of overall framerate "smoothness."
UPDATE:
Pre-rendered frames are one reason for your multi-frame delay. Fixing this in a cross-platform way is difficult (it's a driver setting), but if you have access to Fence Sync Objects you can produce the same behavior as forcing this to 1.
I can explain this in more detail if need be; the general idea is that you insert a fence sync after the buffer swap and then wait for it to be signaled before the first command in the next frame is allowed to begin. Performance may take a nose dive, but latency will be minimized since the CPU won't be running ahead of the GPU anymore.
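A hedged sketch of that fence idea (assumes sync objects are available, i.e. GL 3.2+ or ARB_sync; the swap call shown is platform-specific and stands in for whatever your windowing API uses):

    // End of frame N: present, then fence it.
    SwapBuffers(hdc);                       // or CGLFlushDrawable / [context flushBuffer] on OS X
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

    // Start of frame N+1: block until the GPU has consumed frame N.
    // GL_SYNC_FLUSH_COMMANDS_BIT ensures the fence itself reaches the GPU.
    glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, GL_TIMEOUT_IGNORED);
    glDeleteSync(fence);
    // ...now issue the first GL command of the next frame.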
There are a number of latencies at play here.
Input event → drawing state latency
In your typical interactive application you have an event loop that usually goes:
1. collect user input
2. process user input
3. determine what's to be drawn
4. draw to the back buffer
5. swap back to front buffer
With the usual ways in which event–update–display loops are written, there's almost no delay between step 5 of one iteration and step 1 of the next, which means that steps 2, 3, and 4 operate on data that lags about one frame period behind.
So this is the first source of latency.
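In C++ terms, the shape of such a loop might look like this (the function names are placeholders, not a particular API):

    while (running) {
        Input in = pollUserInput();   // step 1: sampled right after the previous swap
        updateState(in);              // steps 2-3: process input, decide what to draw
        drawScene();                  // step 4: render into the back buffer
        swapBuffers();                // step 5: present (may block on V-Sync)
        // By the time this frame is visible, 'in' is already about one frame old.
    }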
Triple buffering / composition latency
Many graphics pipelines enable triple buffering for smoother display updates. Instead of keeping only a back and a front buffer around, there's also a third buffer in between. The average rate at which these buffers are drawn to is the display refresh rate, and the buffers themselves are stepped at exactly the display refresh period. So this adds another frame period of latency.
If you're running on a system with a window compositor (which is the default on MacOS X), this effectively adds another buffer stage, so if you've got a double-buffered mode it gives you triple buffering, and if you had triple buffering it'd give you a "quad" buffer (quotes here, because quad buffer is a term usually used for stereoscopic rendering).
What can you do about this:
Turn off composition
Windows (through the DWM API) and MacOS X allow you to turn off composition or bypass the compositor.
Reducing input lag
Try to collect and integrate the user input as late as possible (use high-resolution sleeps). If you've got only a very simple scene you can push the drawing quite close to the V-Sync deadline; in fact the NVidia OpenGL implementation has a vendor-specific extension that allows you to sleep until a specific amount of time before the next V-Sync.
If your scene is complex but separable into parts that require low-latency user input and parts where it doesn't matter so much, you can draw the higher-latency stuff earlier and integrate the user input into it only at the very last moment. Of course, if the mouse is used to control the viewing direction, or even worse you're rendering for a VR head-mounted display, things are going to become difficult.

How can AutoCAD be so fast at panning and zooming?

I am developing an image viewer where graphics are rendered in antialiased mode. The images are first edited in AutoCAD, which generates DXF files.
The application is written by using Visual C++ and Direct2D.
Although I am able to load the image quite quickly, zoom and especially pan remain a problem for me compared with AutoCAD's performance on the same image (same number of shapes).
The following is the piece of code that renders the graphics:
auto shapes = quadTree.get_visible_shapes();
shapes.sort_by_Zorder();
for (auto& shape : shapes)
    shape.draw();
After profiling, I can say that more than 90% of the computational time is spent in the loop that draws the shapes.
Drawing only the visible shapes, thanks to the quadtree, was a huge performance improvement; I also render the graphics in aliased mode while panning, but there is still a big gap compared with AutoCAD.
I am wondering whether AutoCAD draws a bitmap representation of the image, but I haven't tried this approach yet, so I cannot tell whether it would give an effective improvement in speed.
Given these hypotheses, are there any ways to improve panning and zooming?
In AutoCAD there is a mechanism called adaptive degradation, which aborts rendering when the FPS falls below a predefined value.
There is also a lot of other optimization. You cannot compete with a big program like this.
There are a few considerations when panning a 2D/3D scene, especially when redrawing the world is expensive.
Off-screen canvas
Render your scene onto an off-screen bitmap with a slightly larger canvas (e.g. (w+N) x (h+N)). Upon a pan you can instantly put up the screen from that buffer, and update the off-screen one in the background. There are also many ways to further optimize in this direction.
EDIT: More details:
For example, say the screen for your scene is 640x480, the scene itself is 1000x1000, and you want to show the region (301, 301) ~ (940, 780). You would instead create an off-screen buffer of, say, 740x580 (i.e. N=50) covering (251, 251) ~ (990, 830). So if a pan operation moves less than 50 pixels (e.g. pan left by 5 pixels), you already have that content ready to render to the screen instantly.
Also, after a pan you may want to prepare the new off-screen buffer in the background (or when idle) so that subsequent pans can also be performed instantly.
If the pan goes too far, you still have to wait for it, or reduce the rendering quality for the intermediate screens and render full details only when the pan stops - the user won't notice details while moving anyway.
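The same idea expressed in SDL2 terms, just for illustration (the question uses Direct2D, where a bitmap render target plays the same role; drawWorld, screenW/screenH, and panDx/panDy are hypothetical names):

    const int N = 50;
    SDL_Texture* offscreen = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_ARGB8888,
                                               SDL_TEXTUREACCESS_TARGET,
                                               screenW + 2 * N, screenH + 2 * N);

    // Expensive world redraw, done rarely or in the background:
    SDL_SetRenderTarget(renderer, offscreen);
    drawWorld(renderer);                    // full redraw into the larger canvas
    SDL_SetRenderTarget(renderer, nullptr);

    // Cheap per-pan present: just shift the source rectangle.
    SDL_Rect src { N + panDx, N + panDy, screenW, screenH };
    SDL_RenderCopy(renderer, offscreen, &src, nullptr);
    SDL_RenderPresent(renderer);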
Limit update frequency
A pan operation is usually triggered by the mouse (or a touch gesture), which can arrive as a high volume of events. Instead of queuing all 20 mouse-move events within one second and spending 3 seconds redrawing the world 20 times, you should limit the update frequency, for example as sketched below.
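One way to do that is to coalesce pending mouse-move events and cap the redraw rate (SDL2 used for illustration; panTo and the 30 ms threshold are arbitrary/hypothetical):

    static Uint32 lastRedraw = 0;
    int latestX = 0, latestY = 0;
    bool moved = false;

    SDL_Event e;
    while (SDL_PollEvent(&e)) {
        if (e.type == SDL_MOUSEMOTION) {   // coalesce: keep only the newest position
            latestX = e.motion.x;
            latestY = e.motion.y;
            moved = true;
        }
    }

    Uint32 now = SDL_GetTicks();
    if (moved && now - lastRedraw >= 30) { // at most ~33 redraws per second
        panTo(latestX, latestY);           // one redraw, at the latest position only
        lastRedraw = now;
    }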

OpenGL 2D agents animation and scenario refresh

I'm trying to make a little 2D game to learn and to improve my programming skills. I'm programming the game in C/C++ with OpenGL 3.0 and GLUT.
I'm confused about some important concepts regarding animation and scene refresh.
1. Is it good practice to load all the textures only when the level begins?
2. I chose a frame rate of 40 fps; should I redraw the whole scene and all the agents in every frame, or only what has changed?
3. For an agent animation, should I redraw the entire agent or only the parts that changed since the previous frame?
4. If some part of the scene changes (a wall or something similar is destroyed), do I need to redraw the entire scene or only the part that changed?
Right now my "game" runs at 40 fps, but it has a flickering effect that looks really weird.
Yes, creating and deleting textures/buffers every frame is a huge waste.
It's almost always cheaper to just redraw the entire scene. GPUs are built to do this, it's very fast.
Reading the framebuffer from VRAM back to regular RAM and calculating the difference is going to be much slower, especially since OpenGL doesn't keep track of your "objects", it just takes a triangle at a time, rasterizes it, then forgets about it.
Depends on how you define the animation. If you're talking about sprite-like animation, where each frame is a separate image, then it's cheapest to just refer to the new texture and redraw.
If you've got a texture atlas, update the texture coordinates and redraw; and if you're using shaders (which you pretty much have to if you're targeting OpenGL 3.0), you might be able to get away with a uniform that offsets the texture coordinates.
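A possible shape of that uniform-offset trick (the uniform name, frameIndex, and frameWidthUV are made up for illustration; the vertex shader would add the offset to its texture coordinate):

    // In the vertex shader:   uniform vec2 uvOffset;
    //                         vTexCoord = aTexCoord + uvOffset;
    // On the CPU, advancing the animation is one uniform update per frame:
    GLint loc = glGetUniformLocation(program, "uvOffset");
    glUniform2f(loc, frameIndex * frameWidthUV, 0.0f);  // step one frame across the atlas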
Yeah, as I said before, the hardware is built to clear the screen and redraw everything.
And for a framerate, you should be using the monitor's refresh rate to avoid vertical tearing. Pretty much all monitors now are 60Hz, so 60fps is the most common "target" framerate.
Choose either 30 or 60 fps, as most modern monitors refresh at a 60 Hz rate. That way you have either one or two monitor refreshes per rendered frame, which should reduce flickering effects. (I'm not 100% sure this is what you mean by the flickering.)
Regarding all the other questions (which sound pretty much the same): in OpenGL rendering, redrawing everything is pretty common, as in most games almost the entire screen changes every frame, for example when you're moving around. You could do a partial screen update, but it's very uncommon and more expensive on the CPU side, as you have to compute which parts to draw instead of just drawing everything.
1. Yes.
2-4. Yes - hopefully this helps you understand why you must:
Imagine you have two pieces of paper. On the first paper you draw a stick man standing still, and show it to somebody.
While they are looking at that paper, you draw the same thing again on the second paper, but this time you move the arm a little bit.
Now you show them the second paper; as they look at it, you clear the first paper and draw the man with his arm moved a little bit more.
This is pretty much how it works, and it is the reason you must always render the whole image, regardless of whether anything has changed.

How to scale to resolution in SDL?

I'm writing a 2D platformer game using SDL with C++. However, I have encountered a huge issue involving scaling to resolution. I want the game to look nice in full HD, so all the images for the game have been created so that the natural resolution of the game is 1920x1080. However, I want the game to scale down to the correct resolution if someone is using a smaller resolution, or to scale up if someone is using a larger one.
The problem is I haven't been able to find an efficient way to do this. I started by using the SDL_gfx library to pre-scale all images, but this doesn't work because it creates a lot of off-by-one errors where a pixel gets lost. And since my animations are contained in one image, an animation would slightly shift up or down each frame while playing.
After some looking around, I tried using OpenGL to handle the scaling. Currently my program draws all the images to an SDL_Surface that is 1920x1080. It then converts this surface to an OpenGL texture, scales the texture to the screen resolution, and draws the texture. This works fine visually, but the problem is that it's not efficient at all. Currently I am getting a max fps of 18 :(
So my question is does anyone know of an efficient way to scale the SDL display to the screen resolution?
It's inefficient because OpenGL was not designed to work that way. The main performance problems with the current design:
First problem: you're software-rasterizing with SDL. Sorry, but no matter what you do with this configuration, that will be a bottleneck. At a resolution of 1920x1080, you have 2,073,600 pixels to color. Assuming it takes 10 clock cycles to shade each 4-channel pixel, on a 2 GHz processor you're running at a maximum of 96.4 fps. That doesn't sound bad, except you probably can't shade pixels that fast, you still haven't done AI, user input, game mechanics, sound, physics, and everything else, and you're probably drawing over some pixels at least once anyway. SDL_gfx may be quick, but at large resolutions the CPU is just fundamentally overtasked.
Second problem: each frame, you're copying data across the graphics bus to the GPU. This is the slowest thing you can possibly do graphics-wise. Image data is probably the worst of it, because there's typically so much of it. Basically, each frame you're telling the GPU to copy some two million pixels from RAM to VRAM. According to Wikipedia, you can expect, for 2,073,600 pixels at 4 bytes each, no more than 258.9 fps, which again doesn't sound bad until you remember everything else you need to do.
My recommendation: switch your application completely to OpenGL. This removes the need to render to a texture and copy it to the screen--just render directly to the screen! Also, scaling is handled automatically by your view matrix (glOrtho/gluOrtho2D for 2D), so you don't have to care about the scaling issue at all; your viewport will just show everything at the same scale. This is the ideal solution to your problem.
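A minimal sketch of that scaling-via-projection point, assuming the legacy fixed-function glOrtho path the answer mentions (with shaders you would build an equivalent orthographic projection matrix instead):

    // windowWidth/windowHeight are the actual window size in pixels.
    glViewport(0, 0, windowWidth, windowHeight);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0.0, 1920.0, 1080.0, 0.0, -1.0, 1.0);  // the game always "thinks" in 1920x1080
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    // Everything drawn in 1920x1080 coordinates is now scaled to fit the window.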
Now, it comes with the one major drawback that you have to recode everything with OpenGL draw commands (which is work, but not too hard, especially in the long run). Short of that, you can try the following ideas to improve speed:
PBOs. Pixel buffer objects can be used to address the second problem by making texture loading/copying asynchronous (a rough sketch follows after this list).
Multithread your rendering. Most CPUs have at least two cores and on newer chips two register states can be saved for a single core (Hyperthreading). You're essentially duplicating how the GPU solves the rendering problem (have a lot of threads going). I'm not sure how thread safe SDL_gfx is, but I bet that something could be worked out, especially if you're only working on different parts of the image at the same time.
Pay attention to the flags your SDL draw surface is created with. It should probably be SDL_SWSURFACE (because you're drawing on the CPU).
Remove VSync. This can improve performance, even if you're not running at 60 Hz.
Make sure you're drawing your original texture--DO NOT scale it up or down to a new one. Draw it at a different size, and let the rasterizer do the work!
Sporadically update: Only update half the image at a time. This will probably close to double your "framerate", and it's (usually) not noticeable.
Similarly, only update the changing parts of the image.
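A rough sketch of the PBO idea mentioned above, assuming a GL 2.1+ context with buffer objects available ('surface' stands for the existing 1920x1080 SDL_Surface; setup and error handling omitted):

    // Done once: texture storage and a pixel-unpack buffer.
    GLuint tex, pbo;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 1920, 1080, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glGenBuffers(1, &pbo);

    // Each frame: stream the SDL surface through the PBO.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, 1920 * 1080 * 4, nullptr,
                 GL_STREAM_DRAW);                       // orphan the old storage
    void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    memcpy(dst, surface->pixels, 1920 * 1080 * 4);      // CPU-side copy into the PBO
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    // With a PBO bound, the last argument is an offset into it, not a pointer.
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1920, 1080,
                    GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);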
Hope this helps.

Constant lag in OpenGL application

I'm getting some recurring lag in my OpenGL application.
I'm using the win32 api to create the window and I'm also creating a 2.2 context.
So the main loop of the program is very simple:
Clearing the color buffer
Drawing a triangle
Swapping the buffers.
The triangle is rotating; that's how I can see the lag.
Also, my frame time isn't smooth, which may be the problem.
But I'm very sure the delta-time calculation is correct, because I've tried it plenty of ways.
Do you think it could be a graphics driver problem?
A friend of mine runs almost exactly the same program, except that I do fewer calculations and I'm using the standard OpenGL shader.
Also, his program uses more CPU power than mine, and its CPU usage is smoother than mine.
I should also add:
On my laptop I get the same lag every ~1 second, so I can see some kind of pattern.
There are many reasons for a jittery frame rate. Off the top of my head:
Not calling glFlush() at the end of each frame
other running software interfering
doing things in your code that certain graphics drivers don't like
bugs in graphics drivers
Using the standard Windows time functions, with their terrible resolution
Try these:
kill as many running programs as you can get away with. Use the process tab in the task manager (CTRL-SHIFT-ESC) for this.
bit by bit, reduce the amount of work your program is doing and see how that affects the frame rate and the smoothness of the display.
if you can, try enabling/disabling vertical sync (you may be able to do this in your graphic card's settings) to see if that helps
add some debug code to output the time taken to draw each frame, and see if there are anomalies in the numbers, e.g. every 20th frame taking an extra 20ms, or random frames taking 100ms.
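For that last point, one way to log per-frame times on Windows is QueryPerformanceCounter, which has much better resolution than the standard timer functions (the 20 ms threshold here is just an arbitrary example):

    #include <windows.h>
    #include <cstdio>

    LARGE_INTEGER freq, prev, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&prev);

    // ...inside the render loop, right after swapping buffers:
    QueryPerformanceCounter(&now);
    double frameMs = 1000.0 * (now.QuadPart - prev.QuadPart) / freq.QuadPart;
    prev = now;
    if (frameMs > 20.0)                     // flag anything well over ~16.7 ms (60 Hz)
        printf("slow frame: %.2f ms\n", frameMs);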