Optimal pixel read-back strategy - C++

I need to render certain scenes and read the whole image back into main memory. I've searched for this, and it seems that most video cards will accelerate the rendering, but the read-back will be very slow. After a bit of research I only found this card mentioning "Hardware-Accelerated Pixel Read-Back".
The other approach would be software rendering, where the read-back problem doesn't exist, but then the rendering performance will be poor.
Likely I will have to implement both in order to find the optimal trade-off, but my question is about what other alternatives I have hardware-wise. I understand Quadro is aimed at the modelling and design market segment, which is precisely the client target of this application. Does this mean that I'm not likely to find better pixel read-back performance in other video card lines, e.g. Tesla or Fermi (which don't even have video outputs, by the way)?

I don't know if the performance would be any different, but you could at least try rendering to an off-screen buffer, then setting that as the texture of a full-screen quad (or outputting it to video in some other way).
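For what it's worth, when read-back itself is the bottleneck, the usual trick is to make it asynchronous with pixel buffer objects so the bus transfer overlaps the next frame's rendering. A minimal ping-pong sketch, assuming a context with PBO support (GL 2.1+, GLEW or similar for the entry points) and that width, height and cpuBuffer are defined elsewhere:

    #include <GL/glew.h>
    #include <cstring>

    // Two pixel-pack buffers: while the GPU fills one, the CPU maps the other.
    GLuint pbo[2];

    void initReadback(int width, int height) {
        glGenBuffers(2, pbo);
        for (int i = 0; i < 2; ++i) {
            glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
            glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4,
                         nullptr, GL_STREAM_READ);
        }
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    }

    // Call after rendering each frame; 'frame' is a running counter.
    void readbackFrame(int frame, int width, int height, unsigned char* cpuBuffer) {
        int write = frame % 2;
        int read  = (frame + 1) % 2;

        // Kick off an asynchronous read into one PBO; this returns immediately.
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[write]);
        glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, nullptr);

        // Map the PBO filled last frame; only now does data land in main memory.
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[read]);
        if (void* ptr = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY)) {
            std::memcpy(cpuBuffer, ptr, (size_t)width * height * 4);
            glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        }
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    }

This adds one frame of latency but keeps the transfer off the render path; only profiling on the actual target card will tell you how much it buys.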

SDL Basics: Textures vs. Images

I'm writing some code that uses SDL2 to display an image with moving markers layered on it, and I think I'd like to use the new (?) 2D hardware accelerated rendering. As I understand it, I have to load an image and convert it to a texture -- but what's the difference? Searching for 'image texture 2d sdl' only gets me tutorials on how to load textures and I'm looking for more of the background rather than the how-to.
So, some questions:
What's a texture versus an image? Aren't they the same thing?
Am I correct in assuming that I need to load the static background image as a texture if I want hardware accelerated rendering? In fact, it sounds like all the bits need to be textures for this to work.
Speaking of OpenGL, are SDL textures actually OpenGL textures?
I'm writing the main app for a single-purpose machine with limited resources (dual-core ARM CPU, dual-core Mali 400 GPU, 4GB RAM: Olimex A20 LIME2). All I need to do is render a 480x800 (yes, portrait layout) image and put markers on it. I expect the markers to have one opaque and two transparency layers, to be updated at around 15 fps, and I expect about 125 of them, tops. Is it worth my while to use 2D hardware acceleration, or should I just do it in software?
To understand the basics of textures, I advise you to have a look at a simpler library's documentation. Here, the term pixmap is used in the same way as SDL's texture. Essentially, those are already converted and uploaded into your GPU's memory, which makes operations quite a bit faster, but also more complex to deal with.
OpenGL textures are another beast, but we could basically say they are the same thing: images in video memory. When binding a texture in OpenGL, you need to upload it to GPU memory first, which is somewhat similar to SDL's surface-to-texture conversion.
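To make the distinction concrete, here is a minimal SDL2 sketch (the image path and error handling are placeholders; IMG_Load comes from the separate SDL_image library):

    #include <SDL.h>
    #include <SDL_image.h>

    SDL_Texture* loadTexture(SDL_Renderer* renderer, const char* path) {
        // The surface is the image in main memory: pixels you can poke directly.
        SDL_Surface* surface = IMG_Load(path);
        if (!surface) return nullptr;

        // The texture is the same image uploaded to GPU memory, converted to
        // whatever format the renderer's backend prefers.
        SDL_Texture* texture = SDL_CreateTextureFromSurface(renderer, surface);

        SDL_FreeSurface(surface);  // the CPU-side copy is no longer needed
        return texture;
    }

    // Drawing it later is then a GPU-side copy:
    // SDL_RenderCopy(renderer, texture, nullptr, &dstRect);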
At 125 objects, I think using 2D acceleration becomes worth the hassle, especially if you have to move them around. If it were just a static image, I guess you could go the regular image route.
As a general rule, I encourage you to use 2D acceleration (or just acceleration, for that matter) whenever possible, if only for the battery savings. That said, if the images are static, the outcome will be exactly the same, maybe just via a slightly different code path. As such, I suppose you could load the static background image as a regular image without any downsides (note that I am not an SDL expert, so this mixed approach might not work here, but it is worth trying since it works on most 2D toolkits).
I hope I answered all of your questions :)

How to scale to resolution in SDL?

I'm writing a 2D platformer game using SDL with C++. However, I have encountered a huge issue involving scaling to resolution. I want the game to look nice in full HD, so all the images for the game have been created so that the natural resolution of the game is 1920x1080. However, I want the game to scale down to the correct resolution if someone is using a smaller resolution, or to scale up if someone is using a larger one.
The problem is I haven't been able to find an efficient way to do this. I started by using the SDL_gfx library to pre-scale all images, but this doesn't work, as it creates a lot of off-by-one errors where a pixel was lost. And since my animations are contained in one image, the animation would shift slightly up or down each frame when it played.
Then, after some looking around, I tried using OpenGL to handle the scaling. Currently my program draws all the images to an SDL_Surface that is 1920x1080. It then converts this surface to an OpenGL texture, scales the texture to the screen resolution, and draws it. This works fine visually, but the problem is that it's not efficient at all. Currently I am getting a max of 18 fps :(
So my question is: does anyone know of an efficient way to scale the SDL display to the screen resolution?
It's inefficient because OpenGL was not designed to work that way. The main performance problems with the current design are:
First problem: You're software rasterizing with SDL. Sorry, but no matter what you do with this configuration, that will be a bottleneck. At a resolution of 1920x1080, you have 2,073,600 pixels to color. Assuming it takes you 10 clock cycles to shade each 4-channel pixel, on a 2GHz processor you're running a maximum of 96.4 fps. That doesn't sound bad, except you probably can't shade pixels that fast, and you still haven't done AI, user input, game mechanics, sound, physics, and everything else, and you're probably drawing over some pixels at least once anyway. SDL_gfx may be quick, but for large resolutions, the CPU is just fundamentally overtasked.
Second problem: Each frame, you're copying data across the graphics bus to the GPU. This is the slowest thing you can possibly do graphics-wise. Image data is probably the worst of that, because there's typically so much of it. Basically, each frame you're telling the GPU to copy some two million pixels from RAM to VRAM. According to Wikipedia, for 2,073,600 pixels at 4 bytes each you can expect no more than 258.9 fps, which again doesn't sound bad until you remember everything else you need to do.
My recommendation: switch your application completely to OpenGL. This removes the need to render to a texture and copy to the screen--just render directly to the screen! Also, scaling is handled automatically by your view matrix (glOrtho/gluOrtho2D for 2D), so you don't have to care about the scaling issue at all--your viewport will just show everything at the same scale. This is the ideal solution to your problem.
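To sketch what I mean (assuming the legacy fixed-function pipeline, since the question already mixes SDL and OpenGL; the 1920x1080 design resolution is from the question):

    // Game logic stays in 1920x1080 units regardless of the physical window size.
    void setupView(int windowWidth, int windowHeight) {
        glViewport(0, 0, windowWidth, windowHeight);   // physical pixels
        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        // Map the fixed 1920x1080 design space onto the viewport; the
        // rasterizer does the scaling for free. Top-left origin via flipped y.
        glOrtho(0.0, 1920.0, 1080.0, 0.0, -1.0, 1.0);
        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();
    }

(If you need to preserve the aspect ratio on non-16:9 displays, adjust the viewport rather than the ortho bounds.)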
Now, it comes with the one major drawback that you have to recode everything with OpenGL draw commands (which is work, but not too hard, especially in the long run). Short of that, you can try the following ideas to improve speed:
PBOs. Pixel buffer objects can be used to address problem two by making texture loading/copying asynchronous (see the sketch after this list).
Multithread your rendering. Most CPUs have at least two cores, and newer chips can keep two register states per core (Hyper-Threading). You're essentially duplicating how the GPU solves the rendering problem (have a lot of threads going). I'm not sure how thread-safe SDL_gfx is, but I bet something could be worked out, especially if you're only working on different parts of the image at the same time.
Pay attention to where your draw surface lives in SDL. It should probably be SDL_SWSURFACE (because you're drawing on the CPU).
Remove VSync. This can improve performance, even if you're not running at 60Hz.
Make sure you're drawing your original texture--DO NOT scale it up or down to a new one. Draw it at a different size, and let the rasterizer do the work!
Sporadically update: Only update half the image at a time. This will probably close to double your "framerate", and it's (usually) not noticeable.
Similarly, only update the changing parts of the image.
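Here's a rough sketch of the PBO idea from the first bullet, assuming GL 2.1+ headers and an already-created texture of the right size:

    #include <cstring>

    GLuint uploadPbo = 0;

    void initUpload(int w, int h) {
        glGenBuffers(1, &uploadPbo);
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, uploadPbo);
        glBufferData(GL_PIXEL_UNPACK_BUFFER, (size_t)w * h * 4,
                     nullptr, GL_STREAM_DRAW);
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    }

    // Stage the CPU-side frame in the PBO, then let the driver pull it into
    // the texture asynchronously instead of blocking inside glTexSubImage2D.
    void uploadFrame(GLuint tex, const void* pixels, int w, int h) {
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, uploadPbo);
        // Orphan the old storage so we don't stall on the previous transfer.
        glBufferData(GL_PIXEL_UNPACK_BUFFER, (size_t)w * h * 4,
                     nullptr, GL_STREAM_DRAW);
        if (void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY)) {
            std::memcpy(dst, pixels, (size_t)w * h * 4);
            glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
        }
        glBindTexture(GL_TEXTURE_2D, tex);
        // With an unpack PBO bound, the last argument is an offset, not a pointer.
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                        GL_BGRA, GL_UNSIGNED_BYTE, (const void*)0);
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    }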
Hope this helps.

OpenGL rendering (only real-time)?

I understand that you usually create complex 3D models in Blender or some other 3D modelling software and export them afterwards as .obj. This .obj file gets parsed by your program, and OpenGL renders it. This, as far as I understand, is real-time rendering.
Now I was wondering if there is something like pre-rendered objects. I'm a little bit confused, because there are so many articles/videos about real-time rendering, but I haven't found any information about non-real-time rendering. Does something like this exist or not? The only thing that comes to mind as non-real-time rendering would be a video.
I guess this is pretty much a yes or no question :) but if it exists maybe someone could point me to some websites with explanations.
"Real-time rendering" means that the frames are being generated as fast as they can be displayed. "Non-real-time rendering", or "offline rendering" means generating frames one at a time, taking as much time as necessary to achieve the desired image quality, and then later assembling them into a movie. Video at the quality of video games can be rendered in real time; something as elaborate as a Pixar movie, though, has to be done in offline mode. Individual frames can still take hours of rendering time!
It's not entirely clear what you mean by "prerendered objects"; however, there are things called VBOs and vertex arrays that store an object's geometry in VRAM, so you don't have to feed it into the rendering pipeline with glVertex3f() or similar every frame (that per-vertex approach is called immediate mode).
VBOs and vertex arrays are used instead of immediate mode because they're far faster: rather than calling into the graphics driver to load data for every vertex, the geometry is kept in VRAM, which is faster than normal RAM, ready to be fed into the render pipeline.
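A minimal VBO sketch (assumes a shader with the position attribute at location 0; the triangle data is just an example):

    // One triangle, uploaded to VRAM once, then drawn every frame without
    // pushing vertices through the driver again.
    GLfloat verts[] = {
         0.0f,  0.5f, 0.0f,
        -0.5f, -0.5f, 0.0f,
         0.5f, -0.5f, 0.0f,
    };

    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);  // one-time upload

    // Per frame:
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableVertexAttribArray(0);                       // position attribute
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (const void*)0);
    glDrawArrays(GL_TRIANGLES, 0, 3);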
The page here may help, too.
There's nothing stopping you from rendering to an off-screen framebuffer (i.e., an FBO) and then saving that to disk rather than displaying it on the screen. For instance, that's how GPGPU techniques used to work before the advent of CUDA, OpenCL, etc.: you would load your data as an unfiltered floating-point texture, perform your calculation using pixel shaders on the FBO, and then save the results back to disk.
In the link I posted above, it states in the overview:
This extension defines a simple interface for drawing to rendering
destinations other than the buffers provided to the GL by the
window-system.
It then goes on to state,
By allowing the use of a framebuffer-attachable image as a rendering
destination, this extension enables a form of "offscreen" rendering.
So you would get your "non-real-time" rendering by rendering a scene off-screen (one that renders slower than 30fps, say) and then saving those results to a movie file or file-sequence format that can be played back at a later date.
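As a sketch of the setup (GL 3.0-style entry points; width, height and the scene-drawing code are placeholders):

    GLuint fbo = 0, colorTex = 0;

    void initOffscreen(int width, int height) {
        // Color attachment: a plain RGBA8 texture we can read back afterwards.
        glGenTextures(1, &colorTex);
        glBindTexture(GL_TEXTURE_2D, colorTex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, colorTex, 0);
        if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
            // handle an incomplete framebuffer here
        }
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
    }

    void renderOneOfflineFrame(int width, int height, unsigned char* outPixels) {
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glViewport(0, 0, width, height);
        // ... draw the scene, taking as long as it takes ...
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, outPixels);
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
        // outPixels can now be written to an image file and later assembled into a movie.
    }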

Full HD 2D Texture Memory OpenGL

I am in the process of writing a full-HD-capable 2D engine for a company of artists, which will hopefully be cross-platform and is written in OpenGL and C++.
The main problem I've been having is how to deal with all those HD sprites. The artists have drawn the graphics at 24fps, and they are exported as PNG sequences. I have converted them into DDS DXT5 (not ideal, because it needs the DirectX header to load), which reduces file size a lot. Some scenes in the game can have 5 or 6 animated sprites at a time, and these can consist of 200+ frames each. Currently I am loading sprites into an array of pointers, but this is taking too long to load, even with compressed textures, and it uses quite a bit of memory (approx. 500MB for a full scene).
So my question is: do you have any ideas or tips on how to handle such high volumes of frames? There are a couple of ideas I've thought of:
Use the swf format for storing the frames from Flash
Implement a 2D skeletal animation system, replacing the PNG sequences (I have concerns about the joints being visible, though)
How do games like Castle Crashers load so quickly with great HD graphics?
Well, the first thing to bear in mind is that not all platforms support DXT5 (mobiles specifically).
Beyond that, have you considered using something like zlib to compress the textures? The textures will likely have a fair degree of self-similarity, which means they will compress down a lot. In this day and age decompression is cheap thanks to processor speed, and the time saved getting the data off the disk can be far, far more useful than the time lost to decompression.
I'd start there if I were you.
24 fps hand-drawn animations? Have you considered reducing the framerate? Even cinema-quality cel animation is only rarely drawn at the full 24 fps. Even going down to 18 fps will get rid of 25% of your data.
In any case, you didn't specify where your load times were long. Is the load from hard disk to memory the problem, or is it the memory-to-texture load that's the issue? Are you frequently swapping sets of texture data into the GPU, or do you just build a bunch of textures out of it at load time?
If it's a disk-load issue, then your only real choice is to compress the texture data on disk and decompress it into memory. S3TC-style compression is not that compact; it's designed to be a usable compression technique for texturing hardware. You can usually make it smaller by running a standard compression library over it, such as zlib, bzip2, or 7z. Of course, this means having to decompress it, but CPUs are getting faster relative to hard disks, so this is usually a win overall.
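For example, a minimal zlib round-trip over the already-DXT5-compressed blob could look like this (the buffer names are placeholders; link against zlib):

    #include <zlib.h>
    #include <vector>

    // Pack a DXT5-encoded texture for storage on disk.
    std::vector<unsigned char> packTexture(const std::vector<unsigned char>& raw) {
        uLongf packedLen = compressBound(raw.size());
        std::vector<unsigned char> packed(packedLen);
        if (compress(packed.data(), &packedLen, raw.data(), raw.size()) != Z_OK)
            packed.clear();            // treat as "store uncompressed" on failure
        packed.resize(packedLen);
        return packed;
    }

    // Inflate at load time; the original size must be stored alongside the blob.
    std::vector<unsigned char> unpackTexture(const std::vector<unsigned char>& packed,
                                             size_t rawSize) {
        std::vector<unsigned char> raw(rawSize);
        uLongf outLen = rawSize;
        if (uncompress(raw.data(), &outLen, packed.data(), packed.size()) != Z_OK)
            raw.clear();
        return raw;
    }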
If the problem is in texture upload bandwidth, then there aren't very many solutions to that. Well, depending on your hardware of interest. If your hardware of interest supports OpenCL, then you can always transfer compressed data to the GPU, and then use an OpenCL program to decompress it on the fly directly into GPU memory. But requiring OpenCL support will impact the minimum level of hardware you can support.
Don't dismiss 2D skeletal animations so quickly. Games like Odin Sphere are able to achieve better animation of 2D skeletons by having several versions of each of the arm positions. The one that gets drawn is the one that matches up the closest to the part of the body it is attached to. They also use clever art to hide any defects, like flared clothing and so forth.

How to speed up offscreen OpenGL rendering with large textures on Win32?

I'm developing some C++ code that can do some fancy 3D transition effects between two images, for which I thought OpenGL would be the best option.
I start with a DIB section and set it up for OpenGL, and I create two textures from input images.
Then for each frame I draw just two OpenGL quads, with the corresponding image texture.
The DIB content is then saved to file.
For example one effect is to locate the two quads (in 3d space) like two billboards, one in front of the other(obscuring it), and then swoop the camera up, forward and down so you can see the second one.
My input images are 1024x768 or so and it takes a really long time to render (100 milliseconds) when the quads cover most of the view. It speeds up if the camera is far away.
I tried rendering each image quad as hundreds of individual tiles, but it takes just the same time, it seems like it depends on the number of visible textured pixels.
I assumed OpenGL could do zillions of polygons a second. Is there something I am missing here?
Would I be better off using some other approach?
Thanks in advance...
Edit:
The GL strings show up for the DIB version as:
Vendor: Microsoft Corporation
Version: 1.1.0
Renderer: GDI Generic
The on-screen version shows:
Vendor: ATI Technologies Inc.
Version: 3.2.9756 Compatibility Profile Context
Renderer: ATI Mobility Radeon HD 3400 Series
So I guess I'll have to use FBOs. I'm a bit confused as to how to get the rendered data out from the FBO into a DIB; any pointers (pun intended) on that?
It sounds like rendering to a DIB is forcing the rendering to happen in software. I'd render to a frame buffer object, and then extract the data from the generated texture. Gamedev.net has a pretty decent tutorial.
Keep in mind, however, that graphics hardware is oriented primarily toward drawing on the screen. Capturing rendered data will usually be slower than displaying it, even when you do get the hardware to do the rendering -- though it should still be quite a bit faster than software rendering.
Edit: Dominik Göddeke has a tutorial that includes code for reading back texture data to CPU address space.
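To answer the question's edit directly: once the scene has been rendered into the FBO, getting it into the DIB section is essentially one call, since a bottom-up 32-bit DIB matches OpenGL's bottom-left origin and BGRA byte order. A sketch, assuming the FBO and the DIB bits pointer already exist:

    void fboToDib(GLuint fbo, int width, int height, void* dibBits) {
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glPixelStorei(GL_PACK_ALIGNMENT, 4);  // 32-bit DIB rows are 4-byte aligned
        // GL_BGRA matches the DIB's byte order; bottom-up DIBs need no row flip.
        glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, dibBits);
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
    }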
One problem with your question:
You provided no actual rendering/texture generation code.
Would I be better off using some other approach?
The simplest thing you can do is make sure your textures have power-of-two sizes, i.e. instead of 1024x768 use 1024x1024 and use only part of that texture. Explanation: although most modern hardware supports non-power-of-two textures, they are sometimes treated as a "special case", and using such a texture MAY produce a performance drop on some hardware.
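A small helper for that; just remember to adjust texture coordinates so quads only sample the used sub-region:

    // Round up to the next power of two, e.g. 768 -> 1024.
    unsigned nextPow2(unsigned v) {
        unsigned p = 1;
        while (p < v) p <<= 1;
        return p;
    }

    // A 1024x768 image goes into a 1024x1024 texture; quads then sample
    // u in [0, 1] and v in [0, 768/1024] instead of the full texture.
    unsigned texW = nextPow2(1024);            // -> 1024
    unsigned texH = nextPow2(768);             // -> 1024
    float usedU = 1024.0f / texW;              // 1.0
    float usedV = 768.0f  / texH;              // 0.75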
I assumed OpenGL could do zillions of polygons a second. Is there something I am missing here?
Yes, you're missing one important thing. There are a few things that limit GPU performance:
1. System-memory-to-video-memory transfer rate (probably not your case; it only matters for dynamic textures/geometry where the data changes every frame).
2. Computation cost (if you write a shader with heavy computations, it will be slow).
3. Fill rate (how many pixels the program can put on screen per second); AFAIK this depends on memory speed on modern GPUs.
4. Vertex processing rate (not your case): how many vertices the GPU can process per second.
5. Texture read rate (how many texels per second the GPU can read); on modern GPUs this depends on GPU memory speed.
6. Texture read caching (not your case): in a fragment shader you can read a texture a few hundred times per pixel with little performance drop IF the coordinates are very close to each other (i.e. almost the same texel in each read), because the results are cached. But performance will drop significantly if you try to access 100 randomly located texels for every pixel.
All those characteristics are hardware dependent.
I.e., depending on the hardware, you may be able to render 1,500,000 polygons per frame (if they take up a small amount of screen space), but you can bring the fps to its knees with 100 polygons if each polygon fills the entire screen, uses alpha blending, and is textured with a highly detailed texture.
If you think about it, you may notice that there are a lot of video cards that can draw a landscape, but fps drops when you're doing framebuffer effects (like blur, HDR, etc.).
Also, you may get a performance drop with textured surfaces if you have an integrated GPU. When I fried the PCIe slot on my previous motherboard, I had to work with the built-in GPU (an NVidia 6800 or something). The results weren't pleasant. While the GPU supported Shader Model 3.0 and could use relatively computationally expensive shaders, fps dropped rapidly whenever there was a textured object on screen. This presumably happened because the built-in GPU used part of system memory as video memory, and the transfer rates of "normal" GPU memory and system memory are different.