I am trying to process the default framebuffer pixmap: I want to blur it when somebody pauses the game. My problem is that even though I use a separate thread for the blur operation itself, the method ScreenUtils.getFrameBufferPixmap has to be called on the rendering thread, and it takes at least 1 second to return even on a Nexus 5. Calling the method on my blur-processing thread is not possible, as no GL context is available on any thread other than the rendering thread.
Is there any way to eliminate the stall?
What you're trying to do is take a screenshot, modify it on the CPU, and upload it back to the GPU. There are 3 problems with this approach:
1. Grabbing the pixels takes a lot of time.
2. Blurring can be computed independently for each pixel, so there is no point in doing it on the CPU. The GPU can do it in the blink of an eye.
3. Uploading the texture back also takes some time.
The correct approach is: instead of rendering everything to the screen, render it to an offscreen texture. (See offscreen rendering tutorials.) Next, draw this texture on a quad the size of your screen, but use a blur shader while drawing. There are a number of example blur shaders available. The shader should basically sample the surroundings of the target pixel and output their average.
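To make "sample the surroundings and output their average" concrete, here is a minimal CPU reference sketch in C of a box blur on a single 8-bit channel. The function names, the clamp-to-edge behaviour, and the single-channel layout are assumptions made for illustration. A real blur shader would run the body of the two outer loops once per fragment on the GPU instead of looping on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp v into [lo, hi]; used to emulate clamp-to-edge texture sampling. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Box blur of one 8-bit channel (hypothetical helper, not a library API).
 * Each output pixel is the average of the (2*radius+1)^2 samples around it.
 * src and dst must be distinct buffers of width*height bytes. */
void box_blur_gray(const uint8_t *src, uint8_t *dst,
                   int width, int height, int radius)
{
    assert(src != NULL && dst != NULL && src != dst);
    assert(width > 0 && height > 0 && radius >= 0);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            unsigned sum = 0;
            unsigned count = 0;
            /* This neighbourhood loop is what a fragment shader would do. */
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    int sx = clamp_int(x + dx, 0, width - 1);
                    int sy = clamp_int(y + dy, 0, height - 1);
                    sum += src[(size_t)sy * (size_t)width + (size_t)sx];
                    ++count;
                }
            }
            dst[(size_t)y * (size_t)width + (size_t)x] = (uint8_t)(sum / count);
        }
    }
}
```

A CPU version like this is only practical for checking a shader's output on small images. Its cost grows with the square of the radius, which is one reason the work belongs on the GPU.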
In the source for ScreenUtils.java you can see that getFrameBufferPixmap is basically a wrapper around OpenGL's glReadPixels. There isn't much you can do to improve the Java or Libgdx wrapper. This is not the direction OpenGL is optimized for (it's good at pushing data up to the GPU, not at pulling data off it).
You might be better off re-rendering your current screen to a (smaller, off-screen) FrameBuffer, and then pulling that down. You could use the GPU to do the blurring this way, too.
I believe the screen's format (e.g., if it is not RGBA8888) may have an impact on read performance.
This isn't Libgdx-specific, so any tips or suggestions for OpenGL in general, such as "Making glReadPixel() run faster", should apply.
I am here because I'm working on an OpenGL program and have some performance issues. I work with OpenGL ES 3.0 on an iMX6 SoC.
Here is my algorithm:
I get an image from the camera, which is mapped directly to a texture.
Using an FBO, I render to a texture to map the image onto a specific shape.
I do the same thing (with a second FBO) for another image, which is sent via shared memory by another application. This step is performed only when that image is updated, about once per second.
I blend these two textures in the default framebuffer to render the result to the screen.
If I perform these three steps separately, it works well and the screen is updated at 30 FPS. But when I combine the three steps in one program, rendering is very slow and I get only 0.5 FPS.
I am wondering whether the GPU on the iMX6 is powerful enough, but I don't think this is a complex algorithm. I think I am doing something wrong, but what?
I use 3 different framebuffers. Is that a good approach, or should I use only one?
Can someone give me an answer, clues, anything that can help me? :-)
My images are 1280x1024, RGBA. I then convert the floating-point texture to integer and back to float in order to perform bitwise operations on the pixels.
Thanks to @Columbo: the problem came from all the conversions. I now work with floating-point textures and convert only for the bitwise operations, which improves the performance of the algorithm a lot.
Another thing that hurt performance was the texture format. For the first step, the image was 1280x1024 but had only one component (a grayscale image). To keep only the grayscale component and not use too much memory, I worked with a GL_RED texture, but this wasn't a good idea: when I changed it to GL_RGB, the frame rate doubled as well.
Is it possible to do GPU-accelerated 3D rendering faster than the screen refresh rate?
Is it possible to do it with OpenGL? If yes, how? If not, what tool should I use?
Note
Since I want the rendering to be faster than the screen refresh rate, I don't mind not having output to the screen. In fact, not having an output window would be an advantage.
I will use the render output either programmatically (via glReadPixels, for example) or by writing it to a file as a video for humans to watch later.
Why I want to do this
I want to do computer simulations of robots for computer vision. The simulated robot will have a virtual camera to see this world, and will act depending on the camera input. Therefore, I want simulations to run as fast as possible, disregarding screen refresh rates.
Yes, this should definitely be possible with OpenGL. Rendering rate isn't tied to screen refresh rate (this is why you can see demos running at 500 FPS and suchlike).
As for the mechanics, you can render into an offscreen framebuffer and read the resulting image back into main memory. You can then process / analyse it however you like. See:
Is it possible to have OpenGL draw on a memory surface?
To do it with OpenGL, I simply had to replace the glutSwapBuffers call at the end of my display callback function with a glFlush call, which took me from 60 FPS to 600 FPS on a very simple scene.
glutSwapBuffers ensures that each calculated frame is displayed, and therefore (with vsync enabled) blocks your program until the screen is refreshed, at which point the back buffer becomes the front buffer, which is what can be seen on the screen.
glFlush, on the other hand, only forces the queued commands to be submitted to the GPU, which then renders the new scene into the back buffer (which cannot be seen on the screen). It does not wait for the screen refresh. Strictly speaking, glFlush does not wait for rendering to finish either; glFinish is the call that blocks until the scene has been rendered. glutSwapBuffers performs an implicit glFlush as part of swapping the buffers.
Note
In my application, glReadPixels allowed me to read the pixels correctly even without glutSwapBuffers, so it is reading from the back buffer.
However, there is a large performance penalty in merely calling glReadPixels to retrieve 700 pixels: the frame rate fell to 200 FPS. I'm guessing this is because the back buffer must be somewhere in GPU memory (please correct/confirm this), and glReadPixels does GPU -> CPU communication, which is very costly.
This means that however fast the render is calculated, my application will still be bottlenecked at 200FPS, unless I grow up and learn how to do the processing on a GPGPU and access the backbuffer from there.
I am rendering an image using OpenGL in C++, and want to access the resulting image to do some more processing on it. (I'm rendering an image, have an actual image it's supposed to look like, and want to compute the pixel difference between the two.)
So far I have only been rendering images to the screen, though, and I can't figure out how to render an image and then later get direct access to the pixels that were drawn. I don't especially care whether I can see the image on the screen or not; all I want is for the image to be rendered to some region of memory that I can access from the CPU. How do you do this?
Alternatively, would it be possible to send the image it's supposed to look like to OpenGL and compute the pixel difference on the GPU? Either option is fine with me, but the faster I can make it the better. (Right now, I can render about 100 frames per second, but still haven't figured out how to do the comparisons.)
Yes, you could do it on the GPU. Put the 2 images in textures. Draw a frame-filling quad multi-textured with the two textures, and be sure to provide texture coordinates. Write a fragment shader to compute the difference. (When a commenter asked if you wanted to use a programmable pipeline, this is one reason it matters. If you only use the fixed-function pipeline, you wouldn't have the option of writing a fragment shader.)
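For reference, the per-pixel difference such a fragment shader would compute can be sketched on the CPU in plain C. This is a minimal sketch under stated assumptions: both images are tightly packed 8-bit buffers of the same size (for example as returned by glReadPixels with GL_RGBA and GL_UNSIGNED_BYTE), and the function name is hypothetical. It can serve as a slow but simple baseline for checking a GPU implementation on small images.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of absolute per-channel differences between two equally sized
 * 8-bit images (hypothetical helper). byte_count is width * height *
 * channels. Returns 0 when the images are identical. */
uint64_t sum_abs_diff(const uint8_t *a, const uint8_t *b, size_t byte_count)
{
    assert(a != NULL && b != NULL);

    uint64_t total = 0;
    for (size_t i = 0; i < byte_count; ++i) {
        int d = (int)a[i] - (int)b[i];   /* widen before subtracting */
        total += (uint64_t)(d < 0 ? -d : d);
    }
    return total;
}
```

On the GPU, the same subtraction would happen per fragment. Reducing the result to a single number would still need either a readback or a further reduction pass.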
The obvious way would be to use glReadPixels to read the rendered results from the framebuffer into host memory.
I'm kind of stuck on the logic behind an SDL2 texture. To me, they are pointless since you cannot draw to them.
In my program, I have several surfaces (or what were surfaces before I switched to SDL2) that I just blitted together to form layers. Now, it seems, I have to create several renderers and textures to create the same effect since SDL_RenderCopy takes a texture pointer.
Not only that, but all renderers have to come from a window, which I understand, but still fouls me up a bit more.
This all seems extremely bulky and slow. Am I missing something? Is there a way to draw directly to a texture? What is the point of textures, and am I safe to have multiple (if not hundreds of) renderers in place of what were surfaces?
SDL_Texture objects are stored as close as possible to video card memory and therefore can easily be accelerated by your GPU. Resizing, alpha blending, anti-aliasing and almost any compute-heavy operation can benefit greatly from this performance boost. If your program needs to run per-pixel logic on your textures, you are encouraged to convert your textures into surfaces temporarily. Achieving a workaround with streaming textures is also possible.
Edit:
Since this answer receives quite a bit of attention, I'd like to elaborate on my suggestion.
If you prefer the Texture -> Surface -> Texture workflow for applying your per-pixel operation, make sure you cache the final texture unless you need to recalculate it on every render cycle. Textures in this solution are created with the SDL_TEXTUREACCESS_STATIC flag.
Streaming textures (creation flag SDL_TEXTUREACCESS_STREAMING) are encouraged for use cases where the source of the pixel data is the network, a device, a frameserver, or some other source beyond the SDL application's full reach, and when it is apparent that caching frames from the source is inefficient or would not work.
It is possible to render on top of textures if they are created with the SDL_TEXTUREACCESS_TARGET flag. This limits the source of the draw operation to other textures, although this might already be what you required in the first place. "Textures as render targets" is one of the newest and least widely supported features of SDL2.
Nerd info for curious readers:
Due to the nature of SDL implementation, the first two methods depend on application level read and copy operations, though they are optimized for suggested scenarios and fast enough for realtime applications.
Copying data at the application level is almost always slow compared to post-processing on the GPU. If your requirements are stricter than what SDL can provide and your logic does not depend on some outside pixel data source, it would be sensible to allocate raw OpenGL textures painted from your SDL surfaces and apply shaders (GPU logic) to them.
Shaders are written in GLSL, a language which compiles into GPU assembly. Hardware/GPU acceleration actually refers to code parallelized on GPU cores, and using shaders is the preferred way to achieve that for rendering purposes.
Attention! Using raw OpenGL textures and shaders in conjunction with SDL rendering functions and structures might cause some unexpected conflicts or loss of flexibility provided by the library.
TL;DR:
It is faster to render and operate on textures than on surfaces, although modifying them can sometimes be cumbersome.
By creating an SDL2 texture with SDL_TEXTUREACCESS_STREAMING, one can lock and unlock the entire texture, or just an area of its pixels, to perform direct pixel operations. One must first create an SDL2 surface and tie it to the lock/unlock calls as follows:
SDL_Surface *surface = SDL_CreateRGBSurface(..);
SDL_LockTexture(texture, &rect, &surface->pixels, &surface->pitch);
// paint into surface pixels
SDL_UnlockTexture(texture);
The key is that if you draw to a texture of larger size and the drawing is incremental (e.g. a data graph in real time), be sure to lock and unlock only the area that actually needs updating. Otherwise the operations will be slow, with heavy memory copying.
I have experienced reasonable performance and the usage model is not too difficult to understand.
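As a rough illustration of painting into a locked sub-rectangle, here is a minimal sketch in plain C of how the pixels pointer and pitch that SDL_LockTexture hands back would typically be used. It assumes a 32-bit pixel format such as SDL_PIXELFORMAT_ARGB8888. The function name fill_locked_rect is hypothetical, and an ordinary array stands in for the locked texture memory so the sketch does not depend on SDL itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a rect_w x rect_h block of 32-bit pixels (hypothetical helper).
 * pixels points at the top-left pixel of the locked area and pitch is
 * the length of one full texture row in bytes, which is the layout that
 * SDL_LockTexture reports for a locked rectangle. */
void fill_locked_rect(void *pixels, int pitch,
                      int rect_w, int rect_h, uint32_t color)
{
    assert(pixels != NULL && rect_w > 0 && rect_h > 0);
    assert(pitch >= rect_w * (int)sizeof(uint32_t));

    uint8_t *row = (uint8_t *)pixels;
    for (int y = 0; y < rect_h; ++y) {
        uint32_t *p = (uint32_t *)row;   /* assumes 4-byte-aligned rows */
        for (int x = 0; x < rect_w; ++x) {
            p[x] = color;
        }
        row += pitch;                    /* step by pitch, not by rect_w */
    }
}
```

Stepping by pitch rather than by the rectangle width is the important detail: the locked area is a window into a larger row-major image, so each row of the rectangle starts one full texture row after the previous one.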
In SDL2 it is possible to render off-screen / render directly to a texture. The function to use is:
int SDL_SetRenderTarget(SDL_Renderer *renderer, SDL_Texture *texture);
This only works if the renderer was created with the SDL_RENDERER_TARGETTEXTURE flag.
I'm working on a windowed Direct3D data plotting application that needs to display multiple overlays on top of the data (similar to HUDs in games). Since there could be a large amount of data that needs plotting, and not all overlays will change every time, I figured it wouldn't be a good idea to replot vertices when only one overlay in the display changes.
This led me to the idea of rendering the textures and vertices of the overlays to multiple textures with transparent backgrounds that could be overlaid in the render loop and updated independently (similar to layers in Photoshop).
Before I embark on changing a large portion of this program to render to textures as opposed to surfaces, I was just wondering if using textures is the best approach.
RTT works well; I used it in a game I did recently. Each scene (scene refers to layer: "HUD" was a scene, "Main" was the main scene, etc.) was rendered onto a texture, then each texture was rendered onto a quad, sorted back to front (for alpha blending). I chose this over just rendering the scenes directly onto the back buffer because it allowed me to do post-processing.
For your caching purposes this seems to be the best way to go, but just be aware that the textures can eat memory quickly, and sometimes it's just better to render everything again, making sure you sort back to front.
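To put a rough number on how quickly layer textures eat memory, here is a small arithmetic sketch in C. The 1920x1080 resolution, the 4 bytes per pixel, and the function names are assumptions chosen for illustration. Actual GPU allocations may be larger because of padding, alignment, or mipmaps.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one uncompressed layer texture (hypothetical helper). */
size_t layer_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* Bytes needed for layer_count layers of identical size and format. */
size_t layers_total_bytes(size_t width, size_t height,
                          size_t bytes_per_pixel, size_t layer_count)
{
    return layer_bytes(width, height, bytes_per_pixel) * layer_count;
}
```

Under these assumptions, one full-screen 1920x1080 RGBA layer takes 8,294,400 bytes (about 7.9 MiB), and five such layers take 41,472,000 bytes (about 39.6 MiB) before any driver overhead.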
Render to texture will certainly work and could be a good route, but it is probably overkill. Modern 3D hardware is very fast, and I'd suggest you verify whether performance is really an issue when simply re-rendering whenever you need an update, before investing significant time in making major changes to your program.
If performance is an issue your time might be better spent optimizing the code that renders your plot since that will benefit updates that involve changes to the data as well as those that just change an overlay. I'm a graphics programmer for games and generally with realtime 3D you want to focus your optimization efforts on your worst case (you have to redraw everything) rather than your best (only one overlay needs an update).
Rendering to texture render target surfaces is a very good idea, and can be used for a lot of things, e.g. optimization/caching, but beware of the blend operation with regular alpha (a*c1 + (1-a)*c2). If # is that ARGB blend, then in general (l1#l2)#l3 != l1#(l2#l3), i.e. it is not associative, so pre-compositing some layers into a texture changes the final result. By using pre-multiplied alpha in all textures/layers the blend operation becomes associative. Layer order still matters in either case: the blend is not commutative.
The ultimate reference is the Porter/Duff article "Compositing Digital Images" from 1984.
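The effect of pre-multiplied alpha on layer compositing can be checked numerically. Below is a minimal sketch in C, assuming the Porter/Duff "over" operator on pre-multiplied float colours. The type and function names are illustrative, not taken from any library.

```c
#include <assert.h>

/* Colour with pre-multiplied alpha: r, g, b are already scaled by a. */
typedef struct { float r, g, b, a; } Rgba;

/* Convert a straight-alpha colour to pre-multiplied form. */
Rgba premultiply(Rgba s)
{
    assert(s.a >= 0.0f && s.a <= 1.0f);
    return (Rgba){ s.r * s.a, s.g * s.a, s.b * s.a, s.a };
}

/* Porter/Duff "over": composite top onto bottom (both pre-multiplied). */
Rgba over(Rgba top, Rgba bottom)
{
    float k = 1.0f - top.a;
    return (Rgba){ top.r + k * bottom.r, top.g + k * bottom.g,
                   top.b + k * bottom.b, top.a + k * bottom.a };
}

/* Component-wise comparison within eps; returns 1 if nearly equal. */
int rgba_nearly_equal(Rgba x, Rgba y, float eps)
{
    float d[4] = { x.r - y.r, x.g - y.g, x.b - y.b, x.a - y.a };
    for (int i = 0; i < 4; ++i) {
        if (d[i] < -eps || d[i] > eps) {
            return 0;
        }
    }
    return 1;
}
```

With three layers l1, l2, l3 in pre-multiplied form, over(over(l1, l2), l3) and over(l1, over(l2, l3)) agree, which is what allows a group of layers to be cached in one render target. over(l1, l2) and over(l2, l1) generally differ, so the layers still have to be drawn in a consistent order.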