After reading this article I wanted to try the same thing, but to speed things up I want the rendering part to be performed on the GPU; it goes without saying that triangles, or any other geometric objects, are better rendered on the GPU than on the CPU.
Here's one nice image of the process:
The task:
Render the 'set of vertices'
Estimate the difference, pixel by pixel, between the rendered 'set of vertices' and the Mona Lisa image (the Mona Lisa already sits on the GPU, in a texture or a PBO, no big difference)
The problem:
When using OpenCL or CUDA together with OpenGL's FBO (Frame Buffer Object) extension, the task splits as follows:
Render the 'set of vertices' (handled by OpenGL)
Estimate the difference, pixel by pixel, between the rendered 'set of vertices' and the Mona Lisa image (handled by OpenCL or CUDA)
So in this case I'm forced to copy from the FBO to a PBO (Pixel Buffer Object) to make the rendered 'set of vertices' available to OpenCL/CUDA. I know how fast device-to-device memory copies are, but given that I would need to do thousands of these copies, it makes sense not to.
This problem leaves three choices:
Render with OpenGL directly to a PBO (somehow; I don't know how, and it may not even be possible)
Render the image and estimate the difference between the images entirely with OpenGL (somehow; I don't know how, maybe using shaders, the only problem being that I've never written a shader in my life and this might take months of work for me...)
Render the image and estimate the difference between the images entirely with OpenCL/CUDA (I know how to do this, but it would also take months to get a stable and more or less optimized renderer implemented in OpenCL or CUDA)
The question
Can anybody help me write a shader for the above process, or maybe point out a way of rendering the Mona Lisa to a PBO without copies from the FBO...
My gut feeling is that the shader approach is also going to have the same I/O problem: you can certainly compare textures in a shader as long as the GPU supports PS 4.0 or higher, but you still have to get the source texture (the Mona Lisa) onto the device in the first place.
Edit: Been digging around a bit and this forum post might provide some insight:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=221384&page=1.
The poster, Komat, provides an example of the shader on the 2nd page.
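For the no-copy route, one hedged option (a sketch of my own, not something from the linked thread) is to render the candidate 'set of vertices' into a texture attached to an FBO; the result is then already a GPU texture that a comparison fragment shader can sample, or that OpenCL can share via clCreateFromGLTexture, without ever touching a PBO. The names fbo, renderTex, W and H below are illustrative.

    // Sketch: render-to-texture with an FBO so the rendered result stays on the GPU
    // as a texture (no FBO -> PBO copy). Assumes a current GL 3.x context.
    GLuint renderTex = 0, fbo = 0;
    const int W = 512, H = 512;                    // illustrative target size

    glGenTextures(1, &renderTex);
    glBindTexture(GL_TEXTURE_2D, renderTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, W, H, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, renderTex, 0);
    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
        // handle incomplete framebuffer
    }

    // ... draw the candidate 'set of vertices' here ...

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    // renderTex now holds the rendered image; bind it as a sampler in a difference
    // shader, or hand it to OpenCL with clCreateFromGLTexture.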
I'm working on a visual odometry algorithm that tracks the movement of the camera between images. An integral part of this algorithm is being able to generate incremental dense warped images of a reference image, where each pixel has a corresponding depth (so it can be considered a point cloud of width x height dimensions).
I haven't had much experience working with OpenGL in the past, but having gone through a few tutorials, I managed to set up an offscreen rendering pipeline that takes in a transformation matrix and renders the point cloud from the new perspective. I'm using VBOs to load the data onto the GPU, renderbuffers to render, and glReadPixels() to read the result into CPU memory.
On my Nvidia card, I can render at ~1 ms per warp. Is that the fastest I can render the data (640x480 3D points)? This step is proving to be a major bottleneck for my algorithm, so I'd really appreciate any performance tips!
(I thought that one optimization could be rendering only in grayscale, since I don't really care about colour, but it seems like internally OpenGL uses colour anyway)
My current implementation is at
https://gist.github.com/icoderaven/1212c7623881d8cd5e1f1e0acb7644fb,
and the shaders at
https://gist.github.com/icoderaven/053c9a6d674c86bde8f7246a48e5c033
Thanks!
I am basically trying to do something with the default frame buffer pixmap: I wish to blur it when somebody pauses the game. My problem is that even though I am using a separate thread for the whole blur operation, the method ScreenUtils.getFrameBufferPixmap has to be called on the rendering thread, and this method takes at least a second to return, even on a Nexus 5. Calling the method on my blur-processing thread is not possible because no GL context is available on any thread other than the rendering thread.
Is there any solution for eliminating the stall?
What you're trying to do is take a screenshot, modify it on the CPU, and upload it back to the GPU. There are three problems with this approach.
1. Grabbing the pixels takes a lot of time.
2. Blurring can be executed independently for each pixel, so there is no point in doing it on the CPU; the GPU can do it in the blink of an eye.
3. Uploading the texture back still takes some time.
The correct approach is: instead of rendering everything to the screen, render it to an offscreen texture (see offscreen rendering tutorials). Next, draw this texture on a quad the size of your screen, but use a blur shader while drawing. There are a number of example blur shaders available; the basic idea is to sample the surroundings of the target pixel and output their average.
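For illustration only, here is a minimal 3x3 box-blur fragment shader, kept as a GLSL source string in C++ form; the uniform names u_texture and u_texelSize are my own, and in libgdx the same GLSL would simply be loaded through a ShaderProgram and drawn over a full-screen quad.

    // Hedged sketch: a tiny 3x3 box-blur fragment shader, stored as a C++ string.
    static const char* kBlurFragSrc = R"(
    #ifdef GL_ES
    precision mediump float;
    #endif
    varying vec2 v_texCoords;            // from the vertex shader
    uniform sampler2D u_texture;         // the offscreen (FBO) colour texture
    uniform vec2 u_texelSize;            // 1.0 / texture resolution

    void main() {
        vec4 sum = vec4(0.0);
        for (int x = -1; x <= 1; ++x)
            for (int y = -1; y <= 1; ++y)
                sum += texture2D(u_texture, v_texCoords + vec2(x, y) * u_texelSize);
        gl_FragColor = sum / 9.0;        // average of the 3x3 neighbourhood
    }
    )";

A proper Gaussian blur just replaces the equal weights with Gaussian ones, and is usually split into a horizontal and a vertical pass for speed.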
In the source for ScreenUtils.java you can see that getFrameBufferPixmap is basically a wrapper around OpenGL's glReadPixels. There isn't much you can do to improve the Java or Libgdx wrapper; this is simply not the direction OpenGL is optimized for (it's good at pushing data up to the GPU, not at pulling data off).
You might be better off re-rendering your current screen to a (smaller, off-screen) FrameBuffer, and then pulling that down. You could use the GPU to do the blurring this way, too.
I believe the screen's format (e.g., not RGBA8888) may have an impact on the read performance.
This isn't Libgdx-specific, so any tips or suggestions for OpenGL in general, like "Making glReadPixel() run faster", should apply.
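If it helps, here is a hedged plain-OpenGL sketch of the usual trick for making glReadPixels cheaper: issue the read into a GL_PIXEL_PACK_BUFFER so it returns immediately, and map the buffer one frame later. It needs desktop GL or GLES 3.0+, and the buffer names and sizes are illustrative.

    // Sketch: asynchronous readback through two pixel-pack buffers (PBOs).
    // Frame N starts a read into pbo[N % 2] and maps the other buffer, which
    // holds the previous frame, so the CPU never waits on the read it just issued.
    GLuint pbo[2];
    const int W = 640, H = 480;                    // illustrative size
    glGenBuffers(2, pbo);
    for (int i = 0; i < 2; ++i) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, W * H * 4, nullptr, GL_STREAM_READ);
    }

    // Per frame:
    static int frame = 0;
    int writeIdx = frame % 2, readIdx = (frame + 1) % 2;

    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[writeIdx]);
    glReadPixels(0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);  // async into the PBO

    if (frame > 0) {                               // previous frame's data is ready now
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[readIdx]);
        void* src = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, W * H * 4, GL_MAP_READ_BIT);
        if (src) {
            // ... process last frame's pixels here ...
            glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        }
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    ++frame;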
I'm looking for a way to do 3D filters in DirectX or OpenGL shaders, analogous to a Gaussian filter for images. In detail, I want to process every voxel of a 3D texture.
Maybe storing the volume data in slices could work, but that is not a friendly way to access the volume data and it is not easy to write in shaders.
Sorry for my poor English; any reply will be appreciated.
P.S.: CUDA's texture memory can do this, but my poor GPU only runs at a very low frame rate in debug mode, and I don't know why.
There is a 3D texture target in both Direct3D and OpenGL. Of course, target framebuffers are still 2D, so a compute shader, OpenCL, or DirectCompute may be better suited for pure filtering purposes that don't involve rendering to the screen.
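As a hedged illustration of the compute-shader route (GL 4.3+), here is a brute-force 3x3x3 Gaussian over a 3D texture, with the GLSL kept as a C++ string; the names uSrc and uDst are my own.

    // Sketch: compute shader that filters a 3D texture voxel by voxel.
    static const char* kBlur3DComputeSrc = R"(
    #version 430
    layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
    layout(binding = 0) uniform sampler3D uSrc;                 // source volume
    layout(binding = 1, r32f) writeonly uniform image3D uDst;   // filtered output

    void main() {
        ivec3 p    = ivec3(gl_GlobalInvocationID);
        ivec3 size = textureSize(uSrc, 0);
        if (any(greaterThanEqual(p, size))) return;

        // 3x3x3 Gaussian from the separable 1D kernel (1 2 1).
        float sum = 0.0, wsum = 0.0;
        for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            ivec3 q = clamp(p + ivec3(dx, dy, dz), ivec3(0), size - 1);
            float w = (2.0 - abs(float(dx))) * (2.0 - abs(float(dy))) * (2.0 - abs(float(dz)));
            sum  += w * texelFetch(uSrc, q, 0).r;
            wsum += w;
        }
        imageStore(uDst, p, vec4(sum / wsum));
    }
    )";

The host side dispatches it with glDispatchCompute over the volume dimensions; a separable three-pass version would be faster, but this shows the access pattern.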
I am rendering an image using OpenGL in C++, and want to access the resulting image to do some more processing on it. (I'm rendering an image, have an actual image it's supposed to look like, and want to compute the pixel difference between the two.)
So far I have only been rendering images to the screen, though, and I can't figure out how to render an image and then get access to the actual pixels that were drawn. I don't especially care whether I can see the image on the screen or not; all I want is for the image to be rendered to some region of memory that I can access from the CPU. How do you do this?
Alternatively, would it be possible to send the image it's supposed to look like to OpenGL and compute the pixel difference on the GPU? Either option is fine with me, but the faster I can make it the better. (Right now, I can render about 100 frames per second, but still haven't figured out how to do the comparisons.)
Yes, you could do it on the GPU. Put the 2 images in textures. Draw a frame-filling quad multi-textured with the two textures, and be sure to provide texture coordinates. Write a fragment shader to compute the difference. (When a commenter asked if you wanted to use a programmable pipeline, this is one reason it matters. If you only use the fixed-function pipeline, you wouldn't have the option of writing a fragment shader.)
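As a hedged illustration of that fragment shader (my own sketch, not code from the answer), assuming both images are bound as 2D textures and the quad supplies texture coordinates:

    // Sketch: per-pixel absolute difference of two textures; GLSL kept as a C++ string.
    // The uniform names uRendered and uReference are illustrative.
    static const char* kDiffFragSrc = R"(
    #version 330 core
    in vec2 vTexCoord;                   // from the full-screen quad's vertex shader
    uniform sampler2D uRendered;         // the image you rendered
    uniform sampler2D uReference;        // the image it is supposed to look like
    out vec4 fragColor;

    void main() {
        vec3 a = texture(uRendered,  vTexCoord).rgb;
        vec3 b = texture(uReference, vTexCoord).rgb;
        fragColor = vec4(abs(a - b), 1.0);   // per-pixel difference image
    }
    )";

Rendering this into an FBO gives you a difference image on the GPU; summing it into a single score can then be done with a reduction pass or by generating mipmaps and reading the top level.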
The obvious way would be to use glReadPixels to read the rendered results from the framebuffer into host memory.
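For completeness, a minimal hedged sketch of that readback (width and height stand in for your framebuffer size):

    // Sketch: read the current framebuffer's colour data back to host memory.
    #include <vector>
    std::vector<unsigned char> pixels(width * height * 4);   // RGBA8
    glPixelStorei(GL_PACK_ALIGNMENT, 1);                      // avoid row-padding surprises
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
    // 'pixels' now holds the image, bottom row first.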
Can anyone please tell me how to use hardware memory to create textures in OpenGL? Currently I'm running my game in windowed mode; do I need to switch to fullscreen to make use of the hardware?
If I can create textures in hardware, is there a limit on the number of textures (other than the hardware memory)? And how can I cache my textures in hardware? Thanks.
This should be covered by almost all texture tutorials for OpenGL. For example here, here and here.
For every texture you first need a texture name. A texture name is like a unique index for a single texture. Every name points to a texture object that can have its own parameters, data, etc. glGenTextures is used to get new names. I don't know if there is any limit besides the uint range (2^32); if there is, you will probably get 0 for all new texture names (and a GL error).
The next step is to bind your texture (see glBindTexture). After that, all operations that use or affect textures will use the texture specified by the name you passed to glBindTexture. You can now set parameters for the texture (glTexParameter) and upload the texture data with glTexImage2D (for 2D textures). After calling glTexImage you can also free the system-memory copy of your texture data.
For static textures all this has to be done only once. If you want to use the texture you just need to bind it again and enable texturing (glEnable(GL_TEXTURE_2D)).
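Put together, a hedged sketch of those steps for a static 2D texture (width, height and pixelData stand in for your own image):

    // Sketch: create a static 2D texture from an RGBA8 buffer in system memory.
    GLuint tex = 0;
    glGenTextures(1, &tex);                        // get a texture name
    glBindTexture(GL_TEXTURE_2D, tex);             // following calls affect 'tex'
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixelData);  // upload; pixelData may be freed now

    // Later, when drawing (fixed-function style, as described above):
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, tex);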
The size (width/height) of a single texture is limited by GL_MAX_TEXTURE_SIZE. This is normally 4096, 8192 or 16384. It is also limited by the available graphics memory because it has to fit into it together with some other resources like the framebuffer or vertex buffers. All textures together can be bigger than the available memory, but then they will be swapped.
In most cases the graphics driver should decide which textures are stored in system memory and which in graphics memory. You can however give certain textures a higher priority with either glPrioritizeTextures or with glTexParameter.
Edit:
I wouldn't worry too much about where textures are stored because the driver normally does a very good job with that. Textures that are used often are also more likely to be stored in graphics memory. If you set a priority, that's just a "hint" for the driver on how important it is for the texture to stay on the graphics card. It's also possible that the priority is completely ignored. You can also check where textures currently are with glAreTexturesResident.
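A hedged sketch of those two legacy calls (they exist only in the old fixed-function API and many modern drivers ignore them):

    // Sketch: hint that 'tex' is important, then ask whether it is currently resident.
    GLclampf priority = 1.0f;                      // 0.0 = low, 1.0 = high
    glPrioritizeTextures(1, &tex, &priority);

    GLboolean resident = GL_FALSE;
    glAreTexturesResident(1, &tex, &resident);     // GL_TRUE if the texture is in video memory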
Usually when you talk about generating a texture on the GPU, you're not actually creating texture images and applying them like normal textures. The simpler and more common approach is to use fragment shaders to procedurally calculate the color of each pixel in real time, from scratch, for every single frame.
The canonical example for this is to generate a Mandelbrot pattern on the surface of an object, say a teapot. The teapot is rendered with its polygons and texture coordinates by the application. At some stage of the rendering pipeline every pixel of the teapot passes through the fragment shader which is a small program sent to the GPU by the application. The fragment shader reads the 2D texture coordinates and calculates the Mandelbrot set color of the 2D coordinates and applies it to the pixel.
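As a hedged illustration in that spirit (my own sketch, not the tutorial's code), a fragment shader that maps the interpolated texture coordinates into the complex plane and colours by escape time, kept here as a GLSL source string:

    // Sketch: procedural Mandelbrot "texture" computed per fragment from the
    // 2D texture coordinates; no image data is ever uploaded.
    static const char* kMandelbrotFragSrc = R"(
    #version 330 core
    in vec2 vTexCoord;                   // interpolated texture coordinates in 0..1
    out vec4 fragColor;

    void main() {
        // Map the unit square into a window of the complex plane.
        vec2 c = vTexCoord * vec2(3.0, 2.0) + vec2(-2.0, -1.0);
        vec2 z = vec2(0.0);
        const int maxIter = 100;
        int i;
        for (i = 0; i < maxIter; ++i) {
            z = vec2(z.x * z.x - z.y * z.y, 2.0 * z.x * z.y) + c;  // z = z^2 + c
            if (dot(z, z) > 4.0) break;                            // escaped
        }
        float t = float(i) / float(maxIter);
        fragColor = vec4(t, t * t, sqrt(t), 1.0);                  // simple colour ramp
    }
    )";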
Fullscreen mode has nothing to do with it. You can use shaders and generate textures even if you're in windowed mode. As I mentioned, the textures you create never actually occupy space in texture memory; they are created on the fly. One could probably think of a way to capture and cache the generated texture, but this can be somewhat complex and require multiple rendering passes.
You can learn more about it if you look up "GLSL" in google - the OpenGL shading language.
This somewhat dated tutorial shows how to create a simple fragment shader which draws the Mandelbrot set (page 4).
If you can get your hands on the book "OpenGL Shading Language, 2nd Edition", you'll find it contains a number of simple examples on generating sky, fire and wood textures with the help of an external 3D Perlin noise texture from the application.
To create a texture on the GPU, look into "render to texture" tutorials. There are two common methods: binding a PBuffer context as a texture, or using Frame Buffer Objects. PBuffer render-to-texture is the older method and has wider support; Frame Buffer Objects are easier to use.
Also, you don't have to switch to "fullscreen" mode for OpenGL to be HW accelerated. In fact, OpenGL doesn't know about windows at all. A fullscreen OpenGL window is just that: a toplevel window on top of all other windows, with no decorations and the input focus grabbed. Some drivers bypass window masking and clipping code and employ a simpler, faster buffer-swap method if the window with the active OpenGL context covers the whole screen, thus gaining a little performance, but with current hardware and software the effect is very small compared to other influences.