Drawing on screen, directly from the GPU

I am doing some processing/pixel classification on a picture and I'm using a GPU for it. My question is: is there a library I can use to display my final picture (a 2D matrix) on screen directly from GPU memory, instead of bringing it back to the CPU and drawing from there?
I don't want anything extravagant; I will only color the pixels, and I want to show the new colors.

With OpenGL interop, CUDA draws into a texture and OpenGL tells the card to display it.
See the Mandelbrot example.
Update: OpenCL 1.1 now allows sharing OpenGL contexts with OpenCL code.
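A minimal sketch of that approach with the CUDA runtime interop API might look like the following. Everything here is illustrative: it assumes a GL context already exists, that a GLEW-style loader provides the buffer functions, and that the PBO was registered once at startup with cudaGraphicsGLRegisterBuffer.

    #include <cuda_runtime.h>
    #include <cuda_gl_interop.h>
    #include <GL/glew.h>

    // Placeholder "classification": color each pixel from its coordinates.
    __global__ void classifyPixels(uchar4* pixels, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;
        pixels[y * width + x] = make_uchar4(x % 256, y % 256, 128, 255);
    }

    // Per frame: let CUDA write into the PBO, then copy PBO -> texture on
    // the GPU and draw the texture. No round trip through host memory.
    void renderFrame(GLuint pbo, GLuint tex, cudaGraphicsResource* res,
                     int width, int height)
    {
        uchar4* devPixels = 0;
        size_t  size = 0;
        cudaGraphicsMapResources(1, &res, 0);
        cudaGraphicsResourceGetMappedPointer((void**)&devPixels, &size, res);

        dim3 block(16, 16);
        dim3 grid((width + 15) / 16, (height + 15) / 16);
        classifyPixels<<<grid, block>>>(devPixels, width, height);

        cudaGraphicsUnmapResources(1, &res, 0); // hand the buffer back to GL

        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                        GL_RGBA, GL_UNSIGNED_BYTE, 0); // reads from the PBO
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
        // ...then draw a full-screen textured quad and swap buffers...
    }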

Related

Fastest way to modify an OpenGL texture with OpenCL per pixel

Using OpenGL 4.4 and OpenCL 2.0, let's say I just want to modify specific pixels of a texture each frame.
What is the optimal way to achieve this?
Which object should I share?
Will I be able to modify only a limited number of pixels?
I want GPU-only operations.
First off, there are no OpenCL 2.0 drivers yet; the specification was only recently finalized, and implementations probably won't appear until 2014.
Likewise, many OpenGL implementations aren't at 4.4 yet.
However, you can still do what you want with OpenCL 1.2 (or 1.1, since NVIDIA lags the rest of the industry in OpenCL support) and current OpenGL implementations.
Look for OpenCL / OpenGL interop examples (a sketch follows after these steps), but basically:
Create OpenCL context from OpenGL context
Create OpenCL image from OpenGL texture
After rendering your OpenGL into the texture, acquire the image for OpenCL, run an OpenCL kernel that updates only the specific pixels you want to change, and release the image back to OpenGL
Draw the texture to the screen
OpenCL kernels are often 2D and address every pixel, but you can run a 1D kernel where each work item updates a single pixel based on some algorithm. Just make sure not to write to the same pixel from more than one work item, or you'll have a race condition.
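Here is a rough sketch of that per-frame acquire/modify/release cycle, using the cl_khr_gl_sharing extension and the OpenCL 1.2 entry point clCreateFromGLTexture (on 1.1 it's clCreateFromGLTexture2D). Context, queue and kernel creation are assumed to have happened elsewhere; all names are illustrative and error checks are omitted.

    #include <CL/cl.h>
    #include <CL/cl_gl.h>
    #include <GL/gl.h>

    void updatePixels(cl_context ctx, cl_command_queue queue, cl_kernel kernel,
                      GLuint glTexture, size_t pixelCount)
    {
        cl_int err;
        // Wrap the existing GL texture as a CL image. In real code, create
        // this once and reuse it every frame instead of recreating it.
        cl_mem image = clCreateFromGLTexture(ctx, CL_MEM_READ_WRITE,
                                             GL_TEXTURE_2D, 0, glTexture, &err);
        (void)err; // check this in real code

        glFinish(); // make sure GL is done with the texture before CL touches it
        clEnqueueAcquireGLObjects(queue, 1, &image, 0, NULL, NULL);

        // 1D kernel: each work item computes and writes exactly one pixel,
        // so no two work items touch the same pixel (no race condition).
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &image);
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &pixelCount, NULL,
                               0, NULL, NULL);

        clEnqueueReleaseGLObjects(queue, 1, &image, 0, NULL, NULL);
        clFinish(queue); // make sure CL is done before GL draws the texture
        clReleaseMemObject(image);
    }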

Cuda and/or OpenGL for geometric image transformation

My question concerns the most efficient way of performing geometric image transformations on the GPU. The goal is essentially to remove lens distortion from acquired images in real time. I can think of several ways to do it, e.g. as a CUDA kernel (which would be preferable) doing an inverse transform lookup plus interpolation, or the same in an OpenGL shader, or rendering a forward-transformed mesh with the image texture mapped to it. It seems to me the last option could be the fastest, because the mesh can be subsampled, i.e. not every pixel offset needs to be stored; the rest can be interpolated in the vertex shader. Also, the graphics pipeline really should be optimized for this. However, the rest of the image processing is probably going to be done with CUDA.

If I want to use the OpenGL pipeline, do I need to start an OpenGL context and bring up a window to do the rendering, or can this somehow be achieved through the CUDA/OpenGL interop? The aim is not to display the image; the processing will take place on a server, potentially with no display attached. I've heard OpenGL could crash if I bring up a window.

I'm quite new to GPU programming; any insights would be much appreciated.
Using the forward-transformed mesh method is the more flexible one and the easier one to implement. Performance-wise, however, there's no big difference, as the effective limit you're running into is memory bandwidth, and the amount of memory bandwidth consumed depends only on the size of your input image. Whether the transfer is caused by a fragment shader fed by vertices or by a CUDA texture access doesn't matter.
If I want to use the OpenGL pipeline, do I need to start an OpenGL context and bring up a window to do the rendering,
On Windows: Yes, but the window can be an invisible one.
On GLX/X11 you need an X server running, but you can use a PBuffer instead of a window to get an OpenGL context.
In either case use a Framebuffer Object as the actual drawing destination. PBuffers may corrupt their primary framebuffer contents at any time. A Framebuffer Object is safe.
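A minimal Framebuffer Object setup could look like the sketch below (assuming an existing context and a loader like GLEW; names are illustrative, error handling is omitted):

    #include <GL/glew.h>

    GLuint createRenderTarget(int width, int height, GLuint* colorTexOut)
    {
        // Color attachment: an ordinary 2D texture we can read back,
        // hand to CUDA/OpenCL, or display later.
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, NULL);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

        GLuint fbo;
        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, tex, 0);

        if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
            // incomplete framebuffer: wrong format, missing attachment, ...
        }
        glBindFramebuffer(GL_FRAMEBUFFER, 0); // back to the default framebuffer
        *colorTexOut = tex;
        return fbo;
    }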
or can this somehow be achieved through the CUDA/OpenGL interop?
No, because CUDA/OpenGL interop is for making OpenGL and CUDA interoperate, not for making OpenGL work from within CUDA. CUDA/OpenGL interop helps you with the part you mentioned here:
However, the rest of the image processing is probably going to be done with CUDA.
BTW, maybe OpenGL compute shaders (available since OpenGL 4.3) would work for you as well.
I've heard OpenGL could crash if I bring up a window.
OpenGL actually has no say in those things. It's just an API for drawing stuff on a canvas (canvas = window, PBuffer or Framebuffer Object), but it doesn't deal with actually getting a canvas onto the scaffolding, so to speak.
Technically, OpenGL doesn't care if there's a window or not; that is determined by the graphics system on which the OpenGL context is created. And unfortunately, none of the currently existing GPU graphics systems supports true headless operation. NVidia's latest Linux drivers may allow for some crude hacks to set up a truly headless system, but I have never tried that so far.

Render Mona Lisa to PBO

After reading this article I wanted to try to do the same, but to speed things up I wanted the rendering part to be performed on the GPU; needless to say why triangles or any other geometric objects should be rendered on the GPU rather than the CPU.
The task:
Render a 'set of vertices'
Estimate the difference, pixel by pixel, between the rendered 'set of vertices' and the Mona Lisa image (the Mona Lisa sits on the GPU, in a texture or a PBO; no big difference)
The problem:
It appears when using OpenCL or CUDA together with the OpenGL FBO (Frame Buffer Object) extension. In this case, according to our task:
Render the 'set of vertices' (handled by OpenGL)
Estimate the difference, pixel by pixel, between the rendered 'set of vertices' and the Mona Lisa image (handled by OpenCL or CUDA)
So in this case I'm forced to copy from the FBO to a PBO (Pixel Buffer Object) to make the rendered 'set of vertices' available to OpenCL/CUDA. I know how fast device-to-device memory copies are, but given that I need to do thousands of these copies, it makes sense not to do so...
This problem leaves three choices:
Render with OpenGL directly to a PBO (somehow; I don't know how, and it might even be impossible)
Render the image and estimate the difference between the images entirely with OpenGL (somehow; I don't know how, maybe by using shaders, the only problem being that I've never written a shader in my life, and this might take me months of work...)
Render the image and estimate the difference between the images entirely with OpenCL/CUDA (I know how to do this, but it will also take months to get a stable and more or less optimized version of a renderer implemented in OpenCL or CUDA)
The question
Can anybody help me with writing a shader for the above process, or maybe point out a way of rendering the Mona Lisa to a PBO without copies from the FBO?
My gut feeling is that the shader approach is also going to have the same I/O problem. You can certainly compare textures in a shader as long as the GPU supports PS 4.0 or higher, but you've still got to get the source texture (the Mona Lisa) onto the device in the first place.
Edit: Been digging around a bit and this forum post might provide some insight:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=221384&page=1.
The poster, Komat, provides an example of the shader on the 2nd page.
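For what it's worth, a difference shader along those lines can be very small. The sketch below (plain GLSL 1.20, embedded as a C string; the uniform names are made up) samples both images and writes the per-pixel absolute difference, which a later pass or an OpenCL kernel can then sum up. It is not Komat's shader, just an illustration of the idea:

    const char* diffFragmentShader =
        "uniform sampler2D rendered;   // the rendered 'set of vertices'\n"
        "uniform sampler2D monaLisa;   // the reference image\n"
        "void main() {\n"
        "    vec4 a = texture2D(rendered, gl_TexCoord[0].st);\n"
        "    vec4 b = texture2D(monaLisa, gl_TexCoord[0].st);\n"
        "    gl_FragColor = abs(a - b); // per-pixel difference\n"
        "}\n";

You would render a full-screen quad into an FBO with both textures bound, then reduce the result, for example with a summing pass, or by calling glGenerateMipmap and reading the 1x1 top mip level, which holds the average difference.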

OpenCL or OpenGL – which one to use?

My problem involves a black-and-white image with a black area in the middle.
I want to put some white circles over the area and check at the end whether the whole image is white. I will try many combinations, so I want to use the GPU for its parallelism.
I have never worked with OpenGL or OpenCL before, so I do not know which one to choose.
Should I use OpenGL, create the circle as a texture and put it on top of the image, or should I write some OpenCL kernels that work on the pixels/entries in the matrix?
You probably want to use both. OpenGL is great for drawing things, and OpenCL is great for analyzing. You can share textures between OpenGL and OpenCL, so the overhead of the transition between the two should be negligible.
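As a sketch of how the analysis half could look, here is a hypothetical OpenCL kernel that scans the shared image and raises a flag for any non-white pixel, so the host reads back a single int instead of the whole image (names are illustrative; the circles are assumed to have been drawn by OpenGL first):

    const char* checkWhiteKernel =
        "__kernel void check_white(__read_only image2d_t img,\n"
        "                          __global int* anyNonWhite)\n"
        "{\n"
        "    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |\n"
        "                          CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;\n"
        "    int2 pos = (int2)(get_global_id(0), get_global_id(1));\n"
        "    float4 px = read_imagef(img, smp, pos);\n"
        "    if (px.x < 1.0f || px.y < 1.0f || px.z < 1.0f)\n"
        "        *anyNonWhite = 1; // benign race: every writer stores 1\n"
        "}\n";

Run it as a 2D NDRange over the image dimensions after the GL drawing is done and the texture has been acquired, as in the interop examples above.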

How to create textures within GPU

Can anyone please tell me how to use hardware memory to create textures in OpenGL? Currently I'm running my game in windowed mode; do I need to switch to fullscreen to get the use of the hardware?
If I can create textures in hardware, is there a limit on the number of textures (other than the hardware memory)? And how can I cache my textures in hardware? Thanks.
This should be covered by almost all texture tutorials for OpenGL.
For every texture you first need a texture name. A texture name is like a unique index for a single texture; every name points to a texture object that can have its own parameters, data, etc. glGenTextures is used to get new names. I don't know if there is any limit besides the uint range (2^32); if there is, you will probably get 0 for all new texture names (and a GL error).
The next step is to bind your texture (see glBindTexture). After that, all operations that use or affect textures will apply to the texture whose name you passed to glBindTexture. You can now set parameters for the texture (glTexParameter) and upload the texture data with glTexImage2D (for 2D textures). After calling glTexImage2D you can also free the system memory holding your texture data.
For static textures all this has to be done only once. If you want to use the texture you just need to bind it again and enable texturing (glEnable(GL_TEXTURE_2D)).
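Put together, those steps might look like this minimal sketch (an active OpenGL context is assumed; 'pixels' is your RGBA image data in system memory):

    GLuint createTexture(const unsigned char* pixels, int width, int height)
    {
        GLuint name;
        glGenTextures(1, &name);            // get a new texture name
        glBindTexture(GL_TEXTURE_2D, name); // following calls affect this texture
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels); // upload to the driver
        return name; // 'pixels' may be freed now; the driver keeps its own copy
    }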
The size (width/height) of a single texture is limited by GL_MAX_TEXTURE_SIZE. This is normally 4096, 8192 or 16384. It is also limited by the available graphics memory, because the texture has to fit into it together with other resources like the framebuffer or vertex buffers. All textures together can be bigger than the available memory, but then they will be swapped.
In most cases the graphics driver should decide which textures are stored in system memory and which in graphics memory. You can, however, give certain textures a higher priority with either glPrioritizeTextures or glTexParameter.
Edit:
I wouldn't worry too much about where textures are stored, because the driver normally does a very good job with that. Textures that are used often are also more likely to be stored in graphics memory. If you set a priority, that's just a "hint" to the driver on how important it is for the texture to stay on the graphics card. It's also possible that the priority is completely ignored. You can check where textures currently reside with glAreTexturesResident.
Usually when you talk about generating a texture on the GPU, you're not actually creating texture images and applying them like normal textures. The simpler and more common approach is to use fragment shaders to procedurally calculate the color of each pixel in real time, from scratch, for every single frame.
The canonical example for this is to generate a Mandelbrot pattern on the surface of an object, say a teapot. The teapot is rendered with its polygons and texture coordinates by the application. At some stage of the rendering pipeline every pixel of the teapot passes through the fragment shader which is a small program sent to the GPU by the application. The fragment shader reads the 2D texture coordinates and calculates the Mandelbrot set color of the 2D coordinates and applies it to the pixel.
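Purely as an illustration, a bare-bones Mandelbrot fragment shader in that spirit could look like this sketch (GLSL 1.20, embedded as a C string):

    const char* mandelbrotFS =
        "void main() {\n"
        "    vec2 c = gl_TexCoord[0].st * 3.0 - vec2(2.0, 1.5);\n"
        "    vec2 z = vec2(0.0);\n"
        "    int i;\n"
        "    for (i = 0; i < 64; i++) {\n"
        "        z = vec2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c;\n"
        "        if (dot(z, z) > 4.0) break;\n"
        "    }\n"
        "    // escape time mapped to a gray value\n"
        "    gl_FragColor = vec4(vec3(float(i) / 64.0), 1.0);\n"
        "}\n";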
Fullscreen mode has nothing to do with it. You can use shaders and generate textures even in windowed mode. As I mentioned, the textures you create never actually occupy space in texture memory; they are created on the fly. One could probably think of a way to capture and cache the generated texture, but this can be somewhat complex and requires multiple rendering passes.
You can learn more about it if you look up "GLSL" on Google - the OpenGL Shading Language.
This somewhat dated tutorial shows how to create a simple fragment shader which draws the Mandelbrot set (page 4).
If you can get your hands on the book "OpenGL Shading Language, 2nd Edition", you'll find it contains a number of simple examples on generating sky, fire and wood textures with the help of an external 3D Perlin noise texture from the application.
To create a texture on the GPU, look into "render to texture" tutorials. There are two common methods: binding a PBuffer context as a texture, or using Frame Buffer Objects. PBuffer render-to-texture is the older method and has the wider support; Frame Buffer Objects are easier to use.
Also, you don't have to switch to fullscreen mode for OpenGL to be HW accelerated. In fact, OpenGL doesn't know about windows at all. A fullscreen OpenGL window is just that: a toplevel window on top of all other windows, with no decorations and the input focus grabbed. Some drivers bypass window masking and clipping code and employ a simpler, faster buffer-swap method if the window with the active OpenGL context covers the whole screen, thus gaining a little performance, but with current hardware and software the effect is very small compared to other influences.