I'm working on a project aimed at controlling a biped humanoid robot. Unfortunately we have a very limited set of hardware resources (an RB110 board and its mini-PCI graphics card). I'm planning to port the image processing tasks from the CPU to the graphics card's processor if possible, but I have never done that before... I've been advised to use OpenCV, but that seems to be impossible because our graphics processor (Volari Z9s) is not supported by the framework. Then I found an interesting post on Linux Journal, where the author used OpenGL to process frames retrieved from a v4l device.
I'm a little confused about the relationship between the hardware API and OpenGL/OpenCV. In order to utilize a GPU, does the hardware need to be supported by graphics programming frameworks (OpenGL/OpenCV)? Where can I find such an API?
I googled a lot about my hardware; unfortunately, the vendor (XGI Technology) seems to be more or less extinct...
In order to utilize a GPU, does the hardware need to be supported by graphics programming frameworks (OpenGL/OpenCV)? Where can I find such an API?
OpenCL and OpenGL calls are both translated into hardware instructions by the GPU driver, so you need a driver for your operating system that supports these frameworks. Most GPU drivers support some version of OpenGL, so that should work.
The OpenGL standard is maintained by the Khronos Group, and you can find some tutorials at NeHe.
How OpenGL works
OpenGL accepts triangles as input and draws them according to the state it has when the draw call is issued. Most OpenGL functions are there to change the operations performed by manipulating this state. Image manipulation can be done by loading the input image as a texture and drawing several vertices with the texture active, resulting in a new image (or, more generically, a new 2D grid of data).
From version 2 onward (or with the right ARB extensions) the operations performed on the image can be controlled with GLSL programs called vertex and fragment shaders (there are more shader types, but these are the oldest). A vertex shader is called once per vertex; the results are interpolated and forwarded to the fragment shader. A fragment shader is called every time a new fragment (pixel) is written to the result.
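For concreteness, this is roughly what such a pair of shaders could look like for a simple image-processing pass (a minimal sketch in GLSL 1.20-era syntax, embedded here as C++ string literals; the uniform and varying names are illustrative, not taken from any particular framework):

```cpp
// Minimal GLSL shader pair for an image-processing pass, as C++ string literals.
// The vertex shader runs once per vertex; the fragment shader runs once per
// output pixel and, in this sketch, converts the input texture to grayscale.
static const char *kVertexShader = R"(
    #version 120
    varying vec2 texCoord;
    void main() {
        texCoord    = gl_MultiTexCoord0.xy;                       // pass the texture coordinate on
        gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;   // standard transform
    }
)";

static const char *kFragmentShader = R"(
    #version 120
    uniform sampler2D inputImage;   // the frame loaded as a texture
    varying vec2 texCoord;
    void main() {
        vec3 rgb   = texture2D(inputImage, texCoord).rgb;
        float gray = dot(rgb, vec3(0.299, 0.587, 0.114));   // luminance weights
        gl_FragColor = vec4(vec3(gray), 1.0);
    }
)";
```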
Now, this is all about reading and writing images; how do you use it for object detection?
Use vertices to span the input texture over the whole viewport. Instead of computing RGB colors and storing them in the result, you can write a fragment shader that computes grayscale images / gradient images and then checks these textures for each pixel: whether the pixel is in the center of a circle of a specific size, part of a line, or just has a relatively high gradient compared to its surroundings (a good feature), or really anything else you can find a good parallel algorithm for. (I haven't done this myself.)
The end result has to be read back to the CPU (sometimes you can use shaders to scale the data down before doing this). OpenCL gives this a less graphics-like feel and a lot more freedom, but it is less widely supported.
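Reading the result back could look roughly like this (a small sketch that assumes the processed image has already been rendered at the given size into the currently bound framebuffer):

```cpp
#include <GL/gl.h>
#include <vector>

// Pull the processed frame back into CPU memory. This is the slow step
// mentioned above, so shrink the data on the GPU first when you can.
std::vector<unsigned char> readBackRGBA(int width, int height)
{
    std::vector<unsigned char> pixels(width * height * 4);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);   // no row padding in the output
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
    return pixels;
}
```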
First of all, you need shader support (GLSL or assembly).
The usual way is to render a full-screen quad with your image (texture) and apply a fragment shader. This is called post-processing, and it is limited by the instruction set and the other limitations your hardware has. At a basic level it lets you apply a simple (single) function to a large data set in parallel, producing another data set. But branching (if it is supported at all) is the first enemy of performance, because a GPU consists of a couple of SIMD blocks.
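Because of that, image-processing shaders often replace if with built-ins such as step() and mix(); a hypothetical branch-free thresholding fragment shader (names made up for illustration) might look like this:

```cpp
// Branch-free thresholding: step() yields 0.0 or 1.0, so no `if` is needed.
// Embedded as a C++ string literal; the uniform names are placeholders.
static const char *kThresholdShader = R"(
    #version 120
    uniform sampler2D inputImage;
    uniform float threshold;        // e.g. 0.5
    varying vec2 texCoord;          // passed in by the vertex shader
    void main() {
        float gray = dot(texture2D(inputImage, texCoord).rgb,
                         vec3(0.299, 0.587, 0.114));
        float bin  = step(threshold, gray);   // 1.0 if gray >= threshold, else 0.0
        gl_FragColor = vec4(vec3(bin), 1.0);
    }
)";
```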
I've been learning OpenGL, and as I sit trying to write my VBOs, PBOs, VAOs, textures, quads, bindings, fragment shaders, vertex shaders, and a whole suite of other modern abstractions upon abstractions built after decades of evolution, I wonder: Isn't the display nothing but a large block of memory?
I've heard of tales, that in the "good ol' days" (such as the Commodore 64), all you had to do was assign a value to an arbitrary byte in memory, and the screen would change a pixel. Extremely simple and elegant. In the modern day, this has changed with layers upon layers of abstractions and safeguards, such that changing a pixel on your display is several hundred feet away.
This begs the question, is it possible in the modern day to just "update a pixel of the screen"? Is it possible to write my own graphics driver or something, where I can send commands to some C wrapper which interfaces with the GPU to change those pixels? This is an extremely broad question, but I'm curious. The answer I'm looking for to this question would provide a rough outline of what you'd have to do in order to be able to arbitrarily get some C code to set a pixel on the screen, as well as a rough outline of why OpenGL has progressed the way it has - what problems did VBOs, PBOs, VAOs, bindings, shaders, etc. solve, and how we got to where we are today.
Isn't the display nothing but a large block of memory?
Yes, it is called a framebuffer.
I've heard of tales, that in the "good ol' days" (such as the Commodore 64)
Your current PC works like that right when you power it up! If you use the CPU to write into video memory, that is called a software renderer.
In the modern day, this has changed with layers upon layers of abstractions and safeguards, such that changing a pixel on your display is several hundred feet away.
No, those are not abstractions/safeguards for "changing pixels". Nowadays software renderers are rarely used; instead, you have to tell the GPU (which is another computer in its own right) how to draw. That "talk" is what APIs like OpenGL do for you.
Now, the GPUs are meant to be fast at drawing, and that requires specialized code and data structures. Those are all the things you mention: VBOs, PBOs, VAOs, shaders, etc. (in OpenGL parlance). There is no way around that, because GPUs are different hardware.
is it possible in the modern day to just "update a pixel of the screen"?
Yes, but that will end up being drawn somehow by the GPU, even if it looks to you like a memory write.
Is it possible to write my own graphics driver or something, where I can send commands to some C wrapper which interfaces with the GPU to change those pixels?
Yes, but that "C wrapper" is the graphics driver. A graphics driver for a modern GPU is very complex.
what you'd have to do in order to be able to arbitrarily get some C code to set a pixel on the screen
You cannot write a portable "C program" that draws to a graphical screen, because the C standard does not concern itself with graphical displays.
So it depends on your operating system, your hardware, whether you want 2D or 3D acceleration support, the API you choose...
as well as a rough outline of why OpenGL has progressed the way it has - what problems did VBOs, PBOs, VAOs, bindings, shaders, etc. solve, and how we got to where we are today.
See above.
You can make your own framebuffer - that is just an integer array - and do rasterization on it, then use, for example, the Windows GDI function SetBitmapBits() to draw it to the display in one go. The final draw-to-display command depends on the operating system.
How you do the rasterization on your framebuffer is completely up to you. You can use the CPU to draw individual pixels or rasterize lines and triangles; see, for example, this demo of my old CPU graphics engine using Windows GDI: https://youtu.be/GFzisvhtRS4.
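A hedged sketch of that SetBitmapBits()/BitBlt() path described above (assuming a 32-bit display mode and a valid window HDC; error handling omitted):

```cpp
// Sketch of the "framebuffer is just an integer array" idea using Win32 GDI.
// The framebuffer is assumed to hold pixels in the display's native 32-bit format.
#include <windows.h>
#include <cstdint>
#include <vector>

void presentFramebuffer(HDC windowDC, const std::vector<uint32_t> &framebuffer,
                        int width, int height)
{
    // Wrap the raw pixel array in a GDI bitmap...
    HDC memDC   = CreateCompatibleDC(windowDC);
    HBITMAP bmp = CreateCompatibleBitmap(windowDC, width, height);
    SetBitmapBits(bmp, width * height * sizeof(uint32_t), framebuffer.data());

    // ...and blit it to the window in one call.
    HGDIOBJ old = SelectObject(memDC, bmp);
    BitBlt(windowDC, 0, 0, width, height, memDC, 0, 0, SRCCOPY);

    SelectObject(memDC, old);
    DeleteObject(bmp);
    DeleteDC(memDC);
}
```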
Using the CPU is fine as long as you do not rasterize large datasets. From my experience, the limit to real-time 60fps rendering on the CPU is ~50k lines per frame.
If you want to rasterize really large datasets, you have to use a GPU in some way. Since the framebuffer is just an integer array, you can transfer it to/from the GPU using OpenCL or CUDA and, if your dataset happens to already be in video memory, do all the rasterization extremely fast in parallel on the GPU. For this you will need an additional z-buffer to decide which pixels get overdrawn by occluding geometry. This way you can rasterize approximately 30 million lines per frame at 60fps. This demo is rendered on the GPU in real time using OpenCL: https://youtu.be/lDsz2maaZEo
Is it possible in the modern day to just "update a pixel of the screen"?
Yes. In Windows, for example, you can use SetPixel() to draw a single pixel or BitBlt() to draw in bulk. See this Q/A.
This works fine, but it means you're using the CPU for rendering, and you'll find the GPU is much more effective for this task, especially if you require a decent framerate and non-trivial graphics. The reason there is this "whole suite of modern abstractions upon abstractions" is to serve as an interface to the GPU, since it has an independent set of memory and a totally different execution model. Other GPU APIs (OpenCL, DirectX, Vulkan, etc.) all have the same kind of abstractions.
I've glossed over many nuances but I hope the point gets across.
I'm fairly new to CUDA, but I've managed to display something generated by a kernel on the screen using OpenGL. I've tried several approaches:
Using a PBO and an OpenGL texture (old style);
Using an OpenGL texture as a CUDA surface and rendering on a quad (new style);
Using a renderbuffer as a CUDA surface and rendering using glBlitFramebuffer.
All of them worked, but, while implementing #2, I erroneously set the hint to cudaGraphicsRegisterFlagsWriteDiscard. Since all of the data would be generated by CUDA, I thought this was the correct option. However, later I realized that I needed a CUDA surface to write to an OpenGL texture, and when you use a surface, you are required to use the LoadStore flag.
So basically my question is this : Since I absolutely need a CUDA surface to write to an OpenGL texture in CUDA, what is the use case of cudaGraphicsRegisterFlagsWriteDiscard in cudaGraphicsGLRegisterImage?
The documentation description seems pretty straightforward. It is for one-way delivery of data from CUDA to OpenGL.
This online book excerpt provides a similar explanation:
Applications where CUDA is the producer and OpenGL is the consumer should register the objects with a write-discard flag...
If you want to see an example, take a look at the postProcessGL CUDA sample. In that case, OpenGL renders an image, and it is post-processed (blur added) by CUDA before display. Here there are two separate pathways for data flow. In the OpenGL->CUDA case, the data is handled by the createTextureSrc function, and the flag specified is read-only. For the CUDA->OpenGL case (delivery of the post-processed frame), the registration is handled in createTextureDst, where a call is made to cudaGraphicsGLRegisterImage with the cudaGraphicsMapFlagsWriteDiscard flag specified, since on this path CUDA is producing and OpenGL is consuming.
To understand how the textures are handled (populated with data from the cuda operations via a cudaArray) you probably want to study the sequence of operations in processImage().
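For orientation, a stripped-down sketch of the CUDA-producer path (not the sample's actual code; the texture, device pointer and sizes are placeholders, and registration would normally be done once rather than every frame) might look like this:

```cpp
// Sketch of the CUDA-producer / OpenGL-consumer path (error checks omitted).
// A GL context and a CUDA device are assumed; `tex` and `d_image` are placeholders.
#include <cuda_runtime.h>
#include <GL/gl.h>
#include <cuda_gl_interop.h>

void uploadCudaFrameToTexture(GLuint tex, const uchar4 *d_image, int width, int height)
{
    cudaGraphicsResource *res = nullptr;

    // WriteDiscard tells the driver CUDA will overwrite the whole image,
    // so the previous contents need not be preserved.
    cudaGraphicsGLRegisterImage(&res, tex, GL_TEXTURE_2D,
                                cudaGraphicsRegisterFlagsWriteDiscard);

    cudaGraphicsMapResources(1, &res, 0);
    cudaArray_t array = nullptr;
    cudaGraphicsSubResourceGetMappedArray(&array, res, 0, 0);

    // Copy kernel-produced data into the texture's backing array; no surface
    // object is needed for a plain copy like this.
    cudaMemcpy2DToArray(array, 0, 0,
                        d_image, width * sizeof(uchar4),
                        width * sizeof(uchar4), height,
                        cudaMemcpyDeviceToDevice);

    cudaGraphicsUnmapResources(1, &res, 0);
    cudaGraphicsUnregisterResource(res);
}
```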
Is it possible to use a shader to calculate some values and then return them for further use?
For example, I send a mesh down to the GPU, with some parameters about how it should be modified (change the positions of vertices), and get the resulting mesh back? That seems rather impossible to me because I haven't seen any variable for communication from shaders to the CPU. I'm using GLSL, so there are just uniforms, attributes and varyings. Should I use an attribute or a uniform? Would they still be valid after rendering? Can I change the values of those variables and read them back on the CPU? There are methods for mapping data on the GPU, but would those be changed and still valid?
This is the way I'm thinking about it, though there could be another way that is unknown to me. I would be glad if someone could explain this to me, as I've just read some books about GLSL and now I would like to program more complex shaders, and I wouldn't like to rely on methods that are impossible at this time.
Thanks
Great question! Welcome to the brave new world of General-Purpose Computing on Graphics Processing Units (GPGPU).
What you want to do is possible with pixel shaders. You load a texture (that is: data), apply a shader (to do the desired computation) and then use render-to-texture to pass the resulting data from the GPU back to main memory (RAM).
There are tools created for this purpose, most notably OpenCL and CUDA. They greatly aid GPGPU, so that this sort of programming looks almost like CPU programming.
They do not require any 3D graphics experience (although it still helps :) ). You don't need to do tricks with textures; you just load arrays into GPU memory. The processing algorithms are written in a slightly modified version of C. The latest version of CUDA supports C++.
I recommend starting with CUDA, since it is the most mature one: http://www.nvidia.com/object/cuda_home_new.html
This is easily possible on modern graphics cards using either OpenCL, Microsoft DirectCompute (part of DirectX 11) or CUDA. The normal shader languages are utilized (GLSL and HLSL, for example). The first two work on both NVIDIA and ATI graphics cards; CUDA is NVIDIA-exclusive.
These are special libraries for computing things on the graphics card. I wouldn't use a normal 3D API for this, although it is possible with some workarounds.
Now you can use shader storage buffer objects in OpenGL to write values from shaders that can then be read back on the host.
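A rough sketch of that approach (assuming an OpenGL 4.3+ context; the compute shader is illustrative, and the program is assumed to have been compiled and linked from it elsewhere):

```cpp
// Minimal SSBO sketch: a compute shader writes one float per invocation into
// the buffer, and the host reads the results back with glGetBufferSubData.
#include <GL/glew.h>
#include <vector>

static const char *kComputeSrc = R"(
    #version 430
    layout(local_size_x = 64) in;
    layout(std430, binding = 0) buffer Result { float values[]; };
    void main() {
        uint i = gl_GlobalInvocationID.x;
        values[i] = float(i) * 2.0;   // any per-element computation
    }
)";

std::vector<float> runAndReadBack(GLuint program, size_t count)   // count: multiple of 64
{
    GLuint ssbo = 0;
    glGenBuffers(1, &ssbo);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
    glBufferData(GL_SHADER_STORAGE_BUFFER, count * sizeof(float), nullptr, GL_DYNAMIC_COPY);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);   // matches "binding = 0"

    glUseProgram(program);                                  // program built from kComputeSrc
    glDispatchCompute((GLuint)(count / 64), 1, 1);
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);         // make shader writes visible

    std::vector<float> result(count);
    glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, count * sizeof(float), result.data());
    glDeleteBuffers(1, &ssbo);
    return result;
}
```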
My best guess would be to point you to BehaveRT, which is a library created to harness GPUs for behavioral models. I think that if you can formulate your modifications in the library, you could benefit from its abstractions.
As for passing data back and forth between your CPU and GPU, I'll let you browse the documentation; I'm not sure about that part.
To what extent does OpenGL's GLSL utilize SLI setups? Is it utilized at all at the point of execution, or only for final rendering?
Similarly, I know that OpenCL is alien to SLI, but assuming one has several GPUs, how does it compare to GLSL for multiprocessing?
Since it might depend on the application, e.g. common transformation, or ray tracing, can you offer insight on differences depending on application type?
The goal of SLI is to divide the rendering workload across several GPUs. The graphics driver uses either a sort-first or a time-decomposition approach (GPU 0 works on frame n while GPU 1 works on frame n+1), and then the pixels are copied from one GPU to the other.
That said, SLI has nothing to do with the shading language used by OpenGL (the way the pixels are drawn doesn't really matter).
For OpenCL, I would say that you have to divide your workload between the GPU by yourself, but I am not sure.
If you want to take advantage of multiple GPUs with OpenCL, you will have to create command queues for each device and run kernels on each device after splitting up the workload.
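A rough sketch of that pattern with the OpenCL C API (error checks omitted; the kernel name "process" and the per-device buffers are made up for illustration):

```cpp
// One command queue per device, workload split by hand.
#include <CL/cl.h>
#include <vector>

void runOnAllGpus(cl_context context, cl_program program,
                  const std::vector<cl_device_id> &devices,
                  const std::vector<cl_mem> &perDeviceBuffers, size_t itemsPerDevice)
{
    std::vector<cl_command_queue> queues;
    for (cl_device_id dev : devices)
        queues.push_back(clCreateCommandQueue(context, dev, 0, nullptr));

    cl_kernel kernel = clCreateKernel(program, "process", nullptr);

    // Each device works on its own slice of the problem.
    for (size_t i = 0; i < queues.size(); ++i) {
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &perDeviceBuffers[i]);
        clEnqueueNDRangeKernel(queues[i], kernel, 1, nullptr,
                               &itemsPerDevice, nullptr, 0, nullptr, nullptr);
    }

    for (cl_command_queue q : queues)
        clFinish(q);   // wait for every device before gathering results

    clReleaseKernel(kernel);
    for (cl_command_queue q : queues)
        clReleaseCommandQueue(q);
}
```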
See http://developer.nvidia.com/object/sli_best_practices.html
Basically, you have to instruct the driver that you want to use SLI, and in which mode. After this, the driver will (almost) seamlessly do all the work for you.
Alternate Frame Rendering: no sync needed, so better performance, but more lag.
Split Frame Rendering: lots of sync, some vertices are processed twice, but less lag.
For your GLSL vs. OpenCL comparison, I don't know of any good benchmark. I'd be interested, though.
I'm developing some C++ code that can do some fancy 3D transition effects between two images, for which I thought OpenGL would be the best option.
I start with a DIB section and set it up for OpenGL, and I create two textures from input images.
Then for each frame I draw just two OpenGL quads, with the corresponding image texture.
The DIB content is then saved to file.
For example, one effect is to position the two quads (in 3D space) like two billboards, one in front of the other (obscuring it), and then swoop the camera up, forward and down so you can see the second one.
My input images are 1024x768 or so and it takes a really long time to render (100 milliseconds) when the quads cover most of the view. It speeds up if the camera is far away.
I tried rendering each image quad as hundreds of individual tiles, but it takes just the same time; it seems to depend on the number of visible textured pixels.
I assumed OpenGL could do zillions of polygons a second. Is there something I am missing here?
Would I be better off using some other approach?
Thanks in advance...
Edit:
The GL strings show up for the DIB version as:
Vendor: Microsoft Corporation
Version: 1.1.0
Renderer: GDI Generic
The onscreen version shows:
Vendor: ATI Technologies Inc.
Version: 3.2.9756 Compatibility Profile Context
Renderer: ATI Mobility Radeon HD 3400 Series
So I guess I'll have to use FBOs. I'm a bit confused as to how to get the rendered data out of the FBO and onto a DIB; any pointers (pun intended) on that?
It sounds like rendering to a DIB is forcing the rendering to happen in software. I'd render to a frame buffer object, and then extract the data from the generated texture. Gamedev.net has a pretty decent tutorial.
Keep in mind, however, that graphics hardware is oriented primarily toward drawing on the screen. Capturing rendered data will usually be slower than displaying it, even when you do get the hardware to do the rendering, though it should still be quite a bit faster than software rendering.
Edit: Dominik Göddeke has a tutorial that includes code for reading back texture data to CPU address space.
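A hedged sketch of that FBO-plus-readback path (assuming a hardware-accelerated context with the FBO entry points available, e.g. via GLEW, and that dibBits points to the 32-bit DIB section's pixel memory):

```cpp
// Render into a texture-backed FBO, then read the pixels straight into the
// DIB section's memory with glReadPixels.
#include <GL/glew.h>

void renderToDib(void *dibBits, int width, int height)
{
    GLuint fbo = 0, colorTex = 0;
    glGenTextures(1, &colorTex);
    glBindTexture(GL_TEXTURE_2D, colorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, nullptr);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, colorTex, 0);

    glViewport(0, 0, width, height);
    // ... draw the two textured quads here, exactly as before ...

    // Bottom-up DIBs and glReadPixels share the same row order, so this can
    // go straight into the DIB's pixel pointer.
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, dibBits);

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteFramebuffers(1, &fbo);
    glDeleteTextures(1, &colorTex);
}
```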
One problem with your question:
You provided no actual rendering/texture generation code.
Would I be better off using some other approach?
The simplest thing you can do is make sure your textures have power-of-two sizes. I.e. instead of 1024x768, use 1024x1024 and use only part of that texture. Explanation: although most modern hardware supports non-power-of-two textures, they are sometimes treated as a "special case", and using such a texture MAY produce a performance drop on some hardware.
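For illustration, padding to the next power of two and adjusting the texture coordinates could look like this (a small sketch; the function and parameter names are made up):

```cpp
// Allocate the texture at the next power-of-two size and scale the texture
// coordinates so only the real image region is sampled.
#include <GL/gl.h>

static int nextPow2(int v) {
    int p = 1;
    while (p < v) p <<= 1;
    return p;
}

void uploadPaddedTexture(const void *pixels, int width, int height,
                         float *uMax, float *vMax)
{
    int texW = nextPow2(width);    // 1024 -> 1024
    int texH = nextPow2(height);   // 768  -> 1024
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, texW, texH, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);          // allocate padded texture
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);        // upload the real image
    *uMax = (float)width / texW;   // use these as the quad's max texcoords
    *vMax = (float)height / texH;
}
```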
I assumed OpenGL could do zillions of polygons a second. Is there something I am missing here?
Yes, you're missing one important thing. There are a few things that limit GPU performance:
1. System memory to video memory transfer rate (probably not your case - only for dynamic textures/geometry when data changes every frame).
2. Computation cost. (If you write a shader with heavy computations, it will be slow).
3. Fill rate (how many pixels the program can put on screen per second); AFAIK this depends on memory speed on modern GPUs.
4. Vertex processing rate (not your case) - how many vertices GPU can process per second.
5. Texture read rate (how many texels per second GPU can read), on modern GPUs depends on GPU memory speed.
6. Texture read caching (not your case) - i.e. in a fragment shader you can read a texture a few hundred times per pixel with little performance drop IF the coordinates are very close to each other (i.e. almost the same texel in each read), because the results are cached. But performance will drop significantly if you try to access 100 randomly located texels for every pixel.
All those characteristics are hardware dependent.
That is, depending on the hardware, you may be able to render 1,500,000 polygons per frame (if they take up a small amount of screen space), but you can bring the fps to its knees with 100 polygons if each polygon fills the entire screen, uses alpha blending and is textured with a highly detailed texture.
If you think about it, you may notice that there are a lot of video cards that can draw a landscape, but fps drops when you're doing framebuffer effects (like blur, HDR, etc.).
Also, you may see a performance drop with textured surfaces if you have a built-in GPU. When I fried the PCIe slot on my previous motherboard, I had to work with the built-in GPU (an NVIDIA 6800 or something). The results weren't pleasant. While the GPU supported Shader Model 3.0 and could use relatively computationally expensive shaders, fps dropped rapidly whenever there was a textured object on screen. This obviously happened because the built-in GPU used part of system memory as video memory, and the transfer rates of "normal" GPU memory and system memory are different.