Data processing and video generation with OpenGL/CL - opengl

Goal: compensate and visualize a stream of 14-bit data (2D video).
Existing solution: Each sample has to be compensated for gain and offset, which takes one multiplication and one addition. I then assign a colour to the sample via a look-up table and output the stream of "colours" directly to the display. Everything is done on the CPU.
Requirements: I need to be able to dynamically set a look-up table (palette).
It seems obvious to use the GPU for such an operation, but I couldn't find any information on how to move from the data domain to the picture domain with OpenGL. I've thought about using OpenCL for the data compensation and image generation, and then moving to OpenGL for display (or, in general, for manipulating the picture).
Can you recommend a good approach for this? Can all of this be achieved efficiently with OpenGL alone? How?

Yes, it can be done using only OpenGL.
I would suggest a workflow like the following:
For each frame:
Upload frame from stream to texture memory
Draw a full-screen quad, with texture coordinates from 0,0 to 1,1
In a fragment shader, apply the appropriate transformation to each pixel. The look-up table can also be stored in a texture, so the shader only has to sample it at the appropriate location.
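The per-pixel step in the workflow above could be sketched as follows. This is a minimal illustration, not a definitive implementation: the shader source, the uniform names (uData, uPalette, uGain, uOffset) and the helper functions are all hypothetical, and it assumes that gain and offset are global values rather than per-pixel ones. The C++ functions are a CPU reference of roughly the same math, useful for checking the shader output against the existing CPU path.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical GLSL 3.30 fragment shader for the per-pixel step.
// Assumptions: raw samples live in an unsigned integer texture (e.g. GL_R16UI,
// nearest filtering), the palette lives in a 1D RGBA texture, and gain/offset
// are global uniforms (per-pixel values would need an extra texture).
const char* const kFragmentShader = R"glsl(
#version 330 core
uniform usampler2D uData;
uniform sampler1D  uPalette;
uniform float uGain;
uniform float uOffset;
in  vec2 vUV;
out vec4 fragColor;
void main() {
    float raw = float(texture(uData, vUV).r);
    float v   = clamp((raw * uGain + uOffset) / 16383.0, 0.0, 1.0);
    fragColor = texture(uPalette, v);
}
)glsl";

using Rgba = std::array<std::uint8_t, 4>;

// Apply gain and offset, then clamp to the 14-bit range [0, 16383].
inline std::size_t compensatedIndex(std::uint16_t raw, float gain, float offset) {
    const float v = static_cast<float>(raw) * gain + offset;
    const float clamped = std::min(16383.0f, std::max(0.0f, v));
    return static_cast<std::size_t>(clamped);
}

// Map a compensated 14-bit value onto a palette of arbitrary size.
inline Rgba colourFor(std::uint16_t raw, float gain, float offset,
                      const std::vector<Rgba>& palette) {
    assert(!palette.empty());
    const std::size_t idx =
        compensatedIndex(raw, gain, offset) * (palette.size() - 1) / 16383;
    return palette[idx];
}
```

Changing the palette at runtime then amounts to re-uploading the small palette texture; the shader itself stays the same.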
In general: This question is at the moment a little bit too broad to be answered in more detail. For example a stream of 14-bit data could be a lot of things. I assumed for this answer you meant a (2D) video stream.

Related

Applying a 2D heatmap to a 3D view

I currently have implemented an OpenGL 3.3 3D environment renderer rendering a (static) block of terrain, and I've been tasked with adding an overlay of statistical data to it; setting specific pixel colours on the terrain based on data values at each point.
The data in question is effectively supplied in the form of a black box in my C++ code base; I can input an X,Y pair of doubles (in worldspace), and it'll output a data value for that location (the terrain does have a third dimension, but the data is not concerned about that). The data in question is time-varying; on changing the time co-ordinate, the scene is expected to update with the data corresponding to the new co-ordinate.
I have a first implementation; the obvious one, where on creating each vertex the appropriate data value for that location is looked up in the black box and encoded in a dynamic buffer accompanying it, with the buffer updated as the time co-ordinate changes. This works perfectly in itself; it's fast to update, and the data is rendered as expected.
However, it's only got data points per-vertex, with simple interpolation across the polygon, and the question's been raised as to whether it's possible to instead render the data per-pixel.
I'm struggling with this. I can't realistically implement the black box behaviour directly in the shaders; it's a large, complex function that I don't fully understand myself (hence representing it here as a black box!), and it requires referencing multiple data sources. There was a version early on - before I looked into the project - that rendered the entire scene in our (separate, non-OpenGL, 2D), top-down environment renderer at an extremely high resolution and applied that as a texture to the mesh - but that's both cripplingly slow and still not true per-pixel data, you can still zoom to a point where the resolution breaks down.
I'm not currently using deferred rendering, but I'm wondering if I can use similar principles to that. One thing I'm considering currently is whether - during the render process - there's a way I can store worldspace X and Y data per-pixel in a buffer (stencil? G-? Arbitrary render target?), and then - back in the C++ environment - generate an overlay texture per frame based on those accumulated X and Y values - but I'm somewhat put off by the notion that that'd require double-precision, and lots of what I've seen suggests steering clear of any double calculations in GLSL; again, I'm worried about speed (although is a simple passthrough and interpolation of double-precision data less impactful?)... plus I'm not entirely sure that what I'm suggesting is even possible!
I may be overcomplicating this somewhat, though; there may be far simpler solutions that aren't in my frame of reference yet, so I'm curious to hear if there are any suggestions for better solutions, or if it's unrealistic.
(While I'm currently using 3.3, a solution requiring 4+ is not off the table)

Read and Write in one Texture (OpenGL)

I want to store and update information in a texture. The idea is that I create a new texture with the current information. While storing it during rendering, I want to read the existing value out of the same pixel and store a weighted average of the two values: the value being rendered to that pixel and the value that was already there.
I have read many times that I cannot read from and write to the same texture. My question is: is it possible after all? If not, should I copy the texture before the rendering step and pass the copy to the shader? If so, how can I copy the texture? Or should I do an extra rendering pass for the copy?
I see two possible options here, depending on the mix equation
Alpha Blending: If the equation used can be mapped to one of the glBlendFunc functions, then this is the way to go. If you want to use linear factors for the stored and the new value this should be possible. This is also the option where I would expect the best performance.
Image Load Store: With this method one can read and write to the same texture at the same time (see here). The performance will usually be very bad here and you will have to use the image atomic operations to ensure that multiple fragments at the same location always read the correct value.
Copying the texture would, in my opinion, only work if you render an image and then perform one weighted average computation on it afterwards (otherwise you would have to copy the texture after each store operation). But if this is the case, one could simply render the result of the average computation to a different texture and completely avoid all the trouble of copying the input data.
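To make the alpha blending option concrete, here is a small sketch of the arithmetic that blending performs when the weighted average uses linear factors. The function name blendOver is hypothetical; the GL calls in the comment are the standard ones, and the sketch assumes the default blend equation (GL_FUNC_ADD) with the weight supplied as the fragment's alpha.

```cpp
#include <cassert>

// Weighted average as fixed-function blending would compute it after
//   glEnable(GL_BLEND);
//   glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
// src is the value written by the fragment shader, dst is the value already
// stored in the render target, srcAlpha is the weight of the new value.
inline float blendOver(float src, float dst, float srcAlpha) {
    assert(srcAlpha >= 0.0f && srcAlpha <= 1.0f);
    return src * srcAlpha + dst * (1.0f - srcAlpha);
}
```

With this setup the shader never samples the target texture itself; the read-modify-write happens in the blend stage, which is why no copy is needed.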
If resorting to an extension is an option, you can use NV_texture_barrier which allows writing and reading from the same texture.

Image with sparse and continuous coordinates in ITK

I have a raw data image which is potentially sparse and has continuous coordinates (e.g. 1000 pixels which are positioned on a spiral, the coordinates are floats). What is the best way to load this data into ITK for further processing and the ability to save the image in physical coordinates?
My research so far: There is itk::SpecialCoordinatesImage, which I could inherit from to override TransformPhysicalPointToContinuousIndex(…) and TransformPhysicalPointToIndex(…). I do not know the positions or the number of pixels before reading the whole data stream. So for a minimal amount of speed I will need to re-sort the data "manually". Isn't there a better way?
I am more familiar with vtk than itk, so what comes to mind is probably a bit biased. You could:
load the raw data into a vtk unstructured grid (see for example the function ReadFinancialData in http://vtk.org/gitweb?p=VTK.git;a=blob;f=Examples/Modelling/Cxx/finance.cxx )
then voxelize it to an image. For example, see http://www.vtkjournal.org/browse/publication/713 (I've never used it, and I don't know if it is compatible with the latest versions) or http://www.vtk.org/Wiki/VTK/Examples/Cxx/PolyData/PolyDataContourToImageData

What is the most efficient process to push YUV texture data onto a GPU in OpenGL?

Does anyone know of an efficient way to push 2vuy non-planar data onto a GPU in a way that doesn't require swizzling?
I am grabbing the raw 2vuy data from an h264 video file and successfully loading it into a texture that I map to an OpenGL object. I notice that my code spends a fair amount of time in glgProcessPixelsWithProcessor. My glTexImage2D call looks like the following:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_YCBCR_422_APPLE,
GL_UNSIGNED_SHORT_8_8_APPLE, data);
Apple says in its OpenGL guide that GL_YCBCR_422_APPLE provides "acceptable" performance (p103), but adds:
Note: If your data needs only to be swizzled, glgProcessPixels performs the swizzling reasonably fast although not as fast as if the data didn't need swizzling. But non-native data formats are converted one byte at a time and incurs a performance cost that is best to avoid.
I assume that there is some kind of internal format conversion going on on the CPU. I noticed in another thread that glgProcessPixels is running a block method as well.
Is my path the most efficient? If not, what is?
Your code, as it stands right now, depends on Apple extensions. I can't tell what's happening inside.
What I suggest, however, is that you create three 2D textures, each with exactly one channel, where each texture receives one of the color planes; using independent textures makes supporting chroma subsampling (that 422) simpler.
In a shader you'd then perform the colorspace conversion. When writing down the math, I suggest you do this via a connection color space, like XYZ, as this allows you to take the color profile of the output device into account; ICC profiles provide the conversion data from XYZ color space coordinates to device color space (RGB) coordinates.
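As a rough sketch of what feeding three single-channel textures involves: 2vuy data is packed as Cb, Y0, Cr, Y1 for every pair of pixels, so it has to be de-interleaved into planes first. The code below does that on the CPU and also shows a direct YCbCr-to-RGB conversion. Note the assumptions: the names Planes, splitUyvy and ycbcrToRgb601 are hypothetical, the conversion uses the common video-range BT.601 coefficients rather than the route via XYZ recommended above, and whether BT.601 is the right matrix for a given h264 stream is not something the source states.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Planes {
    std::vector<std::uint8_t> y;   // width * height samples
    std::vector<std::uint8_t> cb;  // (width / 2) * height samples
    std::vector<std::uint8_t> cr;  // (width / 2) * height samples
};

// De-interleave packed 4:2:2 data in 2vuy order (Cb Y0 Cr Y1 per pixel pair).
inline Planes splitUyvy(const std::vector<std::uint8_t>& packed,
                        std::size_t width, std::size_t height) {
    assert(width % 2 == 0);
    assert(packed.size() == width * height * 2);
    Planes p;
    p.y.reserve(width * height);
    p.cb.reserve(width * height / 2);
    p.cr.reserve(width * height / 2);
    for (std::size_t i = 0; i + 3 < packed.size(); i += 4) {
        p.cb.push_back(packed[i]);
        p.y.push_back(packed[i + 1]);
        p.cr.push_back(packed[i + 2]);
        p.y.push_back(packed[i + 3]);
    }
    return p;
}

// Video-range BT.601 conversion (assumed; a shader would do the same per pixel).
inline std::array<std::uint8_t, 3> ycbcrToRgb601(std::uint8_t y, std::uint8_t cb,
                                                 std::uint8_t cr) {
    const float yy = 1.164f * (static_cast<float>(y) - 16.0f);
    const float b = static_cast<float>(cb) - 128.0f;
    const float r = static_cast<float>(cr) - 128.0f;
    auto to8 = [](float v) {
        return static_cast<std::uint8_t>(
            std::lround(std::min(255.0f, std::max(0.0f, v))));
    };
    return {{to8(yy + 1.596f * r),
             to8(yy - 0.392f * b - 0.813f * r),
             to8(yy + 2.017f * b)}};
}
```

Each plane can then be uploaded as its own single-channel texture, with the chroma planes at half the horizontal resolution of the luma plane.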

Read Framebuffer-texture like a 1D array

I am doing some gpgpu calculations with GL and want to read my results from the framebuffer.
My framebuffer-texture is logically a 1D array, but I made it 2D to have a bigger area. Now I want to read a run of any given length starting at any arbitrary pixel in the framebuffer-texture.
That means all calculations are already done on the GPU side and I only need to pass certain data to the CPU, and that data may wrap across the border of the texture.
Is this possible? If yes, is it slower or faster than calling glReadPixels on the whole image and then cutting out what I need?
EDIT
Of course I know about OpenCL/CUDA but they are not desired because I want my program to run out of the box on (almost) any platform.
Also I know that glReadPixels is very slow and one reason might be that it offers some functionality that I do not need (Operating in 2D). Therefore I asked for a more basic function that might be faster.
Reading the whole framebuffer with glReadPixels just to discard it all except for a few pixels/lines would be grossly inefficient. But glReadPixels lets you specify a rect within the framebuffer, so why not just restrict it to fetching the few rows of interest? You may end up fetching some extra data at the start of the first line and the end of the last line, but I suspect the overhead of that is minimal compared with making multiple calls.
Possibly writing your data to the framebuffer in tiles and/or using Morton order might help structure it so that a tighter bounding box can be found and the extra data retrieved is minimised.
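The row-restricted read suggested above needs a small index calculation: which rows cover a given 1D range, and where the wanted data starts inside the fetched block. The sketch below shows one way to compute that. RowSpan and rowsCovering are hypothetical names, and the code assumes the 1D array is laid out row by row with element 0 at x = 0, y = 0, and that rows are read back tightly packed.

```cpp
#include <cassert>
#include <cstddef>

struct RowSpan {
    std::size_t firstRow;  // y argument for glReadPixels
    std::size_t rowCount;  // height argument for glReadPixels
    std::size_t skip;      // elements to skip at the start of the fetched block
};

// Find the full rows of a texture `width` elements wide that contain the
// 1D range [start, start + length).
inline RowSpan rowsCovering(std::size_t width, std::size_t start,
                            std::size_t length) {
    assert(width > 0 && length > 0);
    const std::size_t firstRow = start / width;
    const std::size_t lastRow = (start + length - 1) / width;
    return RowSpan{firstRow, lastRow - firstRow + 1, start % width};
}

// Intended use (not executed here, needs a GL context):
//   RowSpan s = rowsCovering(width, start, length);
//   glReadPixels(0, s.firstRow, width, s.rowCount, format, type, buffer);
//   the wanted data is then buffer[s.skip] .. buffer[s.skip + length - 1]
```

For example, with a width of 1024, a range of 100 elements starting at index 1000 spans two rows, and the wanted data begins 1000 elements into the fetched block.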
You can use a pixel buffer object (PBO) to transfer pixel data from the framebuffer to the PBO, then use glMapBufferARB to read the data directly:
http://www.songho.ca/opengl/gl_pbo.html