Does a conversion take place when uploading a texture of GL_ALPHA format (glTexImage2D)? - opengl

The documentation for glTexImage2D says
GL_RED (for GL) / GL_ALPHA (for GL ES). "The GL converts it to floating point and assembles it into an RGBA element by attaching 0 for green and blue, and 1 for alpha. Each component is clamped to the range [0,1]."
I've read through the GL ES specs to see if it specifies whether the GPU memory is actually 32bit vs 8bit, but it seems rather vague. Can anyone confirm whether uploading a texture as GL_RED / GL_ALPHA gets converted from 8bit to 32bit on the GPU?
I'm interested in answers for GL and GL ES.

I've read through the GL ES specs to see if it specifies whether the GPU memory is actually 32bit vs 8bit, but it seems rather vague.
Well, that's what it is. The actual details are left for the actual implementation to decide. Giving such liberties in the specification allows actual implementations to contain optimizations tightly tailored to the target system. For example a certain GPU may cope better with a 10 bits per channel format, so it's then at liberty to convert to such a format.
So it's impossible to say in general, but for a specific implementation (i.e. GPU + driver) a certain format will be likely choosen. Which one depends on GPU and driver.

Following on from what datenwolf has said, I found the following in the "POWERVR SGX
OpenGL ES 2.0 Application Development Recommendations" document:
6.3. Texture Upload
When you upload textures to the OpenGL ES driver via glTexImage2D, the input data is usually in linear scanline
format. Internally, though, POWERVR SGX uses a twiddled layout (i.e.
following a plane-filling curve) to greatly improve memory access
locality when texturing. Because of this different layout uploading
textures will always require a somewhat expensive reformatting
operation, regardless of whether the input pixel format exactly
matches the internal pixel format or not.
For this reason we
recommend that you upload all required textures at application or
level start-up time in order to not cause any framerate dips when
additional textures are uploaded later on.
You should especially
avoid uploading texture data mid-frame to a texture object that has
already been used in the frame.

Related

Using OpenGL to perform video compositing with YUV color format - performance

I have written a C/C++ implementation of what I term a "compositor" (I come from a video background) to composite/overlay video/graphics on the top of a video source. My current compositor implementation is rather naive and there is room for CPU optimization improvements (ex: SIMD, threading, etc).
I've created a high-level diagram of what I am currently doing:
The diagram is self explanatory. Nonetheless, I'll elaborate on some of the constraints:
The main video always comes served in an 8-bit YUV 4:2:2 packed format
The secondary video (optional) will come served in either an 8-bit YUV 4:2:2 or YUVA 4:2:2:4 packed format.
The output from the overlay must come out in an 8-bit YUV 4:2:2 packed format
Some other bits of information:
The number of graphics inputs will vary; it may (or may not) be a constant value.
The colour format of the Graphics can be pinned to either ARGB or YUVA format (ie. I can provide it as you see fit). At the moment, I pin it to YUVA to keep a consistent colour format.
The potential of using OpenGL and accompanying shaders is rather appealing:
No need to reinvent the wheel (in terms of actually performing the composition)
The possibility of using GPU where available.
My concern with using OpenGL is performance. Looking around on the web, it is my understanding that a YUV surface would be converted to RGB internally; I would like to minimize the number of colour format conversions and ensure optimal performance. Without prior OpenGL experience, I hope someone can shed some light and suggest if I'm about to venture down the wrong path.
Perhaps my concern relating to performance is less of an issue when using a dedicated GPU? Do I need to consider separate code paths:
Hardware with GPU(s)
Hardware with only CPU(s)?
Additionally, am I going to struggle when I need to process 10-bit YUV?
You should be able to treat YUV as independent channels throughout. OpenGL shaders will be calling them r, g, and b, but it's just data that can be treated as whatever you want.
Most GPUs will support 10 bits per channel (+ 2 alpha bits). Various will support 16 bits per channel for all 4 channels but I'm a little rusty here so I have no idea how common support is for this. Not sure about the 4:2:2 data, but you can always treat it as 3 separate surfaces.
The number of graphics inputs will vary; it may (or may not) be a constant value.
This is something I'm a little less sure about. Shaders like this to be predictable. If your implementation allows you to add each input iteratively then you should be fine.
As an alternative suggestion, have you looked into OpenCL?

What is the difference between clearing the framebuffer using glClear and simply drawing a rectangle to clear the framebuffer?

I think at least some old graphics drivers used to crash if glClear wasn't used and that glClear is probably faster in a lot of cases but why? How are 3-d graphics drivers usually implemented such that these uses would have different results?
On a high level, it can be faster because the OpenGL implementation knows ahead of time that the whole buffer needs to be set to the same color/value. The more you know about what exactly needs to be done, the more you can take advantage of possible accelerations.
Let's say setting a whole buffer to the same value is more efficient than setting the same pixels to variable values. With a glClear(), you know already that all pixels will have the same value. If you draw a screen sized quad with a fragment shader that emits a constant color, the driver would either have to recognize that situation by analyzing the shaders, or the system would have to compare the values coming out of the shader, to know that all pixels have the same value.
The reason why setting everything to the same value can be more efficient has to do with framebuffer compression and related technologies. GPUs often don't actually write each pixel out to the framebuffer, but use various kinds of compression schemes to reduce the memory bandwidth needed for framebuffer writes. If you imagine almost any kind of compression, all pixels having the same value is very favorable.
To give you some ideas about the published vendor specific technologies, here are a few sources. You can probably find more with a search.
Article talking about new framebuffer compression method in relatively recent AMD cards: http://techreport.com/review/26997/amd-radeon-r9-285-graphics-card-reviewed/2.
NVIDIA patent on zero bandwidth clears: http://www.google.com/patents/US8330766.
Blurb on ARM web site about Mali framebuffer compression: http://www.arm.com/products/multimedia/mali-technologies/arm-frame-buffer-compression.php.
Why is it faster? Because it is a function that bypasses most calculations that other types of drawings have to go through.
Alpha function, blend function, logical operation, stenciling, texture mapping, and depth-buffering are ignored by glClear
Source
Why do some drivers crash without it? It's hard to say, but it should have something to do with the implementation details of OpenGL. The functions does what it's supposed to do, but might do more that you don't know about.
OpenGL might infer from this function call other tasks that it needs to perform.

sRGB correction for textures in OpenGL on iOS

I am experiencing the issue described in this article where the second color ramp is effectively being gamma-corrected twice, resulting in overbright and washed-out colors. This is in part a result of my using an sRGB framebuffer, but that is not the actual reason for the problem.
I'm testing textures in my test app on iOS8, and in particular I am currently using a PNG image file and using GLKTextureLoader to load it in as a cubemap.
By default, textures are treated NOT as being in sRGB space (which they are invariably saved in by the image editing software used to build the texture).
The consequence of this is that Apple has made GLKTextureLoader do the glTexImage2D call for you, and they invariably are calling it with the GL_RGB8 setting, whereas for actual correctness in future color operations we have to uncorrect the gamma in order to obtain linear brightness values in our textures for our shaders to sample.
Now I can actually see the argument that it is not required of most mobile applications to be pedantic about color operations and color correctness as applied to advanced 3D techniques involving color blending. Part of the issue is that it's unrealistic to use the precious shared device RAM to store textures at any bit depth greater than 8 bits per channel, and if we read our JPG/PNG/TGA/TIFF and gamma-uncorrect its 8 bits of sRGB into 8 bits linear, we're going to degrade quality.
So the process for most apps is just happily toss linear color correctness out the window, and just ignore gamma correction anyway and do blending in the SRGB space. This suits Angry Birds very well, as it is a game that has no shading or blending, so it's perfectly sensible to do all operations in gamma-corrected color space.
So this brings me to the problem that I have now. I need to use EXT_sRGB and GLKit makes it easy for me to set up an sRGB framebuffer, and this works great on last-3-or-so-generation devices that are running iOS 7 or later. In doing this I address the dark and unnatural shadow appearance of an uncorrected render pipeline. This allows my lambertian and blinn-phong stuff to actually look good. It lets me store sRGB in render buffers so I can do post-processing passes while leveraging the improved perceptual color resolution provided by storing the buffers in this color space.
But the problem now as I start working with textures is that it seems like I can't even use GLKTextureLoader as it was intended, as I just get a mysterious error (code 18) when I set the options flag for SRGB (GLKTextureLoaderSRGB). And it's impossible to debug as there's no source code to go with it.
So I was thinking I could go build my texture loading pipeline back up with glTexImage2D and use GL_SRGB8 to specify that I want to gamma-uncorrect my textures before I sample them in the shader. However a quick look at GL ES 2.0 docs reveals that GL ES 2.0 is not even sRGB-aware.
At last I find the EXT_sRGB spec, which says
Add Section 3.7.14, sRGB Texture Color Conversion
If the currently bound texture's internal format is one of SRGB_EXT or
SRGB_ALPHA_EXT the red, green, and blue components are converted from an
sRGB color space to a linear color space as part of filtering described in
sections 3.7.7 and 3.7.8. Any alpha component is left unchanged. Ideally,
implementations should perform this color conversion on each sample prior
to filtering but implementations are allowed to perform this conversion
after filtering (though this post-filtering approach is inferior to
converting from sRGB prior to filtering).
The conversion from an sRGB encoded component, cs, to a linear component,
cl, is as follows.
{ cs / 12.92, cs <= 0.04045
cl = {
{ ((cs + 0.055)/1.055)^2.4, cs > 0.04045
Assume cs is the sRGB component in the range [0,1]."
Since I've never dug this deep when implementing a game engine for desktop hardware (which I would expect color resolution considerations to be essentially moot when using render buffers of 16 bit depth per channel or higher) my understanding of how this works is unclear, but this paragraph does go some way toward reassuring me that I can have my cake and eat it too with respect to retaining all 8 bits of color information if I am to load in the textures using SRGB_EXT image storage format.
Here in OpenGL ES 2.0 with this extension I can use SRGB_EXT or SRGB_ALPHA_EXT rather than the analogous SRGB or SRGB8_ALPHA from vanilla GL.
My apologies for not presenting a simple answerable question. Let it be this one: Am I barking up the wrong tree here or are my assumptions more or less correct? Feels like I've been staring at these specs for far too long now. Another way to answer my question is if you can shed some light on the GLKTextureLoader error 18 that I get when I try to set the sRGB option.
It seems like there is yet more reading for me to do as I have to decide whether to start to branch my code to get one codepath that uses GL ES 2.0 with EXT_sRGB, and the other using GL ES 3.0, which certainly looks very promising by comparing the documentation for glTexImage2D with other GL versions and appears closer to OpenGL 4 than the others, so I am really liking that ES 3 will be bringing mobile devices a lot closer to the API used on the desktop.
Am I barking up the wrong tree here or are my assumptions more or less
correct?
Your assumptions are correct. If the GL_EXT_sRGB OpenGL ES extension is supported, both sRGB framebuffers (with automatic conversion from linear to gamma-corrected sRGB) and sRGB texture formats (with automatic conversion from sRGB to linear RGB when sampling from it) are available, so that is definitively the way to go, if you want to work in a linear color space.
I can't help with that GLKit issue, no idea about that.

GPU programming for image processing

I'm working on a project aimed to control a bipad humanoid robot. Unfortunately we have a very limited set of hardware resources (a RB110 board and its mini PCI graphic card). I'm planning to port image processing tasks from CPU to graphic card processor of possible but never done it before... I'm advised to use OpenCV but seems to impossible because our graphic card processor (Volari Z9s) is not supported by framework. Then I found an interesting post on Linux Journal. Author have used OpenGL to process frames retrieved from a v4l device.
I'm a little confused about the relationship between hardware API and OpenGL/OpenCV. In order to utilize a GPU, do the hardware need to be sopported by graphic programming frameworks (OpenGL/OpenCV)? Where can I find such an API?
I googled a lot about my hardware, unfortunately the vendor (XGI Technology) seems to be somehow extinct...
In order to utilize a GPU, do the hardware need to be sopported by graphic programming frameworks (OpenGL/OpenCV)? Where can I find such an API?
OpenCL and OpenGL are both translated to hardware instructions by the GPU driver, so you need a driver for your operating system that supports these frameworks. Most GPU drivers support some version of OpenGL so that should work.
The OpenGL standard is maintained by the Khronos Group and you can find some tutorials at nehe.
How OpenGL works
OpenGL accepts triangles as input and draws them according to the state it has when the draw is issued. Most OpenGL functions are there to change the operations performed by manipulating this state. Image manipulation can be done by loading the input image as a texture and drawing several vertices with the texture active, resulting in a new Image (or more generic a new 2D grid of data).
From version > 2 (or with the right ARB extensions) the operations performed on the image can be controlled with GLSL programs called vertex and fragment shaders (there are more shaders, but these are the oldest). A vertex shader will be called once per vertex, the results of this are interpolated and forwarded to the fragment shader. A fragment shader will be called every time a new fragment(pixel) is written to the result.
Now this is all about reading and writing images, how to use it for object detection?
Use Vertices to span the input texture over the whole viewport. Instead of computing rgb colors and storing them in the result you can write a fragmentshader that computes grayscale images / gradient images and then checks these textures for each pixel if the pixel is in the center of a cycle with a specific size, part of a line or just has a relatively high gradient compared to its surrounding (good feature) or really anithing else you can find a good parallel algorithm for. (haven't done this myself)
The end result has to be read back to the cpu (sometimes you can use shaders to scale the data down before doing this). OpenCL gives it a less Graphics like feel and gives a lot more freedom but is less supported.
First of all You need shader support (GLSL or asm)
Usual way will be rendering full screen quad with your image (texture) and applying fragment shader. It's called Post-Processing And limited with instruction set and another limitations that your hardware has. On basic lvl it allows you to apply simple (single function) on large data set in parallel way that will produce another data set. But branching (if it is supported) is first performance enemy because GPU consist from couple SIMD blocks

What is the most efficient process to push YUV texture data onto a GPU in OpenGL?

Does anyone know of an efficient way to push 2vuy non-planar data onto a GPU in a way that doesn't require swizzling?
I am grabbing the raw 2vuy data from an h264 video file and successfully loading it into a texture that I map to an an OpenGL object. I notice that my code spends a fair amount of time in glgProcessPixelsWithProcessor. My glTexImage2D call looks like the following:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_YCBCR_422_APPLE,
GL_UNSIGNED_SHORT_8_8_APPLE, data);
Apple says in its OpenGL guide that GL_YCBCR_422_APPLE, provides "acceptable" performance (p103), but that
Note: If your data needs only to be swizzled, glgProcessPixels performs the swizzling reasonably fast although not as fast as if the data didn't need swizzling. But non-native data formats are converted one byte at a time and incurs a performance cost that is best to avoid.
I assume that there is some kind of internal format conversion going on the CPU. I noticed in another thread that glgProcessPixels is running a block method as well.
Is my path the most efficient? If not, what is?
Your code, as it stands right now depends on extensions of Apple. I can't tell what's happening inside.
However what I suggest is, that you create three 2D textures, each with exactly one channel, where each texture receives one of the color planes; using independent textures makes supporting chroma subsampling (that 422) simpler.
In a shader you'd then perform the colorspace conversion. When writing down the math I suggest you do this via a contact color space, like XYZ, as this allows you, to take the color profile of the output device into account; ICC profiles provide the conversion data from XYZ color space coordinates to device color space (RGB) coordinates.