As we know, glReadPixels() blocks the pipeline and uses the CPU to convert the data format, especially when reading depth values out to system RAM.
I tried the PBO approach described by Songho, but found it only helped when the format parameter of glReadPixels() was set to GL_BGRA.
With a PBO and GL_BGRA, the read takes about 0.1 ms and CPU usage is 4%.
When I change the format to GL_RGBA, the read takes 2 ms with 50% CPU usage.
It is the same with GL_DEPTH_COMPONENT. Apparently the slowness is caused by the conversion, so does anyone know how to stop it from converting the data format?
In my program I have to read the depth values and do the calculation 16*25 times in less than one second, so 2 ms per read is not acceptable.
so any one knows how to stop it converting data format?
D'uh, by reading a data format that does not need converting. On-screen framebuffers are typically formatted as BGRA, and if you want something different the data needs to be converted first.
You could use a FBO with texture/renderbuffer attachments that are in the format expected and render to that.
Desktop OpenGL will give you the data in whatever format you want, so unless you specify the format that doesn't require conversion, it will convert it for you. Because that's what you asked for.
Given an implementation that supports ARB_internalformat_query2 (just NVIDIA right now), you can simply ask. You ask for the GL_READ_PIXELS_FORMAT and GL_READ_PIXELS_TYPE, and then use those. It should return a format that doesn't require conversion.
I am trying to use vkCreateImage with a 3-component image (rgb).
But all the RGB formats give:
vkCreateImage format parameter (VK_FORMAT_R8G8B8_xxxx) is an unsupported format
Does this mean that I have to reshape the data in memory? So add an empty byte after each 3, and then load it as RGBA?
I also noticed that the R8 and R8G8 formats do work, so I would guess the only reason RGB is not supported is that 3 is not a power of two.
Before I actually do this reshaping of the data I'd like to know for sure that this is the only way, because it is not very good for performance and maybe there is some offset or padding value somewhere that will help loading the RGB data into an RGBA format. So can somebody confirm the reshaping into RGBA is a necessary step to load RGB formats (albeit with 33% overhead)?
Thanks in advance.
First, you're supposed to check to see what is supported before you try to create an image. You shouldn't rely on validation layers to stop you; that's just a debugging aid to catch something when you forgot to check. What is and is not supported is dynamic, not static. It's based on your implementation. So you have to ask every time your application starts whether the formats you intend to use are available.
And if they are not, then you must plan accordingly.
Second, yes, if your implementation does not support 3-channel formats, then you'll need to emulate them with a 4-channel format. You will have to re-adjust your data to fit your new format.
If you don't like doing that, I'm sure there are image editors you can use to load your image, add an opaque alpha of 1.0, and save it again.
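If you do end up repacking at load time, the step itself is small. Here is a minimal sketch, assuming tightly packed 8-bit RGB input with no row padding; `expandRgbToRgba` is a hypothetical helper name, not part of Vulkan:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand tightly packed 8-bit RGB texels to RGBA by appending an alpha byte.
// Hypothetical helper for illustration; assumes no padding between rows.
std::vector<std::uint8_t> expandRgbToRgba(const std::uint8_t* rgb,
                                          std::size_t pixelCount,
                                          std::uint8_t alpha = 255) {
    std::vector<std::uint8_t> rgba(pixelCount * 4);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        rgba[i * 4 + 0] = rgb[i * 3 + 0];  // R
        rgba[i * 4 + 1] = rgb[i * 3 + 1];  // G
        rgba[i * 4 + 2] = rgb[i * 3 + 2];  // B
        rgba[i * 4 + 3] = alpha;           // opaque alpha by default
    }
    return rgba;
}
```

The result would then go into your staging buffer for an image created with a 4-channel format such as VK_FORMAT_R8G8B8A8_UNORM, after checking with vkGetPhysicalDeviceFormatProperties that the format supports the usage you need.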
I need a way to get the pixels of an already existing texture, similar to how D3DTexture's LockRect works with ReadOnly and NoSysLock. Some of my textures are also stored in compressed DXT1/3/5 formats, and I'm not entirely sure whether that affects anything, or whether those formats are simply decoded by OpenGL and stored as raw pixels instead of compressed. So would retrieving the pixels guarantee the same format that was used to set the texture with?
Generally you will want to use a PBO for reading pixels. All the information you need on PBOs is on the linked page.
So would retrieving the pixels guarantee the same format that was used to set the texture with?
It is possible to convert the format and retrieve the pixels at the same time. Look at the format conversion section on the page I linked.
I have read that compressed textures are not readable and are not color render-able.
Though I have some idea of why it's not allowed, can someone explain in a little detail?
What exactly does it mean that they are not readable? That I cannot read from them in a shader using, say, imageLoad? Or that I can't even sample from them?
What does it mean that they are not renderable? Is it not allowed because the user would see garbage anyway?
I have not tried using compressed textures.
Compressed textures are "readable", by most useful definitions of that term. You can read from them via samplers. However, you can't use imageLoad operations on them. Why? Because reading such memory is not a simple memory fetch. It involves fetching lots of memory and doing a decompression operation.
Compressed images are not color-renderable, which means they cannot be attached to an FBO and used as a render target. One might think the reason for this was obvious, but if you need it spelled out: writing to a compressed image requires doing image compression on the fly. And most texture compression formats (or compressed formats of any kind) are not designed to easily deal with changing a few values. Not to mention, most compressed texture formats are lossy, so every time you do a decompress/write/recompress operation, you lose image fidelity.
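To make the "not a simple memory fetch" point concrete, here is a rough sketch of how texels map onto DXT blocks. DXT1 stores each 4x4 block in 8 bytes, and DXT3/DXT5 use 16 bytes per block. The function names are hypothetical, and the row-by-row block order is an assumption that matches the data you upload; the driver's in-memory layout may differ.

```cpp
#include <cstddef>

// Bytes per 4x4 block: 8 for DXT1, 16 for DXT3 and DXT5.
const std::size_t kDxt1BlockBytes = 8;
const std::size_t kDxt5BlockBytes = 16;

// Size in bytes of one mip level; dimensions round up to whole 4x4 blocks.
std::size_t compressedSize(std::size_t width, std::size_t height,
                           std::size_t blockBytes) {
    return ((width + 3) / 4) * ((height + 3) / 4) * blockBytes;
}

// Byte offset of the block that contains texel (x, y), assuming blocks are
// stored row by row. All 16 texels of a block share the same offset.
std::size_t blockOffset(std::size_t x, std::size_t y, std::size_t width,
                        std::size_t blockBytes) {
    const std::size_t blocksPerRow = (width + 3) / 4;
    return ((y / 4) * blocksPerRow + (x / 4)) * blockBytes;
}
```

Because sixteen texels share one block, changing a single texel means decoding and re-encoding the whole block, which is the on-the-fly compression described above.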
From the OpenGL Wiki:
Despite being color formats, compressed images are not color-renderable, for obvious reasons. Therefore, attaching a compressed image to a framebuffer object will cause that FBO to be incomplete and thus unusable. For similar reasons, no compressed formats can be used as the internal format of renderbuffers.
So "not color render-able" means that they can't be used in FBOs.
I'm not sure what "not readable" means; it may mean that you can't bind them to an FBO and read from the FBO (since you can't bind them to an FBO in the first place).
Does anyone know of an efficient way to push 2vuy non-planar data onto a GPU in a way that doesn't require swizzling?
I am grabbing the raw 2vuy data from an h264 video file and successfully loading it into a texture that I map to an OpenGL object. I notice that my code spends a fair amount of time in glgProcessPixelsWithProcessor. My glTexImage2D call looks like the following:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_YCBCR_422_APPLE,
GL_UNSIGNED_SHORT_8_8_APPLE, data);
Apple says in its OpenGL guide that GL_YCBCR_422_APPLE, provides "acceptable" performance (p103), but that
Note: If your data needs only to be swizzled, glgProcessPixels performs the swizzling reasonably fast although not as fast as if the data didn't need swizzling. But non-native data formats are converted one byte at a time and incurs a performance cost that is best to avoid.
I assume that there is some kind of internal format conversion going on on the CPU. I noticed in another thread that glgProcessPixels is running a block method as well.
Is my path the most efficient? If not, what is?
Your code, as it stands right now, depends on Apple extensions. I can't tell what's happening inside.
However, what I suggest is that you create three 2D textures, each with exactly one channel, where each texture receives one of the color planes; using independent textures makes supporting chroma subsampling (that 422) simpler.
In a shader you'd then perform the colorspace conversion. When writing down the math, I suggest you go through a connection color space such as XYZ, as this allows you to take the color profile of the output device into account; ICC profiles provide the conversion data from XYZ color space coordinates to device color space (RGB) coordinates.
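Since 2vuy is interleaved rather than planar, getting to three single-channel textures means splitting the buffer first. Here is a minimal sketch of that split, assuming the byte order is Cb Y0 Cr Y1 per pixel pair, an even width, and no row padding; `Planes` and `split2vuy` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One plane per channel; chroma is half width because of 4:2:2 subsampling.
struct Planes {
    std::vector<std::uint8_t> y;   // width * height samples
    std::vector<std::uint8_t> cb;  // (width / 2) * height samples
    std::vector<std::uint8_t> cr;  // (width / 2) * height samples
};

// Assumption: each 4-byte group is Cb Y0 Cr Y1 and rows are tightly packed.
Planes split2vuy(const std::uint8_t* src, std::size_t width,
                 std::size_t height) {
    Planes p;
    const std::size_t pairs = (width / 2) * height;
    p.y.resize(pairs * 2);
    p.cb.resize(pairs);
    p.cr.resize(pairs);
    for (std::size_t i = 0; i < pairs; ++i) {
        p.cb[i]        = src[i * 4 + 0];
        p.y[i * 2 + 0] = src[i * 4 + 1];
        p.cr[i]        = src[i * 4 + 2];
        p.y[i * 2 + 1] = src[i * 4 + 3];
    }
    return p;
}
```

This is itself a CPU pass over the data, so it is only worth it if it turns out cheaper than the conversion the driver is doing; measure before committing to it.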
I am doing some gpgpu calculations with GL and want to read my results from the framebuffer.
My framebuffer-texture is logically a 1D array, but I made it 2D to have a bigger area. Now I want to read from any arbitrary pixel in the framebuffer-texture with any given length.
That means all calculations are already done on GPU side and I only need to pass certain data to the cpu that could be aligned over the border of the texture.
Is this possible? If yes is it slower/faster than glReadPixels on the whole image and then cutting out what I need?
EDIT
Of course I know about OpenCL/CUDA but they are not desired because I want my program to run out of the box on (almost) any platform.
Also I know that glReadPixels is very slow and one reason might be that it offers some functionality that I do not need (Operating in 2D). Therefore I asked for a more basic function that might be faster.
Reading the whole framebuffer with glReadPixels just to discard it all except for a few pixels/lines would be grossly inefficient. But glReadPixels lets you specify a rect within the framebuffer, so why not just restrict it to fetching the few rows of interest? You may end up fetching some extra data at the start and end of the first and last lines fetched, but I suspect the overhead of that is minimal compared with making multiple calls.
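Working out which rows to fetch is simple integer arithmetic. Here is a minimal sketch, assuming one array element per texel and a row-major mapping from the 1D index to the 2D texture; `RowRange` and `rowsCovering` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>

struct RowRange {
    std::size_t firstRow;  // y of the rect to read
    std::size_t rowCount;  // height of the rect to read
    std::size_t skip;      // elements to skip at the start of the fetched block
};

// Rows covering `length` elements starting at linear index `start` in a
// texture that is `width` texels wide. Requires length > 0 and width > 0.
RowRange rowsCovering(std::size_t start, std::size_t length,
                      std::size_t width) {
    assert(length > 0 && width > 0);
    const std::size_t firstRow = start / width;
    const std::size_t lastRow = (start + length - 1) / width;
    return {firstRow, lastRow - firstRow + 1, start % width};
}
```

You would then read a rect starting at (0, firstRow) with the full texture width and a height of rowCount, and the data of interest begins `skip` elements into the returned buffer.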
Possibly writing your data to the framebuffer in tiles and/or using Morton order might help structure it so that a tighter bounding box can be found and the extra data retrieved is minimised.
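For reference, a Morton (Z-order) index simply interleaves the bits of x and y. Here is a minimal sketch for 16-bit coordinates using the usual bit-spreading trick; the function names are hypothetical, and whether this layout actually helps depends on your access pattern:

```cpp
#include <cstdint>

// Spread the low 16 bits of v so that bit i moves to bit 2*i.
std::uint32_t part1By1(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Inverse of part1By1: gather the even-numbered bits into the low 16 bits.
std::uint32_t compact1By1(std::uint32_t v) {
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

// Linear Morton index for texel (x, y): x takes the even bits, y the odd bits.
std::uint32_t mortonEncode(std::uint32_t x, std::uint32_t y) {
    return part1By1(x) | (part1By1(y) << 1);
}

// Texel coordinates for a linear Morton index.
void mortonDecode(std::uint32_t index, std::uint32_t& x, std::uint32_t& y) {
    x = compact1By1(index);
    y = compact1By1(index >> 1);
}
```

With this mapping, an aligned run of 4^k consecutive indices falls inside a 2^k by 2^k square, so the bounding rect of a contiguous range tends to be much tighter than a set of full-width rows.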
You can use a pixel buffer object (PBO) to transfer pixel data from the framebuffer to the PBO, then use glMapBufferARB to read the data directly:
http://www.songho.ca/opengl/gl_pbo.html