Uploading ASTC-compressed textures with glCompressedTexImage2D costs too much time? - opengl

I am working on an OpenGL application. I use glCompressedTexImage2D() to upload video frame textures in the ASTC texture compression format. It works well on a mobile phone whose GPU supports the GL_KHR_texture_compression_astc_ldr extension; with the compressed texture format GL_COMPRESSED_RGBA_ASTC_8x8_KHR, the upload time is about 2 ms per frame.
I am porting the application to Windows with OpenGL 4.5 on an Nvidia GTX 750. The upload succeeds, but it takes far too long, about 200-300 ms per frame. Looking at the hardware database http://delphigl.de/glcapsviewer/listreports.php, I find that the GTX 750 does not support the GL_KHR_texture_compression_astc_ldr extension. I then tried an Intel(R) HD Graphics 530, which does support GL_KHR_texture_compression_astc_ldr, and the upload time is about 2 ms per frame.
So I want to know why the Nvidia GTX 750 can upload the ASTC texture successfully but takes so long, and whether there is any way to upload ASTC textures in normal time (2 ms per frame) on the GTX 750. The Intel(R) HD Graphics 530 cannot handle a complicated 3D application.
Here is the upload code:
glCompressedTexImage2D(GL_TEXTURE_2D,
                       0,                               // mip level
                       compressed_data_internal_format, // GL_COMPRESSED_RGBA_ASTC_8x8_KHR
                       xsize,
                       ysize,
                       0,                               // border, must be 0
                       n_bytes_to_read,                 // imageSize
                       astc_data_ptr);
GL_CHECK(glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR));
GL_CHECK(glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR));
GL_CHECK(glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT));
GL_CHECK(glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT));

is there any way to upload ASTC textures in normal time (2 ms per frame) on the GTX 750
If the implementation does not expose the GL_KHR_texture_compression_astc_ldr extension, then the implementation does not support ASTC. And therefore, you cannot upload that data to it, regardless of how much time it takes.
NVIDIA's driver should have errored out when you attempted to allocate texture storage in a format it does not support. But whether it does or not, it makes no sense to optimize erroneous code. Nor does it make sense to look at the timings of erroneous code.
Before you get to optimizations, you need to get code that is supposed to work. And yours should not, not unless that extension is supported.
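For what it's worth, here is a minimal sketch of guarding the upload on that extension, assuming a core-profile (3.0+) context and an already-initialized GL loader; the fallback branch is only a placeholder:
#include <cstring>

// Returns true if the current context advertises the given extension.
bool hasExtension(const char* name)
{
    GLint count = 0;
    glGetIntegerv(GL_NUM_EXTENSIONS, &count);
    for (GLint i = 0; i < count; ++i)
    {
        const char* ext = reinterpret_cast<const char*>(glGetStringi(GL_EXTENSIONS, i));
        if (ext && std::strcmp(ext, name) == 0)
            return true;
    }
    return false;
}

// ...
if (hasExtension("GL_KHR_texture_compression_astc_ldr"))
{
    glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_ASTC_8x8_KHR,
                           xsize, ysize, 0, n_bytes_to_read, astc_data_ptr);
}
else
{
    // No native ASTC: transcode/decode the frame on the CPU and upload it as an
    // uncompressed or S3TC texture instead.
}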

Been there. Everything works fine on the mobile platform with ASTC.
But when I test it on Linux (with an Nvidia Tesla T4 card), glCompressedTexImage2D spends 66 ms per frame.
FYI, there is no GL error or rendering issue.
The backtrace shows the call stack below:
Thread 1 (Thread 0x7fe36885f840 (LWP 22683)):
#0 0x00007fe3619521a4 in ?? () from /lib64/libnvidia-eglcore.so.450.102.04
#1 0x00007fe361971d06 in ?? () from /lib64/libnvidia-eglcore.so.450.102.04
#2 0x00007fe361c7ff23 in ?? () from /lib64/libnvidia-eglcore.so.450.102.04
#3 0x00007fe361f4df01 in ?? () from /lib64/libnvidia-eglcore.so.450.102.04
#4 0x00007fe362010368 in ?? () from /lib64/libnvidia-eglcore.so.450.102.04
#5 0x00007fe362010ec9 in ?? () from /lib64/libnvidia-eglcore.so.450.102.04
#6 0x00007fe361c3814b in ?? () from /lib64/libnvidia-eglcore.so.450.102.04
#7 0x00007fe361c3f4b6 in ?? () from /lib64/libnvidia-eglcore.so.450.102.04
#8 0x0000000000594253 in glCompressedTexImage2D(width=720, height=1280, options=..., internelFormat=37808, bytesToRead=921600, data=0x5ecbe80) at /home/video-dev/Template/NESTImage/header/xxx.hpp:94
It seems that the driver (/lib64/libnvidia-eglcore.so.450.102.04) handles the work.
Maybe the driver decompresses the ASTC data on the CPU side rather than on the GPU.
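One way to check that suspicion (a sketch only, not verified on that driver; it needs a GL 4.3+ context, a GL loader, and the ASTC enum from glext.h):
#include <cstdio>

// Asks the driver whether it keeps GL_COMPRESSED_RGBA_ASTC_8x8_KHR compressed
// on the GPU or quietly falls back to a software path.
void reportAstcSupport()
{
    GLint supported = GL_FALSE, keptCompressed = GL_FALSE;
    glGetInternalformativ(GL_TEXTURE_2D, GL_COMPRESSED_RGBA_ASTC_8x8_KHR,
                          GL_INTERNALFORMAT_SUPPORTED, 1, &supported);
    glGetInternalformativ(GL_TEXTURE_2D, GL_COMPRESSED_RGBA_ASTC_8x8_KHR,
                          GL_TEXTURE_COMPRESSED, 1, &keptCompressed);
    std::printf("ASTC 8x8: supported=%d, stored compressed=%d\n",
                supported, keptCompressed);
    // A GL_FALSE in either answer is consistent with the driver decompressing
    // the data in software, which would explain the 66 ms per frame.
}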

Related

Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering

I am uploading image data into a GL texture asynchronously.
In the debug output I am getting these warnings during rendering:
Source: OpenGL, type: Other, id: 131185, severity: Notification
Message: Buffer detailed info: Buffer object 1 (bound to GL_PIXEL_UNPACK_BUFFER_ARB, usage hint is GL_DYNAMIC_DRAW) has been mapped WRITE_ONLY in SYSTEM HEAP memory (fast).
Source: OpenGL, type: Performance, id: 131154, severity: Medium
Message: Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering.
I can't see any wrong usage of PBOs in my case, or any errors. So the question is: are these warnings safe to ignore, or am I actually doing something wrong?
My code for that part:
// start copying pixels into the PBO from RAM:
mPBOs[mCurrentPBO].Bind(GL_PIXEL_UNPACK_BUFFER);
const uint32_t buffSize = pipe->GetBufferSize();
GLubyte* ptr = (GLubyte*)mPBOs[mCurrentPBO].MapRange(0, buffSize, GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
if (ptr)
{
    memcpy(ptr, pipe->GetBuffer(), buffSize);
    mPBOs[mCurrentPBO].Unmap();
}
// copy pixels from the other, already-filled PBO (except on the first frame) into the texture
mPBOs[1 - mCurrentPBO].Bind(GL_PIXEL_UNPACK_BUFFER);
// mCopyTex is bound to mCopyFBO as an attachment
glTextureSubImage2D(mCopyTex->GetHandle(), 0, 0, 0, mClientSize.x, mClientSize.y,
                    GL_RGBA, GL_UNSIGNED_BYTE, 0);
mCurrentPBO = 1 - mCurrentPBO;
Then I just blit the result to the default framebuffer. No rendering of geometry or anything like that.
glBlitNamedFramebuffer(
    mCopyFBO,
    0,                                        // default FBO id
    0, 0, mViewportSize.x, mViewportSize.y,   // source rect
    0, 0, mViewportSize.x, mViewportSize.y,   // destination rect
    GL_COLOR_BUFFER_BIT,
    GL_LINEAR);
Running on NVIDIA GTX 960 card.
This performance warning is NVIDIA-specific and is intended as a hint that you are not going to use a separate hardware transfer queue, which is no wonder, since you use a single-thread, single-GL-context model in which both the rendering (at least your blit) and the transfer are carried out.
See this NVIDIA presentation for some details about how NVIDIA handles this; page 22 also explains this specific warning. Note that this warning does not mean that your transfer is not asynchronous. It is still fully asynchronous with respect to the CPU thread. It will just be processed synchronously on the GPU with respect to the render commands in the same command queue, and you are not using the asynchronous copy engine, which could perform these copies independently of the rendering commands in a separate command queue.
I can't see any wrong usage of PBOs in my case, or any errors. So the question is: are these warnings safe to ignore, or am I actually doing something wrong?
There is nothing wrong with your PBO usage.
It is not clear if your specific application could even benefit from using a more elaborate separate transfer queue scheme.
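For reference, the separate-transfer-queue scheme mentioned above usually looks roughly like the sketch below: a second context that shares objects with the render context and only issues uploads from its own thread, so the driver can route them to the copy engine. This is an illustration only, assuming GLFW for context creation and a GL loader whose function pointers are valid in both contexts; every name here is made up:
#include <GLFW/glfw3.h>   // plus your GL loader (glad/GLEW) for glFenceSync etc.
#include <atomic>
#include <thread>

// Hidden window whose context shares objects with the main render context.
// Must be created on the main thread, like any GLFW window.
GLFWwindow* createUploadContext(GLFWwindow* mainWindow)
{
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    return glfwCreateWindow(1, 1, "upload", nullptr, mainWindow);
}

std::atomic<GLsync> gUploadFence{nullptr};

// Runs on a dedicated upload thread; the texture object is shared between
// contexts and its storage was allocated (e.g. glTexStorage2D) beforehand.
void uploadWorker(GLFWwindow* uploadCtx, GLuint texture,
                  const void* pixels, int width, int height)
{
    glfwMakeContextCurrent(uploadCtx);
    glBindTexture(GL_TEXTURE_2D, texture);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    gUploadFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();   // submit the commands and the fence to the driver
}

// Render thread, before sampling or blitting from the texture:
//   if (GLsync f = gUploadFence.exchange(nullptr)) {
//       glWaitSync(f, 0, GL_TIMEOUT_IGNORED);  // GPU-side wait, no CPU stall
//       glDeleteSync(f);
//   }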

Video Memory from ETC2 Texture Compression on OpenGL 4.3

Currently I'm writing a renderer which uses many textures and will fill up my graphics card's video memory (3 GB on my NVIDIA GTX 780 Ti). So I pre-compressed all the necessary images with Mali's texture compression tool and integrated my renderer with libktx for loading the compressed textures (*.ktx).
The compression works really well. For RGB images (compressed with GL_COMPRESSED_RGB8_ETC2) it consistently reaches 4 bpp, and 8 bpp for RGBA ones (GL_COMPRESSED_RGBA8_ETC2_EAC), as stated in the specs. But whenever those compressed images are uploaded to the GPU, they appear at their original, pre-compression sizes.
I'm loading the compressed textures using:
ktxLoadTextureN(...);
and I can see that inside that function, libktx will call:
glCompressedTexImage2D(GLenum target, GLint level,
                       GLenum internalformat,
                       GLsizei width, GLsizei height,
                       GLint border,
                       GLsizei imageSize,
                       const GLvoid* data);
The imageSize parameter passed to glCompressedTexImage2D() matches my compressed data size, but after this function executes, video memory usage increases by the decompressed image size.
So my question is: are compressed textures always decompressed before being uploaded to the GPU? If so, is there any standardized texture compression format that allows a compressed texture to be decoded on the fly on the GPU?
ETC2 and ETC formats are not commonly used by desktop applications. As such, they might not be natively supported by the desktop GPU and/or its driver. However, they are required for GLES 3.0 compatibility, so if your desktop OpenGL driver reports GL_ARB_ES3_compatibility, then it must also support the ETC2 format. Because many developers want to develop GLES 3.0 applications on their desktops to avoid constant deployment and have easier debugging, it is desirable for the driver to report this extension.
It is likely that your driver is merely emulating support for the ETC2 format by decompressing the data in software to an uncompressed RGB(A) target. This would explain why the memory usage matches that of uncompressed textures. This isn't necessarily true for every desktop driver, but it is likely true for most. It is still compliant with the spec: although it is commonly assumed, there is no requirement that a compressed texture consume only the amount of memory passed into glCompressedTexImage2D.
If you want to get the same level of memory usage on your desktop, you should compress your textures to a commonly used desktop format, such as one of the S3TC formats exposed by the GL_EXT_texture_compression_s3tc extension, which should be available on virtually all desktop GPU drivers.
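For example, a sketch of uploading pre-compressed DXT5 (BC3) data on the desktop; width, height, and dxt5Blocks are placeholders for your own image data, assumed to come from an offline compressor:
// DXT5 stores each 4x4 pixel block in 16 bytes, so the total size is easy to
// compute from the dimensions (round up to whole blocks).
GLsizei blocksW   = (width  + 3) / 4;
GLsizei blocksH   = (height + 3) / 4;
GLsizei imageSize = blocksW * blocksH * 16;

glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                       GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,  // from EXT_texture_compression_s3tc
                       width, height, 0,
                       imageSize, dxt5Blocks);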

KTX vs DDS images in OpenGL

Until now I have used DDS (DXT5) for fast loading of texture data. Now I read that since OpenGL 4.3 (and for ES2) the compression standard is KTX (ETC1/ETC2), so I integrated the Khronos libktx SDK and benchmarked.
Updating the texture with glCompressedTexSubImage2D 3000 times, the results are:
DDS: 1450 milliseconds
KTX: forever...
Actually, with a loop of only 300 KTX updates, the total time already reaches 24 seconds!
Now I have two questions:
Is this the expected speed of KTX?
If the answer to the first question is "yes", what is the advantage of ETC other than a smaller file size than DDS?
I use OpenGL 4.3 with a Quadro 4000 GPU.
I asked this question on the Khronos KTX forum. Here is the answer I got from the forum moderator:
I have been told by the NVIDIA OpenGL driver team that the Quadro 4000
does not support ETC in hardware while it does support DXTC. This
means the ETC-compressed images will be decompressed by the OpenGL
driver in software then loaded into GPU memory while the
DXTC-compressed images will simply be loaded into GPU memory. I
believe that is the source of the performance difference you are
observing.
So it seems like my card's hardware doesn't support ETC.
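If you want to see where the time goes on a given card, one option is to time a single update on both the CPU and the GPU; a format the driver decodes in software shows up as time spent inside the call itself rather than as GPU time. A rough sketch, assuming GL 3.3 timer queries and the same variables as in your benchmark loop:
#include <chrono>
#include <cstdio>

GLuint query = 0;
glGenQueries(1, &query);

auto cpuStart = std::chrono::steady_clock::now();
glBeginQuery(GL_TIME_ELAPSED, query);
glCompressedTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                          format, imageSize, data);     // your existing update
glEndQuery(GL_TIME_ELAPSED);
auto cpuEnd = std::chrono::steady_clock::now();

GLuint64 gpuNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &gpuNs);  // blocks until the result is ready

std::printf("CPU call: %.2f ms, GPU time: %.2f ms\n",
            std::chrono::duration<double, std::milli>(cpuEnd - cpuStart).count(),
            gpuNs / 1.0e6);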

Intel HD Graphics - framebuffer not complete?

I have an application that I usually run on an Nvidia graphics card. I thought I'd try running it on Sandy Bridge Intel HD Graphics 3000.
However, when running on the Intel hardware I get a "framebuffer not complete" error from the following initialization code:
glGenFramebuffers(1, &fbo_);
glBindFramebuffer(GL_FRAMEBUFFER_EXT, fbo_);
glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
glDisable(GL_MULTISAMPLE_ARB);
// Error: the object bound to FRAMEBUFFER_BINDING_EXT is not "framebuffer complete"
Any ideas why?
You kinda need at least one color attachment (before OpenGL 4.3 at least).
More info.
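For illustration, a minimal sketch of giving the FBO a color attachment before checking completeness; fbo, colorTex, width, and height are placeholder names:
GLuint fbo = 0, colorTex = 0;

// Create a color texture to render into.
glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D, colorTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

// Attach it to the framebuffer and only then check completeness.
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, colorTex, 0);

if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
{
    // still incomplete: check attachment sizes and formats
}
glReadBuffer(GL_COLOR_ATTACHMENT0);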

How to use texture compression in openGL?

I'm making an image viewer using OpenGL and I've run into a situation where I need to load very large (>50 MB) images to be viewed. I'm loading the images as textures and displaying them on a GL_QUAD, which has been working great for smaller images, but for the large images the loading fails and I get a blank rectangle. So far I've implemented a very ugly hack that uses another program to convert the images to smaller, lower-resolution versions which can be loaded, but I'm looking for a more elegant solution. I've found that OpenGL has a texture compression feature, but I can't get it to work. When I call
glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_ARB, t.width(), t.height(), 0, GL_RGBA, GL_UNSIGNED_BYTE, t.bits());
I get the compiler error "GL_COMPRESSED_RGBA_ARB undeclared". What am I doing wrong? Is there a library I'm missing? And more generally, is this a viable solution to my problem?
I'm using Qt Creator on a Windows Vista machine, with a NVIDIA Quadro FX 1700 graphics card.
On my own graphics card the maximum size of an OpenGL texture is 8192x8192; if your image is bigger than 50 MB, it probably has a very high resolution...
Check http://www.opengl.org/resources/faq/technical/texture.htm ; it describes how you can find the maximum texture size.
First, I'd have to ask: what resolution are these large images? Second, to use a define such as GL_COMPRESSED_RGBA_ARB, you need to download and use something like GLEW, whose headers expose far more of the modern GL API than the standard headers in an MS dev install.
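As a rough sketch of both suggestions (assuming GLEW has been initialized and t is the image object from the question), you can query the size limit, request a driver-compressed upload, and then check whether the driver actually compressed it:
#include <GL/glew.h>
#include <cstdio>

GLint maxSize = 0;
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxSize);
std::printf("GL_MAX_TEXTURE_SIZE: %d\n", maxSize);   // images larger than this will fail

// Generic "compress for me" format from ARB_texture_compression; the texture
// object is assumed to be created and bound already.
glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_ARB,
             t.width(), t.height(), 0, GL_RGBA, GL_UNSIGNED_BYTE, t.bits());

GLint isCompressed = GL_FALSE;
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_COMPRESSED, &isCompressed);
if (isCompressed)
{
    GLint storedBytes = 0;
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0,
                             GL_TEXTURE_COMPRESSED_IMAGE_SIZE, &storedBytes);
    std::printf("Driver compressed the texture: %d bytes on the GPU\n", storedBytes);
}
else
{
    std::printf("Driver stored the texture uncompressed\n");
}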