I'm specifying the cubemap texture for my skybox in the following way:
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + 0, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, texData[0]);
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + 1, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, texData[1]);
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + 2, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, texData[2]);
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + 3, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, texData[3]);
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + 4, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, texData[4]);
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + 5, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, texData[5]);
texData is a vector of unsigned char* pointers.
Using the Visual Studio debugger, I found that each line takes about 4ms to run, so the 6 lines that specify the cubemap texture take about 20-25ms in total. I update this cubemap texture in each iteration of my main loop, and it is slowing down my main loop considerably. I know skyboxes are traditionally static, but my application needs the skybox to be updated because I'm creating a 360 video viewer.
Is there another way to specify the cubemap texture that could be faster? I have checked OpenGL's docs already but I don't see a faster way.
UPDATE: I replaced glTexImage2D with glTexSubImage2D for all iterations except the 0th iteration and now the total time taken by 6 glTexSubImage2D lines is under 5ms. This is satisfactory for me but I guess I'll leave the question open because technically there's no answer yet.
glTexImage2D is slower because every call (re)allocates the texture storage on the driver side and then copies the pixel data of your image from CPU to GPU, which happens over the bus.
glTexSubImage2D, on the other hand, never allocates. It requires storage that an earlier glTexImage2D (or glTexStorage2D) call has already created, and only copies the new pixel data into it.
I think that, depending on the filtering parameters you set, OpenGL might also be creating additional texture (mipmap) levels.
Try using glTexStorage2D with the levels argument set to 1.
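To give a sense of what the levels argument means: a complete mipmap chain for a texture has floor(log2(max(width, height))) + 1 levels, while passing 1 allocates only the base level. The following is a minimal sketch of that count; `fullMipLevels` is a hypothetical helper name, not an OpenGL function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of levels in a complete mipmap chain for a
// width x height texture, i.e. floor(log2(max(width, height))) + 1.
std::size_t fullMipLevels(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    std::size_t largest = std::max(width, height);
    std::size_t levels = 1;            // the base level always exists
    while (largest > 1) {              // each further level halves the largest side
        largest /= 2;
        ++levels;
    }
    return levels;
}
```

For a 1024x1024 face this gives 11 levels, so a chain that is regenerated on every frame would be doing noticeably more work than a single base level.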
Also, try the SOIL library; it has a single function with a simple API for loading a cubemap.
One other thing you can try is compressing the textures and then profiling your program; I expect this option to give you the best performance.
glTexImage2D is slow because it is copying large amounts of data from CPU memory to GPU memory. If the images are coming from a video decoder, they are possibly already in GPU memory. In such a case you might be able to use them as textures directly through OpenGL extensions.
These tend to be platform specific though.
After creating a 2D texture array with
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RED, 1024, 1024, 1, 0, GL_RED, GL_UNSIGNED_BYTE, NULL);
I upload image data portion by portion using the function glTexSubImage3D() with
glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0, 0, 0, 0, 66, 66, 1, GL_RED, GL_UNSIGNED_BYTE, data);
The image gets uploaded but in an incorrect way. It appears to be smeared, as if it's using a different pitch instead of 66 bytes. This is on an NVIDIA card using fairly recent drivers.
Funnily enough, if I make the image 100 pixels wide instead (but not 99), the upload works correctly. Any idea what might be going wrong?
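Found the problem. GL_UNPACK_ALIGNMENT has an initial value of 4, so OpenGL assumes that every row of the source data starts on a 4-byte boundary, even if you specify that the pixel data format is GL_RED. A 66-byte row is therefore read with a stride of 68 bytes, whereas a 100-byte row is already a multiple of 4.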
By changing the row alignment to 1 byte with
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
the problem goes away.
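The widths in the question follow from how the assumed row stride is rounded up to the unpack alignment. The following is a minimal sketch of that arithmetic, not driver code; `unpackRowStride` is a hypothetical helper name, and it assumes GL_UNPACK_ROW_LENGTH is left at its default of 0.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: byte distance between the starts of consecutive rows
// that OpenGL assumes when reading client pixel data.
//   rowBytes  - size of one row of pixels in bytes (width * bytes per pixel)
//   alignment - the GL_UNPACK_ALIGNMENT value (1, 2, 4 or 8)
// Assumes GL_UNPACK_ROW_LENGTH is 0, so rows are exactly 'width' pixels long.
std::size_t unpackRowStride(std::size_t rowBytes, std::size_t alignment) {
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    // Round rowBytes up to the next multiple of alignment.
    return (rowBytes + alignment - 1) / alignment * alignment;
}
```

With GL_RED and GL_UNSIGNED_BYTE a pixel is one byte, so `unpackRowStride(66, 4)` is 68 (two bytes of padding are expected per row, which smears a tightly packed image), `unpackRowStride(100, 4)` is 100, and `unpackRowStride(66, 1)` is 66.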
I am using offscreen rendering to texture for a simple GPU calculation. I am using
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, texSize, texSize, 0, GL_RGBA, GL_FLOAT, nullptr);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, texture, 0);
to allocate storage for the texture and
glReadPixels(0, 0, texSize, texSize, GL_RGBA, GL_FLOAT, data);
to read out the computed data. The problem is that the output from the fragment shader I am interested in is only vec2, so the first two slots of the color attachment are populated and the other two are garbage. I then need to post-process data to only take two out of each four floats, which takes needless cycles and storage.
If it was one value, I'd use GL_RED, if it was three, I'd use GL_RGB in my glReadPixels. But I couldn't find a format that would read two values. I'm only using GL_RGBA for convenience as it seems more natural to take 2 floats out of 2×2 than out of 3.
Is there another way which would read all the resulting vec2 values tightly packed? I thought of reading RED only, somehow convincing OpenGL to skip four bytes after each value, and then reading GREEN only into the same array to fill in the gaps. To this end I tried reading up on glPixelStore, but it does not seem to be meant for this purpose. Is this, or any other way, even possible?
If you only want to read the RG components of the image, you use a transfer format of GL_RG in your glReadPixels command.
However, that's going to be a slow read unless your image also only stores 2 channels. So your image's internal format should be GL_RG32F.
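On the CPU side, reading with GL_RG and GL_FLOAT means the destination buffer holds exactly two floats per pixel, with no post-processing step. The following is a small sketch of sizing and indexing such a buffer; `makeReadbackBuffer` and `rgIndex` are hypothetical helper names. Since every row of floats is a multiple of 4 bytes, the default GL_PACK_ALIGNMENT of 4 adds no padding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: tightly packed destination for
//   glReadPixels(0, 0, texSize, texSize, format, GL_FLOAT, buf.data());
// where 'channels' is 2 for GL_RG and 4 for GL_RGBA.
std::vector<float> makeReadbackBuffer(std::size_t texSize, std::size_t channels) {
    assert(channels >= 1 && channels <= 4);
    return std::vector<float>(texSize * texSize * channels);
}

// Index of component 'channel' (0 = R, 1 = G) of pixel (x, y) in a GL_RG buffer.
std::size_t rgIndex(std::size_t x, std::size_t y, std::size_t texSize, std::size_t channel) {
    assert(x < texSize && y < texSize && channel < 2);
    return (y * texSize + x) * 2 + channel;
}
```

For a 256x256 texture the GL_RG buffer holds 131072 floats, half of what the GL_RGBA readback needs.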
I am writing an interactive path tracer and I was wondering what the best way is to draw the result on screen in modern GL. I have the result of the rendering stored in a pixel buffer that is updated on each pass (+1 spp), and I would like to draw it on screen after each pass. I did some searching and people have suggested drawing a textured quad for displaying 2D images. Does that mean I would create a new texture each time I update? And given that my pixels are updated very frequently, is this still a good idea?
You don't need to create an entirely new texture every time you want to update the content. If the size stays the same, you can reserve the storage once, using glTexImage2D() with NULL as the last argument. E.g. for a 512x512 RGBA texture with 8-bit component precision:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 512, 512, 0,
GL_RGBA, GL_UNSIGNED_BYTE, NULL);
In OpenGL 4.2 and later, you can also use:
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, 512, 512);
You can then update all or parts of the texture with glTexSubImage2D(). For example, to update the whole texture following the example above:
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 512, 512,
GL_RGBA, GL_UNSIGNED_BYTE, data);
Of course, if only rectangular part(s) of the texture change each time, you can make the updates more selective by choosing the 3rd to 6th parameters (xoffset, yoffset, width and height) accordingly.
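When the changed rectangle lives inside a full-size CPU image, the upload also needs to know where that rectangle starts in memory and how long the rows really are. The following is a rough sketch under the assumption of a tightly packed image; `DirtyRect` and `dirtyRectOffset` are hypothetical names, and the GL calls appear only as comments because they need a current context.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a changed region inside a full-size CPU image.
struct DirtyRect {
    std::size_t x, y, width, height;
};

// Byte offset of the first pixel of 'r' inside a tightly packed image that is
// 'imageWidth' pixels wide, with 'bytesPerPixel' bytes per pixel (4 for RGBA8).
std::size_t dirtyRectOffset(const DirtyRect& r, std::size_t imageWidth,
                            std::size_t bytesPerPixel) {
    assert(r.x + r.width <= imageWidth);
    return (r.y * imageWidth + r.x) * bytesPerPixel;
}

// Intended use for the 512x512 RGBA example (requires a current GL context):
//   glPixelStorei(GL_UNPACK_ROW_LENGTH, 512);  // rows in memory are 512 pixels long
//   glTexSubImage2D(GL_TEXTURE_2D, 0, r.x, r.y, r.width, r.height,
//                   GL_RGBA, GL_UNSIGNED_BYTE, data + dirtyRectOffset(r, 512, 4));
//   glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);    // restore the default
```

Setting GL_UNPACK_ROW_LENGTH is what lets OpenGL step over the pixels outside the rectangle at the end of each row; without it the rows would be read as if they were only r.width pixels long.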
Once your current data is in a texture, you can either draw a textured screen-sized quad, or copy the texture to the default framebuffer using glBlitFramebuffer(). You should be able to find plenty of sample code for the first option. The code for the second option would look something like this:
// One time during setup.
GLuint readFboId = 0;
glGenFramebuffers(1, &readFboId);
glBindFramebuffer(GL_READ_FRAMEBUFFER, readFboId);
glFramebufferTexture2D(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
GL_TEXTURE_2D, tex, 0);
glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
// Every time you want to copy the texture to the default framebuffer.
glBindFramebuffer(GL_READ_FRAMEBUFFER, readFboId);
glBlitFramebuffer(0, 0, texWidth, texHeight,
0, 0, winWidth, winHeight,
GL_COLOR_BUFFER_BIT, GL_LINEAR);
glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
I wish to process an image using glsl. For instance - for each pixel, output its squared value:
(r,g,b)-->(r^2,g^2,b^2). Then I want to read the result into cpu memory using glReadPixels.
This should be simple. However, most glsl examples that I find explain about shaders for image post-processing; thus, their output value already lies in [0,255]. In my example, however, I want to get output values in the range [0^2,255^2]; and I don't want them normalized to [0,255].
The main parts of my code are (after some trials and permutations):
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_BGR, GL_FLOAT, NULL);
glReadPixels(0, 0, width, height, GL_RGB, GL_FLOAT, data_float);
I'm not posting my entire code since I think these two lines are where my problem lies.
Edit
Following @Arttu's suggestion, and following this post and this post, my code now reads as follows:
glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGBA32F_ARB, width, height, 0, GL_RGBA, GL_FLOAT, NULL);
glReadPixels(0, 0, width, height, GL_RGB, GL_FLOAT, data_float);
Still, this does not solve my problem. If I understand correctly - no matter what, my input values get scaled to [0,1] when I insert them. It's up to me to multiply later by 255 or by 255^2...
Using a floating-point texture format will keep your values intact without clamping them to any specific range (in this case, within the limits of the 16-bit float representation, of course). You didn't specify your OpenGL version, so this assumes 4.3.
You seem to have conflicting format and internalformat. You're specifying internalformat RGBA16F, but format BGR, without the alpha component (glTexImage2D man page). Try the following:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_BGRA, GL_FLOAT, NULL);
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, data_float);
On the first line you're specifying a 2D texture with a four-component, 16-bit floating-point format, and OpenGL will expect the texture data to be in BGRA format. Since you pass NULL as the last parameter, you're not specifying any image data. Remember that the RGBA16F format gives you half-precision values in your shader, which will be implicitly converted to 32-bit format if you assign them to float or vec* variables.
On the second line, you're downloading image data from the device to data_float, this time in RGBA order.
If this doesn't solve your problem, you'll probably need to include some more code. Also, adding glGetError calls into your code will help you find the call that causes an error. Good luck :)
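As for the limits of the 16-bit float representation mentioned above: the largest finite half value is 65504, so 255^2 = 65025 is in range, but neighbouring half values that large are 32 apart. The following sketch estimates what a value becomes once stored as a half. `roundToHalf` is a hypothetical helper, not an OpenGL call, and it only covers positive values in the normal half range.

```cpp
#include <cassert>
#include <cmath>

// Sketch: nearest value representable as an IEEE 754 half-precision float
// (1 sign bit, 5 exponent bits, 10 stored mantissa bits).
// Only handles inputs in the normal range [2^-14, 65504]; subnormals,
// negatives, infinities and NaN are deliberately left out.
double roundToHalf(double x) {
    assert(x >= 6.103515625e-05 && x <= 65504.0);
    int exponent = 0;
    std::frexp(x, &exponent);                         // x = m * 2^exponent, m in [0.5, 1)
    double quantum = std::ldexp(1.0, exponent - 11);  // spacing between halfs near x
    // nearbyint uses the current rounding mode (round-to-nearest-even by default).
    return std::nearbyint(x / quantum) * quantum;
}
```

Here `roundToHalf(255.0)` stays 255, but `roundToHalf(65025.0)` is 65024. If the squared values have to come back exactly, a 32-bit internal format such as GL_RGBA32F is the safer choice.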
I want to copy texture1 to texture2.
The most stupid way is to copy the tex1 data from the GPU to the CPU, and then copy the CPU data back to the GPU.
The stupid code is as below:
float *data = new float[width*height*4];
glBindTexture(GL_TEXTURE_2D, tex1);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, data);
glBindTexture(GL_TEXTURE_2D, tex2);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, data);
As far as I know, there must be a method that copies data from one GPU texture to another without involving the CPU. I considered using an FBO to render a tex1 quad into tex2, but somehow that still seems naive. So what is the most efficient way to implement this?
If you have support for OpenGL 4.3, there is the straightforward glCopyImageSubData for exactly this purpose:
glCopyImageSubData(tex1, GL_TEXTURE_2D, 0, 0, 0, 0,
tex2, GL_TEXTURE_2D, 0, 0, 0, 0,
width, height, 1);
Of course this requires the destination texture to already be allocated with an image of appropriate size and format (using a simple glTexImage2D(..., nullptr), or maybe even better glTexStorage2D if you have GL 4 anyway).
If you don't have that, then rendering one texture into the other using an FBO might still be the best approach. In the end you don't even need to render the source texture. You can just attach both textures to an FBO and blit one color attachment over into the other using glBlitFramebuffer (core since OpenGL 3, or with GL_EXT_framebuffer_blit extension in 2.x, virtually anywhere where FBOs are in the first place):
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
GL_TEXTURE_2D, tex1, 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1,
GL_TEXTURE_2D, tex2, 0);
glDrawBuffer(GL_COLOR_ATTACHMENT1);
glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
GL_COLOR_BUFFER_BIT, GL_NEAREST);
Of course if you do that multiple times, it might be a good idea to keep this FBO alive. And likewise this also requires the destination texture image to have the appropriate size and format beforehand. Or you could also use Michael's suggestion of only attaching the source texture to the FBO and doing a good old glCopyTex(Sub)Image2D into the destination texture. Needs to be evaluated which performs better (if any).
And if you don't even have that one, then you could still use your approach of reading one texture and writing that data into the other. But instead of using CPU memory as the temporary buffer, use a pixel buffer object (PBO) (core since OpenGL 2.1). You will still have an additional copy, but at least that will be (or is likely to be) a GPU-GPU copy.