I'm writing my first ray tracer. I want to make it work in real-time.
I want to use OpenGL for display.
I want to write my screen to a floating-point buffer and display the buffer.
What extension and/or buffer type do I need?
Thanks in advance!
I'm writing my first ray tracer. I want to make it work in real-time.
Ambitious!
I want to use OpenGL for display. I want to write my screen to a floating-point buffer and display the buffer.
OpenGL can read from float buffers directly, e.g.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB, GL_FLOAT, data);
But OpenGL may choose any internal format that matches your selection. A GL_RGB internal format can be anything that can somehow store RGB data. You can be more specific about what you want: for example, GL_RGB16 tells OpenGL you want 16 bits of resolution per channel. The implementation may still choose to use 24 bits per channel, as that also allows 16-bit values to be stored. But ultimately the implementation decides which internal format it will use, based on the constraints you put upon it.
Floating point framebuffers and textures are supported in OpenGL through the extensions GL_ARB_texture_float, GLX_ARB_fbconfig_float and WGL_ARB_pixel_format_float, but due to patent issues not all OpenGL implementations implement them (ATI and NVidia do).
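For instance, here is a minimal sketch of uploading a ray tracer's float output with an explicit sized float internal format (GL_RGB32F requires GL_ARB_texture_float or OpenGL 3.0; width, height and the framebuffer pointer stand in for your own data):
// Allocate the texture once with a sized float internal format.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB32F, width, height, 0, GL_RGB, GL_FLOAT, NULL);
// Each frame: upload the ray tracer's float pixels, then draw a
// textured fullscreen quad with this texture bound.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, GL_RGB, GL_FLOAT, framebuffer);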
Related
I am using FFMPEG to read a high-resolution video (6480*1920) and OpenGL to show it.
After decoding, I get 3 pointers that point to the Y, U and V planes.
At first, I used sws_scale to convert it to RGB and show it, but I found it too slow. So I deal with the YUV directly. My second try was to generate 3 one-channel textures and convert them to RGB in the fragment shader. It is faster, but still cannot reach 60 fps.
I find the bottleneck is this function: texture(texy, tex_coord.xy). When the texture is large, it costs a lot of time. So instead of calling it 3 times, my idea is to put the YUV in one single texture, since a texture can have 4 channels. But I wonder how I can update a certain channel of a texture.
I tried the following code, but it does not seem to work. Instead of updating one channel, glTexSubImage2D changes the whole texture:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, frame->width, frame->height, 0, GL_RED, GL_UNSIGNED_BYTE, Y);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, frame->width, frame->height, GL_GREEN, GL_UNSIGNED_BYTE, U);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, frame->width, frame->height, GL_BLUE, GL_UNSIGNED_BYTE, V);
So how can I use one texture to pass the YUV data? I also tried gathering the YUV data into one array and then generating the texture, but that does not help, since it takes a lot of time to build that array.
Any good idea?
You're approaching this from the wrong angle, since you don't actually understand what is causing the poor performance in the first place. Yes, texture access is a rather expensive operation. But it is not that expensive; just think about the amount of texture data that gets pushed around in modern games at very high frame rates.
The problem is not the channel format of the texture, and it is also not the GLSL texture call.
Your problem is this:
(…) high resolution video (6480*1920)
Plain and simple: the dimensions of the frame are outside the range the GPU is comfortable working with. Try breaking the picture down into a set of smaller textures. Using the glPixelStorei parameters GL_UNPACK_ROW_LENGTH, GL_UNPACK_SKIP_PIXELS and GL_UNPACK_SKIP_ROWS you can select the rectangle inside your source picture to copy.
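A sketch of uploading one such tile from the Y plane (full_width, tile_x, tile_y, tile_w, tile_h and y_plane are placeholders for your own values):
// full_width is the width in pixels of the whole decoded frame.
glPixelStorei(GL_UNPACK_ROW_LENGTH, full_width);
glPixelStorei(GL_UNPACK_SKIP_PIXELS, tile_x);
glPixelStorei(GL_UNPACK_SKIP_ROWS, tile_y);
// Upload just that rectangle into the tile's texture.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, tile_w, tile_h, GL_RED, GL_UNSIGNED_BYTE, y_plane);
// Reset the unpack state so later uploads are unaffected.
glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0);
glPixelStorei(GL_UNPACK_SKIP_ROWS, 0);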
You don't have to make several draw calls BTW, just select the texture inside the shader based on the target fragment position or texture coordinate.
Unfortunately OpenGL doesn't offer a convenient function to determine the sweet spot; for most GPUs these days the maximum size in either direction for dense textures is 2048. Go above that and, in my experience, performance tanks for dense textures.
Sparse textures are an entirely different chapter, and irrelevant for this problem.
And just for the sake of completeness: I take it that you don't reinitialize the texture for each and every frame with a call to glTexImage2D. Do that only once at the start of the video, then just update the texture(s).
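A sketch of that allocate-once / update-per-frame pattern for one plane texture (the sized GL_R8 format needs OpenGL 3.0+; tile_w, tile_h and y_plane_tile are placeholders):
// Once, at the start of the video: allocate storage, no data yet.
glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, tile_w, tile_h, 0, GL_RED, GL_UNSIGNED_BYTE, NULL);
// Every frame: only update the contents.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, tile_w, tile_h, GL_RED, GL_UNSIGNED_BYTE, y_plane_tile);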
This question already has answers here: How to use GLUT/OpenGL to render to a file?
My aim is to render an OpenGL scene without a window, directly into a file. The scene may be larger than my screen resolution.
How can I do this?
I want to be able to set the render area to any size, for example 10000x10000, if possible.
It all starts with glReadPixels, which you will use to transfer the pixels stored in a specific buffer on the GPU to the main memory (RAM). As you will notice in the documentation, there is no argument to choose which buffer. As is usual with OpenGL, the current buffer to read from is a state, which you can set with glReadBuffer.
So a very basic offscreen rendering method would be something like the following. I use C++ pseudo-code, so it will likely contain errors, but it should make the general flow clear:
//Before swapping
std::vector<std::uint8_t> data(width*height*4);
glReadBuffer(GL_BACK);
glReadPixels(0,0,width,height,GL_BGRA,GL_UNSIGNED_BYTE,&data[0]);
This will read the current back buffer (usually the buffer you're drawing to). You should call this before swapping the buffers. Note that you can also perfectly read the back buffer with the above method, clear it and draw something totally different before swapping it. Technically you can also read the front buffer, but this is often discouraged as theoretically implementations were allowed to make some optimizations that might make your front buffer contain rubbish.
There are a few drawbacks with this. First of all, we don't really do offscreen rendering, do we? We render to the screen buffers and read from those. We can emulate offscreen rendering by never swapping in the back buffer, but it doesn't feel right. Besides that, the front and back buffers are optimized to display pixels, not to read them back. That's where Framebuffer Objects come into play.
Essentially, an FBO lets you create a framebuffer other than the default one (the FRONT and BACK buffers) that allows you to draw to a memory buffer instead of the screen buffers. In practice, you can either draw to a texture or to a renderbuffer. The former is optimal when you want to re-use the pixels in OpenGL itself as a texture (e.g. a naive "security camera" in a game), the latter if you just want to render/read back. With this, the code above would become something like the following; again pseudo-code, so don't kill me if I mistyped or forgot some statements.
//Somewhere at initialization
GLuint fbo, render_buf;
glGenFramebuffers(1,&fbo);
glGenRenderbuffers(1,&render_buf);
glBindRenderbuffer(GL_RENDERBUFFER, render_buf);
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, width, height);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER,fbo);
glFramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, render_buf);
//At deinit:
glDeleteFramebuffers(1,&fbo);
glDeleteRenderbuffers(1,&render_buf);
//Before drawing
glBindFramebuffer(GL_DRAW_FRAMEBUFFER,fbo);
//after drawing
glBindFramebuffer(GL_READ_FRAMEBUFFER,fbo); // read from the FBO, not the window
std::vector<std::uint8_t> data(width*height*4);
glReadBuffer(GL_COLOR_ATTACHMENT0);
glReadPixels(0,0,width,height,GL_BGRA,GL_UNSIGNED_BYTE,&data[0]);
// Return to onscreen rendering:
glBindFramebuffer(GL_DRAW_FRAMEBUFFER,0);
This is a simple example; in reality you likely also want storage for the depth (and stencil) buffer. You also might want to render to a texture, but I'll leave that as an exercise. In any case, you will now perform real offscreen rendering and it might work faster than reading the back buffer.
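For completeness, attaching a depth renderbuffer to the same FBO would look roughly like this (same pseudo-code caveats; it assumes the FBO is still bound to GL_DRAW_FRAMEBUFFER as in the init code above):
GLuint depth_buf;
glGenRenderbuffers(1,&depth_buf);
glBindRenderbuffer(GL_RENDERBUFFER, depth_buf);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, width, height);
glFramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, depth_buf);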
Finally, you can use pixel buffer objects to make reading pixels asynchronous. The problem is that glReadPixels blocks until the pixel data is completely transferred, which may stall your CPU. With PBOs the implementation may return immediately, as it controls the buffer anyway. It is only when you map the buffer that the pipeline will block. However, PBOs may be optimized to buffer the data solely in RAM, so this block could take a lot less time. The read pixels code would become something like this:
//Init:
GLuint pbo;
glGenBuffers(1,&pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width*height*4, NULL, GL_DYNAMIC_READ);
//Deinit:
glDeleteBuffers(1,&pbo);
//Reading:
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0,0,width,height,GL_BGRA,GL_UNSIGNED_BYTE,0); // 0 instead of a pointer, it is now an offset in the buffer.
//DO SOME OTHER STUFF (otherwise this is a waste of your time)
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo); //Might not be necessary...
void* pixel_data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
The part in caps is essential. If you just issue a glReadPixels to a PBO, followed by a glMapBuffer of that PBO, you gained nothing but a lot of code. Sure the glReadPixels might return immediately, but now the glMapBuffer will stall because it has to safely map the data from the read buffer to the PBO and to a block of memory in main RAM.
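One common way to get that "other stuff" for free is to double-buffer the PBOs: kick off the read into one PBO this frame and map the other one, which received the previous frame. A sketch, assuming two PBOs created the same way as above and a running frame counter:
GLuint write_pbo = pbo[frame % 2];
GLuint read_pbo  = pbo[(frame + 1) % 2];
glBindBuffer(GL_PIXEL_PACK_BUFFER, write_pbo);
glReadPixels(0,0,width,height,GL_BGRA,GL_UNSIGNED_BYTE,0); // starts the transfer, returns quickly
glBindBuffer(GL_PIXEL_PACK_BUFFER, read_pbo);
void* pixel_data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY); // last frame's pixels
if (pixel_data) {
    // ... consume last frame's pixels here ...
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}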
Please also note that I use GL_BGRA everywhere; this is because many graphics cards internally use this as the optimal rendering format (or the GL_BGR version without alpha). It should be the fastest format for pixel transfers like this. I'll try to find the nvidia article I read about this a few months back.
When using OpenGL ES 2.0, GL_DRAW_FRAMEBUFFER might not be available; you should just use GL_FRAMEBUFFER in that case.
I'll assume that creating a dummy window (you don't render to it; it's just there because the API requires you to make one) to hold your main context is an acceptable implementation strategy.
Here are your options:
Pixel buffers
A pixel buffer, or pbuffer (which isn't a pixel buffer object), is first and foremost an OpenGL context. Basically, you create a window as normal, then pick a pixel format from wglChoosePixelFormatARB (pbuffer formats must be obtained from there). Then you call wglCreatePbufferARB, giving it your window's HDC and the pixel buffer format you want to use. Oh, and a width/height; you can query the implementation's maximum width/height.
The default framebuffer for a pbuffer is not visible on the screen, and the max width/height is whatever the hardware wants to let you use. So you can render to it and use glReadPixels to read back from it.
You'll need to share your window context with the pbuffer context if you have created objects in the window context. Otherwise, you can use the pbuffer context entirely separately. Just don't destroy the window context.
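A rough sketch of that creation path (assuming the WGL extension function pointers have already been loaded with wglGetProcAddress and the tokens come from wglext.h; window_dc is the HDC of the dummy window, and the chosen size must stay within the queried pbuffer maximum):
int attribs[] = { WGL_DRAW_TO_PBUFFER_ARB, GL_TRUE,
                  WGL_SUPPORT_OPENGL_ARB,  GL_TRUE,
                  WGL_PIXEL_TYPE_ARB,      WGL_TYPE_RGBA_ARB,
                  WGL_COLOR_BITS_ARB,      32,
                  WGL_DEPTH_BITS_ARB,      24,
                  0 };
int pixel_format; UINT num_formats;
wglChoosePixelFormatARB(window_dc, attribs, NULL, 1, &pixel_format, &num_formats);
HPBUFFERARB pbuffer    = wglCreatePbufferARB(window_dc, pixel_format, 4096, 4096, NULL);
HDC         pbuffer_dc = wglGetPbufferDCARB(pbuffer);
HGLRC       pbuffer_rc = wglCreateContext(pbuffer_dc);
wglMakeCurrent(pbuffer_dc, pbuffer_rc);  // render, glReadPixels, then clean up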
The advantage here is greater implementation support (though most drivers that don't support the alternatives are also old drivers for hardware that's no longer being supported. Or is Intel hardware).
The downsides are these. Pbuffers don't work with core OpenGL contexts. They may work for compatibility, but there is no way to give wglCreatePbufferARB information about OpenGL versions and profiles.
Framebuffer Objects
Framebuffer Objects are more "proper" offscreen rendertargets than pbuffers. FBOs are within a context, while pbuffers are about creating new contexts.
FBOs are just a container for images that you render to. The maximum dimensions that the implementation allows can be queried; you can assume it to be GL_MAX_VIEWPORT_DIMS (make sure an FBO is bound before checking this, as it changes based on whether an FBO is bound).
Since you're not sampling textures from these (you're just reading values back), you should use renderbuffers instead of textures. Their maximum size may be larger than those of textures.
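Querying those limits is just a couple of glGetIntegerv calls (sketch):
GLint viewport_dims[2], rb_size;
glGetIntegerv(GL_MAX_VIEWPORT_DIMS, viewport_dims);   // max renderable width/height
glGetIntegerv(GL_MAX_RENDERBUFFER_SIZE, &rb_size);    // max renderbuffer dimension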
The upside is the ease of use. Rather than have to deal with pixel formats and such, you just pick an appropriate image format for your glRenderbufferStorage call.
The only real downside is the narrower band of hardware that supports them. In general, anything that AMD or NVIDIA makes that they still support (right now, GeForce 6xxx or better [note the number of x's], and any Radeon HD card) will have access to ARB_framebuffer_object or OpenGL 3.0+ (where it's a core feature). Older drivers may only have EXT_framebuffer_object support (which has a few differences). Intel hardware is potluck; even if they claim 3.x or 4.x support, it may still fail due to driver bugs.
If you need to render something that exceeds the maximum FBO size of your GL implementation, libtr works pretty well:
The TR (Tile Rendering) library is an OpenGL utility library for doing tiled rendering. Tiled rendering is a technique for generating large images in pieces (tiles).
TR is memory efficient; arbitrarily large image files may be generated without allocating a full-sized image buffer in main memory.
The easiest way is to use something called Frame Buffer Objects (FBO). You will still have to create a window to create an OpenGL context though (but this window can be hidden).
The easiest way to fulfill your goal is to use an FBO to do off-screen rendering. And you don't need to render to a texture and then get the teximage; just render to a renderbuffer and use glReadPixels. This link will be useful. See Framebuffer Object Examples
I'm doing some GPGPU programming with OpenGL.
I want to be able to write all my data to one-dimensional textures with the format GL_R8, so that I basically can treat it as an std::array object.
Then during rendering I would like to be able to set how the GPU should read the image, e.g. "cast" it to 1024x1024 BGRA.
Is this possible?
e.g. what I want to be able to do:
gpu::array<uint8_t> data(GL_R8, width*height*4);
gpu::bind(data, GL_TEXTURE0, gpu::format::bgra, width, height);
Then use a buffer texture. There's no rule (that I know of) that says you can't hook the same buffer up to multiple different textures. That would allow one texture to use it with the GL_R8 internal format. And another texture could use it with the GL_RGBA8 format.
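A minimal sketch of that idea: one buffer object, two buffer textures over it with different internal formats (the names, sizes and data pointer are placeholders):
GLuint buf, tex_r8, tex_rgba8;
glGenBuffers(1, &buf);
glBindBuffer(GL_TEXTURE_BUFFER, buf);
glBufferData(GL_TEXTURE_BUFFER, width * height * 4, data, GL_STATIC_DRAW);
glGenTextures(1, &tex_r8);
glBindTexture(GL_TEXTURE_BUFFER, tex_r8);
glTexBuffer(GL_TEXTURE_BUFFER, GL_R8, buf);      // view the data as single bytes
glGenTextures(1, &tex_rgba8);
glBindTexture(GL_TEXTURE_BUFFER, tex_rgba8);
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA8, buf);   // view the same bytes as 4-channel texels
Note that buffer textures are indexed linearly with texelFetch on a samplerBuffer, so "casting" to 1024x1024 means computing the index as y*1024 + x in the shader rather than using 2D texture coordinates.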
I want to mix two (or more) 16-bit audio streams using OpenGL and I need a bit of help.
Basically what I want to do is to put the audio data into a texture which I draw to a frame buffer object and then read back. This is not a problem; however, drawing the data in a way that gives correct results is a bit more problematic.
I have basically two questions.
In order to mix the data by drawing I need to use blending (alpha = 0.5), however the result should not have any alpha channel. So if I render to e.g. a frame buffer with the format RGB, will alpha blending still work as I expect, and will the resulting alpha not be written to the FBO? (I want to avoid having to read back the FBO for each render pass.)
texture             |sR|sG|sB|
framebuffer(before) |dR|dG|dB|
framebuffer(after)  |dR*0.5+sR*0.5|dG*0.5+sG*0.5|dB*0.5+sB*0.5|
The audio samples are signed 16-bit integer values. Is it possible to do signed calculations this way? Or will I need to first convert the values to unsigned on the CPU, draw them, and then make them signed again on the CPU?
EDIT:
I was a bit unclear. I'm restricted to OpenGL 3.3 hardware. I would prefer not to use CUDA or OpenCL, since I'm already using OpenGL for other stuff.
Each audio sample will be rendered in a separate pass, which means that it has to "mix" with what's already been rendered to the frame buffer. The problem is how the output from the pixel shader is written to the framebuffer (this blending is not accessible through programmable shaders, as far as I know, and one has to use glBlendFunc).
EDIT2:
Each audio sample will be rendered in different passes, so only one audio sample will be available in the shader at a time, which means that they need to be accumulated in the FBO.
foreach(var audio_sample in audio_samples)
draw(audio_sample);
and not
for(int n = 0; n < audio_samples.size(); ++n)
{
glActiveTexture(GL_TEXTURE0 + n);
glBindTexture(audio_sample);
}
draw_everything();
Frankly, why wouldn't you just use programmable pixel shaders for that?
Do you have to use OpenGL 1 Fixed Functionality Pipeline?
I'd just go with a programmable shader operating on signed 16bit grayscale linear textures.
Edit:
foreach(var audio_sample in audio_samples)
blend FBO1 + audio_sample => FBO2
swap FBO2, FBO1
It ought to be just as fast, if not faster (thanks to streaming pipelines).
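To make the ping-pong concrete, here is a sketch of that loop (fbo[0]/fbo[1] with their color textures accum_tex[0]/accum_tex[1], the per-sample textures, the mix shader and draw_fullscreen_quad() are all assumed to exist already; the shader would output 0.5*previous + 0.5*sample):
int src = 0, dst = 1;
for (size_t n = 0; n < num_samples; ++n) {
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo[dst]);
    glUseProgram(mix_shader);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, accum_tex[src]);   // previous accumulation
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, sample_tex[n]);    // current audio sample
    draw_fullscreen_quad();
    int tmp = src; src = dst; dst = tmp;            // what we just wrote becomes the next input
}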
I agree with QDot. Could you however inform us a bit about the hardware restrictions you are facing? If you have reasonably modern hardware I might even suggest going the CUDA or OpenCL route, instead of going through OpenGL.
You should be able to do blending even if the destination buffer does not have alpha. That said, rendering to non-power-of-two pixel sizes (RGB16 = 6 bytes/pixel) usually incurs performance penalties.
Signed is not your typical render target format, but it does exist in the OpenGL 4.0 specification (Table 3.12, called RGB16_SNORM or RGB16I, depending on whether you want a normalized representation or not).
As a side note, you also have glBlendFunc(GL_CONSTANT_ALPHA,GL_ONE_MINUS_CONSTANT_ALPHA) to not even have to specify an alpha per-pixel. That may not be available on all GL implementations though.
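That would look roughly like this (glBlendColor supplies the constant used by the blend function):
glEnable(GL_BLEND);
glBlendColor(0.0f, 0.0f, 0.0f, 0.5f);                         // constant alpha = 0.5
glBlendFunc(GL_CONSTANT_ALPHA, GL_ONE_MINUS_CONSTANT_ALPHA);  // result = 0.5*src + 0.5*dst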
I have been trying to make a Cross-platform 2D Online Game, and my maps are made of tiles.
My tileset, which I render the tiles from, is quite huge.
I wanted to know how I can disable hardware rendering, or at least make it more capable.
Hence, I wanted to know what the basic limits of the video RAM are; as far as I know, Direct3D has texture size limits (by that I don't mean the power-of-two texture sizes).
If you want to use a software renderer, link against Mesa.
You can get an estimate of the maximum texture size using these methods:
21.130 What's the maximum size texture map my device will render hardware accelerated?
A good OpenGL implementation will render with hardware acceleration whenever possible. However, the implementation is free to not render hardware accelerated. OpenGL doesn't provide a mechanism to ensure that an application is using hardware acceleration, nor to query that it's using hardware acceleration. With this information in mind, the following may still be useful:
You can obtain an estimate of the maximum texture size your implementation supports with the following call:
GLint texSize;
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &texSize);
If your texture isn't hardware accelerated, but still within the size restrictions returned by GL_MAX_TEXTURE_SIZE, it should still render correctly.
This is only an estimate, because the glGet*() function doesn't know what format, internalformat, type, and other parameters you'll be using for any given texture. OpenGL 1.1 and greater solves this problem by allowing texture proxy.
Here's an example of using texture proxy:
glTexImage2D(GL_PROXY_TEXTURE_2D, level, internalFormat, width, height, border, format, type, NULL);
Note the pixels parameter is NULL, because OpenGL doesn't load texel data when the target parameter is GL_PROXY_TEXTURE_2D. Instead, OpenGL merely considers whether it can accommodate a texture of the specified size and description. If the specified texture can't be accommodated, the width and height texture values will be set to zero. After making a texture proxy call, you'll want to query these values as follows:
GLint width;
glGetTexLevelParameteriv(GL_PROXY_TEXTURE_2D, 0, GL_TEXTURE_WIDTH, &width);
if (width==0) {
/* Can't use that texture */
}