So I'm working on a function that works similarly to OpenGL Profiler on OS X, which lets me extract information about OpenGL's back buffers and what they currently contain. Due to the nature of the function, I don't have access to the application's variables containing the depth buffer IDs and need to rely on the GL functions themselves to provide this information.
I've already got another function that copies the actual FBO content into a normal GL texture, and I've successfully extracted the normal draw buffers and saved them as image files using the series of glGetIntegerv() calls in the (sample) function below.
But I can't seem to find a constant/function that could be used to pull the buffer information (e.g. type, texture id) out of the depth buffer (and I've already looked through them a few times), which I'm pretty sure has to be possible, considering it's been done before in other GL profiling tools.
So, this is the first time I've felt the need to ask a question here, and I'm wondering if anyone knows whether this is possible, or if I really need to catch the value while the application is setting it rather than trying to pull the current value out of the GL context.
// ......
GLint savedReadFBO = GL_ZERO;
GLenum savedReadBuffer = GL_BACK;
glGetIntegerv(GL_READ_FRAMEBUFFER_BINDING, &savedReadFBO);
glGetIntegerv(GL_READ_BUFFER, (GLint *) &savedReadBuffer);
// Try to obtain current draw buffer
GLint currentDrawFBO;
GLenum currentDrawBuffer = GL_NONE;
glGetIntegerv(GL_DRAW_FRAMEBUFFER_BINDING, &currentDrawFBO);
glGetIntegerv(GL_DRAW_BUFFER0, (GLint *) &currentDrawBuffer);
// We'll temporarily bind the drawbuffer for reading to pull out the current data.
// Bind drawbuffer FBO to readbuffer
if (savedReadFBO != currentDrawFBO)
{
    glBindFramebuffer(GL_READ_FRAMEBUFFER, currentDrawFBO);
}
// Bind the read buffer and copy image
glReadBuffer(currentDrawBuffer);
// ....commands to fetch actual buffer content here....
// Restore the old read buffer
glBindFramebuffer(GL_READ_FRAMEBUFFER, savedReadFBO);
glReadBuffer(savedReadBuffer);
// .......
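For reference, the shape of the query I'm after for the depth attachment would be something like the sketch below, built around glGetFramebufferAttachmentParameteriv (which looks promising, though I haven't confirmed it covers my case; note that the default framebuffer uses different attachment enums):
// Sketch: query the type and id of whatever is attached at the depth
// attachment of the currently bound draw framebuffer (GL 3.0+ FBOs).
GLint depthType = GL_NONE; // GL_TEXTURE, GL_RENDERBUFFER, or GL_NONE
GLint depthId = 0;         // texture or renderbuffer name
glGetFramebufferAttachmentParameteriv(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                                      GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, &depthType);
glGetFramebufferAttachmentParameteriv(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                                      GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME, &depthId);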
I'm currently trying to get into Vulkan, and I've mostly followed this well-known Vulkan tutorial, all the while trying to integrate it into a framework I built around OpenGL. I'm at the point where I can successfully render an object on the screen and have it move around by passing a transformation matrix to a uniform buffer linked to my shader code.
In this tutorial the author focuses on drawing one object to the screen, which is a good starting point, but I would like my end code to look like this:
drawRect(position1, size1, color1);
drawRect(position2, size2, color2);
...
My first try at implementing something like this ended up with me submitting the command buffer, which is created and recorded only once at the beginning, once for each object I wanted to render, and making sure to update the uniform data in between each command buffer submission. This didn't work, however, and after some debugging with RenderDoc I realized it was because starting a render pass clears the screen.
If I understand my situation correctly, the only way to achieve what I want would involve re-creating the command buffers every frame:
Count n, the number of times we want to draw something on the screen;
At the end of a frame, allocate n uniform buffers, and fill them with the corresponding data;
Create n descriptor sets to be able to link these uniform buffers with my shader;
Record the command buffer by repeating n times the process of binding a descriptor set using vkCmdBindDescriptorSets and drawing the requested data using vkCmdDrawIndexed.
This seems like a lot of work to do every frame. Is this how I should handle a dynamic number of draw calls? Or is there some concept I don't know about or got wrong?
Generally, command buffers actually are re-recorded every frame, and Vulkan allows you to multithread the recording with command pools.
Indirect draws exist: you store the data describing the draw commands (index count, instance count, etc.) in a separate buffer, and the driver reads that data from the buffer when you submit the commands. vkCmdDraw*Indirect requires you to specify the number of draw commands at recording time; vkCmdDraw*IndirectCount allows you to store the number of draw commands in a buffer as well.
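For reference, each record in that buffer is a VkDrawIndexedIndirectCommand, and the recorded call just points at the buffer (cmd and indirectBuffer here are placeholders):
// One VkDrawIndexedIndirectCommand per draw, tightly packed in indirectBuffer:
// { uint32_t indexCount, instanceCount, firstIndex; int32_t vertexOffset; uint32_t firstInstance; }
vkCmdDrawIndexedIndirect(cmd, indirectBuffer, 0 /*offset*/, drawCount,
                         sizeof(VkDrawIndexedIndirectCommand) /*stride*/);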
Also, I don't see a reason why you would have to re-create uniform buffers and descriptor sets each frame. In fact, as far as I know, Vulkan encourages you to pre-bake the things that you can, and descriptor sets are a tool for that.
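A rough sketch of per-frame recording with pre-baked descriptor sets (cmd, pipeline, pipelineLayout, renderPassInfo, and objects[] are placeholders, not names from the tutorial):
// Assumes the command pool was created with VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT.
vkResetCommandBuffer(cmd, 0);
VkCommandBufferBeginInfo beginInfo = {};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
vkBeginCommandBuffer(cmd, &beginInfo);
vkCmdBeginRenderPass(cmd, &renderPassInfo, VK_SUBPASS_CONTENTS_INLINE); // clears once per frame
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
for (uint32_t i = 0; i < n; ++i) {
    // Descriptor sets (and the uniform buffers behind them) were allocated
    // up front; only the data inside them changes per object.
    vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout,
                            0, 1, &objects[i].descriptorSet, 0, NULL);
    vkCmdDrawIndexed(cmd, objects[i].indexCount, 1, 0, 0, 0);
}
vkCmdEndRenderPass(cmd);
vkEndCommandBuffer(cmd);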
I am brand new to OpenGL programming, and I know that graphics APIs are notoriously difficult to debug. My question is: I have a txt file with 3D vertex data, and after I create the vertex and index buffers, is there some way to see if the data was loaded correctly? As of now, the only way I can think of is to write a shader to display the points, but that would involve a lot of math, and I want to make sure the data is at least loaded correctly before I try to debug the rendering. That way, if there is a problem, I will know whether it lies in the math in my shader or in how I initialized the buffers.
Edit:
In case you're confused about what I'm asking: is there some sort of function you can use that will show you the buffer data on the GPU?
You can use glGetBufferSubData() to read back the current buffer content.
Another option is glMapBufferRange(), which allows you to obtain a pointer to the buffer content.
If you don't know the current size of the buffer, which you will need for the calls above, you can get it with glGetBufferParameteriv(). For example, for the currently bound array buffer:
GLint bufSize = 0;
glGetBufferParameteriv(GL_ARRAY_BUFFER, GL_BUFFER_SIZE, &bufSize);
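Putting it together, a minimal sketch that dumps the start of the buffer (assuming it holds tightly packed floats):
// Copy the buffer content back to client memory and print the first values.
GLsizei count = bufSize / sizeof(float);
float *data = (float *)malloc(bufSize);
glGetBufferSubData(GL_ARRAY_BUFFER, 0, bufSize, data);
for (int i = 0; i < count && i < 9; ++i)
    printf("v[%d] = %f\n", i, data[i]);
free(data);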
I'm using libgdx to create a program. I need to perform some operations in a framebuffer; for these operations I create a new framebuffer, and after the operation I call dispose() on it. After creating the framebuffer about 10 times, the program crashes with the error: frame buffer couldn't be constructed: incomplete dimensions. I looked at the libgdx code and saw this corresponds to the GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS framebuffer status. Why does this happen, and what do I need to do to fix it?
Code:
if (maskBufferer != null) {
    maskBufferer.dispose();
}
maskBufferer = new FrameBuffer(Pixmap.Format.RGBA8888, width, height, true);
mask = createMaskImageMask(aspectRatioCrop, maskBufferer);
...
private Texture createMaskImageMask(boolean aspectRatioCrop, FrameBuffer maskBufferer) {
    maskBufferer.begin();
    Gdx.gl.glClearColor(COLOR_FOR_MASK, COLOR_FOR_MASK, COLOR_FOR_MASK, ALPHA_FOR_MASK);
    Gdx.graphics.getGL20().glClear(GL20.GL_COLOR_BUFFER_BIT);
    float[] coord = null;
    ...
    PolygonRegion polyReg = new PolygonRegion(new TextureRegion(new Texture(Gdx.files.internal(texturePolygon))),
            coord);
    PolygonSprite poly = new PolygonSprite(polyReg);
    PolygonSpriteBatch polyBatch = new PolygonSpriteBatch();
    polyBatch.begin();
    poly.draw(polyBatch);
    polyBatch.end();
    maskBufferer.end();
    return maskBufferer.getColorBufferTexture();
}
EDIT
To summarize, GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS can occur in libgdx when too many FrameBuffer objects are created without calling .dispose(), possibly because OpenGL runs out of FBO or texture/renderbuffer handles.
If no handle is returned by glGenFramebuffers, then no FBO will be bound when attaching targets or checking the status. Likewise, an attempt to attach an invalid target (from a failed call to glGenTextures) will leave the FBO incomplete. Though it seems incorrect to report GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS in either case.
One possibility is that a call to allocate memory for a target, such as glTexImage2D or glRenderbufferStorage, has failed (out of memory). This would leave the dimensions of that target unequal to those of targets already successfully attached to the FBO, which could then produce the error.
It's pretty standard to create a framebuffer once, attach your render targets, and reuse it each frame. By dispose, do you mean glDeleteFramebuffers?
It also looks like there should be a delete maskBufferer; after maskBufferer.dispose(); as well. (EDIT: if this were C++, that is.)
Given this error happens after a number of frames, it could be many things. Double-check that you aren't creating framebuffers or attachments each frame without deleting them, and so running out of objects/handles.
It also looks like GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS is no longer used in recent GL versions (see the specs), something along the lines of mixed attachment dimensions now being allowed. It still seems worth checking that your attachments are all the same size, though.
A quick way to narrow down which attachment is causing issues is to comment out half of them and see when the error occurs (or check the status after each attach).
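For the latter, something along these lines (plain GL, with illustrative attachment names) will point at the offending attachment:
// Check completeness after each attach to see which one breaks the FBO.
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);
printf("after color attach: 0x%x\n", glCheckFramebufferStatus(GL_FRAMEBUFFER));
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, depthRb);
printf("after depth attach: 0x%x\n", glCheckFramebufferStatus(GL_FRAMEBUFFER));
// GL_FRAMEBUFFER_COMPLETE is 0x8CD5; anything else names the failure.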
I resolved the problem. Dispose was indeed the issue: I was re-creating the class every time, so dispose() was never called.
My application takes the rendered results from OpenGL (both the depth map and the rendered 2D image) into CUDA for processing.
One way I did this is to retrieve the image/depth map with glReadPixels(..., image_array_HOST/depth_array_Host)*, and then pass image_HOST/depth_HOST to CUDA with cudaMemcpy(..., cudaMemcpyHostToDevice). I have done this part, although it sounds redundant (GPU > CPU > GPU).
*image_array_HOST/depth_array_Host are arrays I define on the host.
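A minimal sketch of this first path (assuming a width x height float depth buffer; the variable names are illustrative):
// GPU > CPU > GPU: read the depth buffer back to the host,
// then copy it into a device buffer for CUDA.
float *depth_array_Host = (float *)malloc(width * height * sizeof(float));
glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, depth_array_Host);
cudaMemcpy(depth_map_Device, depth_array_Host,
           width * height * sizeof(float), cudaMemcpyHostToDevice);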
Another way is to use the OpenGL<->CUDA interop.
The first step is to create a buffer in OpenGL and pass the image/depth information into that pixel buffer.
A cuda token (a cudaGraphicsResource_t) is also registered and linked to that buffer, and the CUDA-side pointer is then obtained through that token.
(As far as I know, there seems to be no direct way to link a pixel buffer to CUDA memory; there has to be a cuda token for OpenGL to be recognized through. Please correct me if I am wrong.)
I have also done this part. I thought it should be fairly efficient because the data CUDA processes is never transferred anywhere; it stays where it already is in OpenGL memory. It is data processing inside the device (GPU).
However, the time I measured for the second method is even (slightly) longer than for the first one (GPU > CPU > GPU).
That really confuses me.
I am not sure if I missed some part, or maybe I didn't do it in an efficient way.
One thing I am also not sure about is glReadPixels(..., *data).
In my understanding, if *data is a pointer to memory on the HOST, the data transfer will be GPU > CPU.
If *data = 0 and a buffer is bound to GL_PIXEL_PACK_BUFFER, the data will be transferred into that buffer, and it should be a GPU > GPU transfer.
Maybe some other method can pass the data more efficiently than glReadPixels(..., 0).
I hope someone can clarify my question.
Following is my code:
--
// openGL has finished its rendering and the data are all saved on the openGL side. It is ready to go.
...
// declare one pointer and memory location on cuda for later use.
float *depth_map_Device;
cudaMalloc((void**) &depth_map_Device, sizeof(float) * size);
// initiate the cuda<->openGL interop
cudaGLSetGLDevice(0);
// generate a buffer, and link the cuda token to it -- buffer <>cuda token
GLuint gl_pbo;
cudaGraphicsResource_t cudaToken;
size_t data_size = sizeof(float)*number_data; // number_data is defined beforehand
void *data = malloc(data_size);
glGenBuffers(1, &gl_pbo);
glBindBuffer(GL_ARRAY_BUFFER, gl_pbo);
glBufferData(GL_ARRAY_BUFFER, data_size, data, GL_DYNAMIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
cudaGraphicsGLRegisterBuffer(&cudaToken, gl_pbo, cudaGraphicsMapFlagsNone); // now there is a link between gl_buffer and cudaResource
free(data);
// now map (link) the data in the buffer to cuda
glBindBuffer(GL_PIXEL_PACK_BUFFER, gl_pbo);
glReadPixels(0, 0, width, height, GL_RED, GL_FLOAT, 0);
// pack the rendered data into the bound buffer; since it is glReadPixels(..., 0) it should still be fast? (GPU > GPU)
// width & height are defined beforehand. The format can be GL_DEPTH_COMPONENT or others as well; this is just an example.
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, gl_pbo);
cudaGraphicsMapResources(1, &cudaToken, 0); // make the resource linked to gl_pbo accessible to CUDA
cudaGraphicsResourceGetMappedPointer((void **)&depth_map_Device, &data_size, cudaToken); // get a device pointer to the buffer (no copy)
// CUDA kernel (the mapped pointer is only valid while the resource is mapped)
my_kernel <<<block_number, thread_number>>> (...,depth_map_Device,...);
cudaGraphicsUnmapResources(1, &cudaToken, 0); // unmap afterwards, so openGL can use the buffer again next round
I think I can partly answer my question now, and I hope it is useful for some people.
I was binding the pbo to float CUDA (GPU) memory, but it seems the raw image data openGL renders is in unsigned char format (the following is my supposition), so this data needs to be converted to float and then passed to CUDA memory. I think what openGL does is use the CPU to do this format conversion, and that is why there was no big difference between using the pbo and not.
By reading as unsigned char (glReadPixels(..., GL_UNSIGNED_BYTE, 0)), binding with a pbo is quicker than reading RGB data without one. I then pass the data to a simple cuda kernel to do the format conversion, which is more efficient than what openGL does. Doing it this way, the speed is much quicker.
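A conversion kernel of that sort is just a per-element cast; a minimal sketch (the kernel name and normalization are illustrative, not my exact code):
// Convert packed 8-bit channels to normalized floats on the GPU.
__global__ void uchar_to_float(const unsigned char *src, float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[i] / 255.0f;
}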
However, it doesn't work for the depth buffer.
For some reason, reading the depth map with glReadPixels (with or without a pbo) is slow.
I then found two old discussions:
http://www.opengl.org/discussion_boards/showthread.php/153121-Reading-the-Depth-Buffer-Why-so-slow
http://www.opengl.org/discussion_boards/showthread.php/173205-Saving-Restoring-Depth-Buffer-to-from-PBO
They point out the format question, and that is exactly what I found for RGB (unsigned char). But I have tried unsigned char, unsigned short, unsigned int, and float for reading the depth buffer, and all perform at almost the same speed.
So I still have a speed problem when reading the depth buffer.
Okay, I read everything about PBOs here: http://www.opengl.org/wiki/Pixel_Buffer_Object
and here: http://www.songho.ca/opengl/gl_pbo.html, but I still have a question and I don't know if I'll get any benefit out of a PBO in my case.
I'm doing video streaming; currently I have a function copying my data buffers to 3 different textures, and then I do some math in a fragment shader and display the texture.
I thought a PBO could speed up the CPU -> GPU upload, but consider this example, taken from the second link above:
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pboIds[nextIndex]);
// map the buffer object into client's memory
// Note that glMapBufferARB() causes sync issue.
// If GPU is working with this buffer, glMapBufferARB() will wait(stall)
// for GPU to finish its job. To avoid waiting (stall), you can call
// first glBufferDataARB() with NULL pointer before glMapBufferARB().
// If you do that, the previous data in PBO will be discarded and
// glMapBufferARB() returns a new allocated pointer immediately
// even if GPU is still working with the previous data.
glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, DATA_SIZE, 0, GL_STREAM_DRAW_ARB);
GLubyte* ptr = (GLubyte*)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
if (ptr)
{
    // update data directly on the mapped buffer
    updatePixels(ptr, DATA_SIZE);
    glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB); // release pointer to mapping buffer
}
// measure the time modifying the mapped buffer
t1.stop();
updateTime = t1.getElapsedTimeInMilliSec();
///////////////////////////////////////////////////
// it is good idea to release PBOs with ID 0 after use.
// Once bound with 0, all pixel operations behave normal ways.
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
Well, whatever the updatePixels function does, it is still using CPU cycles to copy the data to the mapped buffer, isn't it?
So let's say I want to use a PBO in this manner: update my frame pixels to the PBO in one function, and then in the display function call glTexSubImage2D (which should return immediately)... Would I see any speed-up in terms of performance?
I can't see why it would be faster... okay, we're not waiting anymore during the glTex* call, but we're waiting during the function that uploads the frame to the PBO, aren't we?
Could someone clear that up for me, please?
Thanks
The point about buffer objects is that they can be used asynchronously. You can map a BO and then have some other part of the program update it (think threads, think asynchronous IO) while you keep issuing OpenGL commands. A typical usage scenario with triple-buffered PBOs may look like this:
wait_for_video_frame_load_complete(buffer[k-2])
glUnmapBuffer buffer[k-2]
glTexSubImage2D from buffer[k-2]
buffer[k] = glMapBuffer
start_load_next_video_frame(buffer[k]);
draw_texture
SwapBuffers
This allows your program to do useful work, and even upload data to OpenGL, while it is also being used for rendering.
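Concretely, a double-buffered version of that cycle might look like the sketch below (start_async_frame_decode stands in for whatever producer fills the frame; the PBOs are assumed to be pre-created with GL_STREAM_DRAW):
int index = frame % 2;            // PBO filled during the previous frame
int nextIndex = (frame + 1) % 2;  // PBO the producer fills during this frame

// Upload from the already-filled PBO; with a PBO bound this call returns
// quickly and the transfer is done by DMA.
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboIds[index]);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

// Orphan and remap the other PBO, then hand the pointer to the producer
// (a decoder thread, async IO, ...) while this frame renders.
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboIds[nextIndex]);
glBufferData(GL_PIXEL_UNPACK_BUFFER, DATA_SIZE, 0, GL_STREAM_DRAW);
GLubyte* ptr = (GLubyte*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
if (ptr)
    start_async_frame_decode(ptr, DATA_SIZE); // must glUnmapBuffer before the next upload
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
The copy into the mapped pointer still costs CPU cycles, but it now overlaps rendering instead of stalling the glTex* call.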