I'm trying to find a clever way to render a large spectrogram (say, fullscreen). A spectrogram is a coordinate system where the x-axis is time, the y-axis is frequency, and the colour intensity is the magnitude of that frequency component; it looks like this (youtube).
What's interesting to note is that each frame, only one new column (1 pixel wide) is added; the rest of the spectrum stays the same, just shifted left by one pixel. Currently I'm writing to a circular software buffer acting like an image and drawing that, but it is obviously slow at high frame rates and screen sizes.
Is there any obvious solution to this problem using OpenGL (or some software trick; it has to be cross-platform, though)? Perhaps through some use of a buffer in GPU memory, with a shader that fills it (admittedly, I have a very vague understanding of OpenGL beyond drawing simple stuff)? As I see it, the key is keeping the old data in GPU memory.
Use a single-channel texture for the waterfall (this is what you're drawing: a waterfall plot) and update one column or row at a time with glTexSubImage2D. With the texture wrap mode set to GL_REPEAT you can simply advance the texture coordinates beyond the bounds of the texture and it will, well, wrap. By scrolling the texture coordinates opposite to the update direction you get the waterfall effect (i.e. a moving spectrogram, with the updates coming in at the right edge).
To give the whole thing colour, use the texture's value as an index into a transfer-function LUT texture in a fragment shader.
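A minimal sketch of the per-frame update, assuming a single-channel GL_R8 ring texture of size TEX_W x TEX_H with GL_TEXTURE_WRAP_S set to GL_REPEAT; texId, writeCol, newColumn and the texcoord plumbing are placeholders:

// Upload the newest spectrum column into the next slot of the ring texture.
glBindTexture(GL_TEXTURE_2D, texId);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glTexSubImage2D(GL_TEXTURE_2D, 0,
                writeCol, 0,                 // x offset: column to overwrite
                1, TEX_H,                    // one column, full height
                GL_RED, GL_UNSIGNED_BYTE, newColumn);
writeCol = (writeCol + 1) % TEX_W;

// When drawing the fullscreen quad, shift the s coordinate so the newest
// column sits at the right edge; GL_REPEAT takes care of the wrap-around.
float sOffset = (float)writeCol / (float)TEX_W;   // pass to the shader

// Fragment shader (GLSL) sketch: scroll, then map through a colour LUT.
const char* waterfallFrag = R"(
#version 330 core
uniform sampler2D uSpectrum;   // the single-channel ring texture
uniform sampler1D uColormap;   // transfer-function LUT
uniform float uSOffset;        // scroll offset computed above
in  vec2 vTexCoord;
out vec4 fragColor;
void main() {
    float mag = texture(uSpectrum, vec2(vTexCoord.x + uSOffset, vTexCoord.y)).r;
    fragColor = texture(uColormap, mag);
}
)";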
You can also use a GPU library for the spectrogram calculation itself: nnAudio
https://github.com/KinWaiCheuk/nnAudio
Related
Currently I'm creating a particle system and I would like to transfer most of the work to the GPU using OpenGL, for gaining experience and performance reasons. At the moment, there are multiple particles scattered through the space (these are currently still created on the CPU). I would more or less like to create a histogram of them. If I understand correctly, for this I would first translate all the particles from world coordinates to screen coordinates in a vertex shader. However, now I want to do the following:
So, for each pixel I want a hit count of how many particles fall inside it. Each particle also has several properties (e.g. a colour), and I would like to sum them for every pixel (as shown in the lower-right corner). Would this be possible using OpenGL? If so, how?
The best tool I can recommend for keeping the whole data set (if it fits in GPU memory) is an SSBO (Shader Storage Buffer Object).
Nevertheless, you need the data after it has been transformed (e.g. by a projection). An SSBO is still your best option:
In the fragment shader you read the properties already accumulated for that index (say, the rendered pixel) and write the modified properties (number of particles at this pixel, colour, etc.) back to the same index in the buffer.
Due to the parallel nature of the GPU, several invocations coming from different particles may be doing the work for the same index concurrently, so you need to handle this on your own. Read up on the OpenGL memory model and atomic operations.
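A rough sketch of such a fragment shader (GLSL 4.3); the binding point, bin layout and fixed-point scale are arbitrary choices for illustration:

const char* particleAccumFrag = R"(
#version 430 core
// One bin per screen pixel; colours are accumulated as fixed point because
// GLSL atomics only work on scalar int/uint buffer members.
struct PixelBin { uint count; uint rSum; uint gSum; uint bSum; };
layout(std430, binding = 0) buffer Bins { PixelBin bins[]; };
uniform int uScreenWidth;
in  vec3 vParticleColor;
out vec4 fragColor;
void main() {
    uint idx = uint(gl_FragCoord.y) * uint(uScreenWidth) + uint(gl_FragCoord.x);
    atomicAdd(bins[idx].count, 1u);
    atomicAdd(bins[idx].rSum, uint(vParticleColor.r * 256.0));
    atomicAdd(bins[idx].gSum, uint(vParticleColor.g * 256.0));
    atomicAdd(bins[idx].bSum, uint(vParticleColor.b * 256.0));
    fragColor = vec4(0.0); // output colour unused; results live in the SSBO
}
)";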
Another, more limited, approach is to use blending.
The idea is that each fragment increments the colour value already stored in the framebuffer. This can be done by using GL_FUNC_ADD as the blend equation (glBlendEquation or glBlendEquationSeparate) and outputting a value of 1/255 (normalized integer) in each RGBA component as the fragment colour.
The limitation comes from the [0, 255] range: only up to 255 particles can be counted in the same pixel; anything beyond that is clamped to the range and therefore "lost".
You have four components (RGBA), so four properties can be handled per render target, but you can attach several renderbuffers to an FBO.
You can read the result back with glReadPixels. Call glReadBuffer with the appropriate GL_COLOR_ATTACHMENTi first if you use an FBO instead of the default framebuffer.
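A minimal sketch of the blending state for this counting trick; width, height and pixels are placeholders, and an RGBA8 render target is assumed to be bound already:

// Additive blending: every particle fragment adds 1/255 to each component of
// the pixel it covers, so the stored byte value acts as a counter.
glEnable(GL_BLEND);
glBlendEquation(GL_FUNC_ADD);        // dst = src + dst
glBlendFunc(GL_ONE, GL_ONE);         // no scaling of either term
// Fragment shader output: vec4(1.0 / 255.0), so each hit increments the
// stored value by one until it saturates at 255.

// Read the per-pixel counts back afterwards:
glReadBuffer(GL_COLOR_ATTACHMENT0);  // only when rendering into an FBO
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);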
I'm currently implementing automatically adapting exposure for use with HDR in OpenGL. For this I need to retrieve the average brightness of all pixels in the previous frame.
I've not managed to find any solid explanations of how to do this. As far as I can see there are two ways to go about it.
Use glReadPixels to copy the framebuffer to memory and average the values on the CPU. This is likely to be painfully slow and doesn't make good use of the GPU.
Take the frame and render it to successively smaller FBOs using linear filtering. This lets the GPU do most of the work but it's going to require a lot of FBOs (roughly 10 for a 1080p screen).
There has got to be a better way of getting average scene brightness. Does anyone have any suggestions?
There are two options that come to mind:
Using glGenerateMipmap, which averages 2x2 windows at each level, leaving you with the average scene brightness at the smallest level. This can be retrieved in a shader with the textureLod function. Since each mipmap level is half the size of the previous one, the top level is log2(N), where N is the larger of the texture's width and height (see the sketch below).
Using compute shaders to do basically the same thing glGenerateMipmap does, but with a bigger window size, which could potentially be faster (although I never tested this).
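A sketch of option 1, assuming the HDR frame was rendered into a texture hdrTex of size texWidth x texHeight (names are placeholders; requires <cmath> and <algorithm>):

// Build the mip pyramid of the HDR frame; the top 1x1 level is its average.
glBindTexture(GL_TEXTURE_2D, hdrTex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_NEAREST);
glGenerateMipmap(GL_TEXTURE_2D);
int topLevel = (int)std::floor(std::log2((double)std::max(texWidth, texHeight)));

// In the exposure shader (GLSL), fetch the average with an explicit LOD:
//   vec3  avgColor     = textureLod(uHdrTex, vec2(0.5), uTopLevel).rgb;
//   float avgLuminance = dot(avgColor, vec3(0.2126, 0.7152, 0.0722));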
Your Option 2 is not much different from using glGenerateMipmap on the texture, just that you don't need to hassle with any client side objects like FBOs. So basically, rendering to mipmap level 0 of the texture, letting the GL generate the mipmap pyramid, and reading back just the highest level 1x1 image is probably the easiest way to get some approximation of the average color value.
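If you want the value on the CPU instead of in a shader, reading back only the 1x1 top level is cheap compared to a full-frame read (sketch, reusing the hdrTex/topLevel names from above):

// Read back just the 1x1 top mip level instead of the whole framebuffer.
GLfloat avg[4];
glBindTexture(GL_TEXTURE_2D, hdrTex);
glGetTexImage(GL_TEXTURE_2D, topLevel, GL_RGBA, GL_FLOAT, avg);
// avg[0..2] now approximates the average scene colour of the frame.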
I've been reading various articles about how to write a GPU voxelizer. From my understanding the process goes like this:
Inspect each triangle individually and decide which axis it projects onto with the largest area. Call this the dominant axis.
Render the triangle on its dominant axis and sample the texels that come out.
Write that texel data onto a 3D texture and then do what you will with the data
Disregarding conservative rasterization, I have a lot of questions regarding this process.
I've gotten as far as rendering each triangle, choosing a dominant axis and orthogonally projecting it. What should the values of the orthogonal projection be? Should it be some value based around the size of the voxels or how large of an area the map should cover?
What am I supposed to do in the fragment shader? How do I write to my 3D texture such that it stores the voxel data? From my understanding, due to choosing the dominant axis we can't have more than a depth of 1 voxel for each fragment. However, since we projected orthogonally I don't see how that would reflect onto the 3D texture.
Finally, I am wondering where to store the texture data. I know it's a bad idea to store data on the CPU side since you have to pass it all in to use it on the GPU; however, the source code I am loosely following stores all its textures on the CPU side, such as those for a light map. My assumption is that data used only on the GPU should be stored there, and data used on both should be stored on the CPU side. Based on that, I store my data on the CPU side. Is that correct?
My main sources have been: the sparse voxelization chapter from OpenGL Insights: https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf
https://github.com/otaku690/sparsevoxeloctree An SVO built with a voxelizer. The issue is that the shader code is not in the GitHub repository.
In my own implementation, the whole scene is positioned and scaled into a unit cube centered on the world origin. The modelview-projection matrices are then straightforward, and the viewport is simply the desired voxel resolution.
I use a two-pass approach to output the voxel fragments: the first pass counts the number of output voxel fragments by incrementing a single atomic counter. I then use that count to allocate a linear buffer.
In the second pass the rasterized voxel fragments are stored into the allocated linear buffer, again using an atomic counter to avoid write conflicts.
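A rough GLSL sketch of the two fragment-shader passes; buffer names, bindings and the voxel-fragment layout are placeholders:

// Pass 1: only count the voxel fragments.
const char* countFrag = R"(
#version 430 core
layout(binding = 0, offset = 0) uniform atomic_uint uVoxelCount;
void main() {
    atomicCounterIncrement(uVoxelCount);
}
)";

// Between the passes: read the counter back, allocate an SSBO with that many
// voxel-fragment records, then reset the counter to zero.

// Pass 2: write each voxel fragment to a unique slot of the linear buffer.
const char* storeFrag = R"(
#version 430 core
layout(binding = 0, offset = 0) uniform atomic_uint uVoxelCount;
struct VoxelFragment { uvec4 coord; vec4 color; };
layout(std430, binding = 1) buffer VoxelList { VoxelFragment fragments[]; };
uniform float uVoxelRes;     // e.g. 256 for a 256^3 grid
in vec3 vUnitPos;            // position already mapped into [-0.5, 0.5]^3
in vec4 vColor;
void main() {
    uint idx = atomicCounterIncrement(uVoxelCount);
    fragments[idx].coord = uvec4(uvec3((vUnitPos + 0.5) * uVoxelRes), 0u);
    fragments[idx].color = vColor;
}
)";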
I want to track the mouse coordinates in my OpenGL scene on the ground surface of the world which is modeled as a height map. Currently there is no fancy stuff like hardware tessellation. Note that this question is not about object picking.
Currently I'm doing the following, which clearly hurts performance because of a read-back operation:
Render the world (the ground surface)
Read back the depth value at the mouse coordinates
Render the rest of the scene
Swap buffers and render the next frame
The read back is between the two render steps because I want the depth value of the ground surface without any objects in front of it. It is done using the following command:
GLfloat depth;
glReadPixels(x, y, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &depth);
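For context, that depth value can then be turned back into a world-space position, roughly like this (a sketch using GLM; viewMat, projMat and windowWidth/windowHeight stand for whatever the application already uses, and y must be in GL's bottom-left window convention):

// Unproject the window-space point (x, y, depth) back into world space.
// Requires GLM: <glm/glm.hpp> and <glm/gtc/matrix_transform.hpp>.
glm::vec4 viewport(0.0f, 0.0f, (float)windowWidth, (float)windowHeight);
glm::vec3 winPos((float)x, (float)y, depth);
glm::vec3 groundPos = glm::unProject(winPos, viewMat, projMat, viewport);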
My application limits the frame rate to 60 frames per second. When rendering the scene without the read back operation, I experience a CPU usage of less than 5%, but when doing the read back, it increases to about 75% although I'm not doing much to render the scene or update any game model or such things.
A temporary solution is to cache the depth value of the pixel under the mouse and update it only every 5th or 10th frame, which brings the CPU usage back down below 10%. But that clearly can't be the best solution to the problem.
How can I implement picking (not object picking since I want the (floating point) coordinates on the surface) efficiently?
I already thought of reading back the depth value of the front buffer instead of the back buffer, but when googling how to do so, I only find people saying that the glRead* calls are best avoided altogether. But how can I read something (do picking) without reading something (using glRead*)?
I'm confused. How do other people implement picking?
A totally different approach would be to implement the surface picking in software. It should be no big deal to reconstruct a 3D ray from the camera through the target pixel, representing the points in space that are rendered at that pixel. Then I could implement an intersection algorithm to find the front-most point on the surface.
You typically implement it on the CPU! Find your picking ray in heightmap coordinates and do a simple line-trace across the heightmap. This is very similar to line-drawing. In each cell you intersect, test against the triangles you used to triangulate it.
It is important to avoid reading back from the GPU until it's done. Since drawing commands are normally scheduled several frames ahead (GL does this automatically), you will also only get the results then, or you stall the CPU until the GPU has caught up. Don't do that for simple things like this!
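A minimal sketch of such a CPU trace, assuming a regular grid heightmap sampled at integer (x, z) positions and a picking ray already expressed in heightmap space; the fixed step size and nearest-sample lookup are deliberate simplifications (a proper version would walk cell by cell, DDA-style, and intersect each cell's two triangles):

// Requires <vector> and GLM. Returns the first point along the ray that
// falls below the terrain surface.
struct Hit { bool found; glm::vec3 point; };

Hit tracePickingRay(const std::vector<float>& heights, int mapW, int mapH,
                    glm::vec3 origin, glm::vec3 dir, float maxDist)
{
    dir = glm::normalize(dir);
    const float step = 0.1f;                        // march step, tune as needed
    for (float t = 0.0f; t < maxDist; t += step) {
        glm::vec3 p = origin + dir * t;
        int ix = (int)p.x, iz = (int)p.z;
        if (ix < 0 || iz < 0 || ix >= mapW || iz >= mapH)
            continue;                               // outside the heightmap
        float h = heights[iz * mapW + ix];          // nearest-sample height
        if (p.y <= h)
            return { true, glm::vec3(p.x, h, p.z) };
    }
    return { false, glm::vec3(0.0f) };
}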
I have an application where I need take the average intensity of an image for around 1 million images. It "feels" like a job for a GPU fragment shader, but fragment shaders are for per-pixel local computations, while image averaging is a global operation.
One approach I considered is loading the image into a texture, applying a 2x2 box blur, rendering the result into an N/2 x N/2 texture, and repeating until the output is 1x1. However, this would take log n applications of the shader.
Is there a way to do it in one pass? Or should I just break down and use CUDA/OpenCL?
The summation operation is a specific case of the "reduction," a standard operation in CUDA and OpenCL libraries. A nice writeup on it is available on the cuda demos page. In CUDA, Thrust and CUDPP are just two examples of libraries that provide reduction. I'm less familiar with OpenCL, but CLPP seems to be a good library that provides reduction. Just copy your color buffer to an OpenGL pixel buffer object and use the appropriate OpenGL interoperability call to make that pixel buffer's memory accessible in CUDA/OpenCL.
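A sketch of that interop path with Thrust, with error checking omitted; pbo, width and height are assumed to exist, and only the red channel is read as a stand-in for luminance, for simplicity:

// Copy the colour buffer into a PBO, hand it to CUDA, and reduce with Thrust.
// Requires <cuda_gl_interop.h>, <thrust/device_ptr.h>, <thrust/reduce.h>.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0, 0, width, height, GL_RED, GL_FLOAT, 0);   // async into the PBO
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

cudaGraphicsResource* res = nullptr;
cudaGraphicsGLRegisterBuffer(&res, pbo, cudaGraphicsRegisterFlagsReadOnly);
cudaGraphicsMapResources(1, &res, 0);

float* devPtr = nullptr;
size_t bytes  = 0;
cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &bytes, res);

thrust::device_ptr<float> first(devPtr);
float sum  = thrust::reduce(first, first + (size_t)width * height, 0.0f);
float mean = sum / (width * height);

cudaGraphicsUnmapResources(1, &res, 0);
cudaGraphicsUnregisterResource(res);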
If it must be done using the OpenGL API (as the original question required), the solution is to render to a texture, generate a mipmap of that texture, and read back the 1x1 level. You have to set the filtering right (bilinear is appropriate, I think), but it should get close to the right answer, modulo precision error.
My gut tells me to attempt your implementation in OpenCL. You can optimize for your image size and graphics hardware by breaking up the images into bespoke chunks of data that are then summed in parallel. Could be very fast indeed.
Fragment shaders are great for convolutions, but that result is usually written to gl_FragColor, so it makes sense there. Ultimately you would have to loop over every pixel in the texture and sum the results, which are then read back in the main program. Generating image statistics is perhaps not what the fragment shader was designed for, and it's not clear that a major performance gain is to be had, since it's not guaranteed that a particular buffer is located in GPU memory.
It sounds like you may be applying this algorithm to a real-time motion detection scenario, or some other automated feature detection application. It may be faster to compute some statistics from a sample of pixels rather than the entire image and then build a machine learning classifier.
Best of luck to you in any case!
You don't need CUDA if you'd like to stick to GLSL. As in the CUDA solution mentioned here, it can be done straightforwardly in a fragment shader. However, you need about log2(resolution) draw calls.
Just set up a shader that takes 2x2 pixel samples from the original image and outputs their average. The result is an image with half the resolution in both axes. Repeat that until the image is 1x1 px.
Some considerations: use GL_FLOAT luminance textures if available to get a more precise sum, and use glViewport to quarter the rendering area in each stage. The result then ends up in the top-left pixel of your framebuffer.
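A sketch of the downsampling fragment shader for one such pass; uSrcTexelSize is assumed to be 1.0 / (source width, source height), and vTexCoord spans the half-size target:

// GLSL: average a 2x2 block of the source texture into one output pixel.
const char* downsampleFrag = R"(
#version 330 core
uniform sampler2D uSrc;
uniform vec2 uSrcTexelSize;   // 1.0 / source width, 1.0 / source height
in  vec2 vTexCoord;           // coordinates over the half-size target
out vec4 fragColor;
void main() {
    vec4 sum = texture(uSrc, vTexCoord + uSrcTexelSize * vec2(-0.5, -0.5))
             + texture(uSrc, vTexCoord + uSrcTexelSize * vec2( 0.5, -0.5))
             + texture(uSrc, vTexCoord + uSrcTexelSize * vec2(-0.5,  0.5))
             + texture(uSrc, vTexCoord + uSrcTexelSize * vec2( 0.5,  0.5));
    fragColor = 0.25 * sum;
}
)";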