Accumulating renderings without the accumulation buffer in OpenGL 4.3 - C++

I am rendering a scene with a very simple path tracer written in GLSL. I create a map in screen space using some of the information from the path tracing. In the first frame the map looks very noisy (because it is mostly 0s), but in the next frame I change the sample and I would like the new map to be averaged with the previous one.
I know there is an accumulation buffer, but it has been deprecated and I think it would be a costly solution for such a simple feature.
In my shader I currently render the map and a final image, and display the latter. In the next frame I want to compute a new map and accumulate it with the previous one in order to render the new frame. Ideally I would like to accumulate 5 subsequent frames.
I hope I made the question clear enough. I don't want to compute the samples and accumulation in the shader because I want the program to still be interactive, even if noisy.
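A minimal sketch of the averaging itself, assuming the running result is kept between frames and each new map is blended in with weight 1/n. This is a CPU stand-in only: in an OpenGL 4.3 program the `average` array would more plausibly be a floating-point texture (for example GL_R32F) attached to a framebuffer object, with two such textures swapped each frame. `MapAccumulator` and its members are hypothetical names, not part of any API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a screen-space map accumulated over frames.
struct MapAccumulator {
    std::vector<float> average;  // running mean, one value per pixel
    int frames = 0;              // number of maps blended so far
    int maxFrames;               // stop refining after this many (5 in the question)

    MapAccumulator(std::size_t pixelCount, int maxFrames_)
        : average(pixelCount, 0.0f), maxFrames(maxFrames_) {
        assert(maxFrames_ > 0);
    }

    // Call when the camera or scene changes so stale samples are discarded.
    void reset() {
        std::fill(average.begin(), average.end(), 0.0f);
        frames = 0;
    }

    // Blend one freshly rendered noisy map into the running mean:
    //   mean_n = mean_{n-1} + (sample_n - mean_{n-1}) / n
    void accumulate(const std::vector<float>& sample) {
        assert(sample.size() == average.size());
        if (frames >= maxFrames) return;  // enough frames already accumulated
        ++frames;
        const float weight = 1.0f / static_cast<float>(frames);
        for (std::size_t i = 0; i < average.size(); ++i) {
            average[i] += (sample[i] - average[i]) * weight;
        }
    }
};
```

Because only one new map is computed per frame, the program stays interactive; the image simply gets less noisy over the next few frames until `maxFrames` is reached or `reset()` is called.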

Related

Applying a 2D heatmap to a 3D view

I currently have implemented an OpenGL 3.3 3D environment renderer rendering a (static) block of terrain, and I've been tasked with adding an overlay of statistical data to it; setting specific pixel colours on the terrain based on data values at each point.
The data in question is effectively supplied in the form of a black box in my C++ code base; I can input an X,Y pair of doubles (in worldspace), and it'll output a data value for that location (the terrain does have a third dimension, but the data is not concerned about that). The data in question is time-varying; on changing the time co-ordinate, the scene is expected to update with the data corresponding to the new co-ordinate.
I have a first implementation; the obvious one, where on creating each vertex the appropriate data value for that location is looked up in the black box and encoded in a dynamic buffer accompanying it, with the buffer updated as the time co-ordinate changes. This works perfectly in itself; it's fast to update, and the data is rendered as expected.
However, it's only got data points per-vertex, with simple interpolation across the polygon, and the question's been raised as to whether it's possible to instead render the data per-pixel.
I'm struggling with this. I can't realistically implement the black box behaviour directly in the shaders; it's a large, complex function that I don't fully understand myself (hence representing it here as a black box!), and it requires referencing multiple data sources. There was a version early on - before I looked into the project - that rendered the entire scene in our (separate, non-OpenGL, 2D), top-down environment renderer at an extremely high resolution and applied that as a texture to the mesh - but that's both cripplingly slow and still not true per-pixel data, you can still zoom to a point where the resolution breaks down.
I'm not currently using deferred rendering, but I'm wondering if I can use similar principles to that. One thing I'm considering currently is whether - during the render process - there's a way I can store worldspace X and Y data per-pixel in a buffer (stencil? G-? Arbitrary render target?), and then - back in the C++ environment - generate an overlay texture per frame based on those accumulated X and Y values - but I'm somewhat put off by the notion that that'd require double-precision, and lots of what I've seen suggests steering clear of any double calculations in GLSL; again, I'm worried about speed (although is a simple passthrough and interpolation of double-precision data less impactful?)... plus I'm not entirely sure that what I'm suggesting is even possible!
I may be overcomplicating this, though; there may be far simpler solutions that aren't in my frame of reference yet, so I'm curious to hear whether there are any suggestions for better solutions, or whether it's unrealistic.
(While I'm currently using 3.3, a solution requiring 4+ is not off the table)

Data processing and video generation with OpenGL/CL

Goal: compensate and visualize a stream of 14-bit data (2D video).
Existing solution: Each sample needs to be compensated for a gain and offset, so it requires one multiplication and one addition. Then I assign a colour to the sample by a look-up table and output a stream of "colours" directly to the display. Everything is done on CPU.
Requirements: I need to be able to dynamically set a look-up table (palette).
It seems obvious to use the GPU for such an operation, but I couldn't find any information about how to move from the data domain to the picture domain with OpenGL. I've thought about using OpenCL for data compensation and image generation and then moving to OpenGL for display (or, in general, for manipulating the picture).
Can you recommend a good approach for this? Can all of this be achieved efficiently with OpenGL alone? How?
Yes, it can be done using only OpenGL.
I would suggest a workflow like the following:
For each frame:
1) Upload the frame from the stream to texture memory.
2) Draw a full-screen quad, with texture coordinates from 0,0 to 1,1.
3) In a fragment shader, apply the appropriate transformation to each pixel. The lookup table can also be stored in a texture, so you only have to perform a lookup at the appropriate location.
In general: This question is at the moment a little bit too broad to be answered in more detail. For example a stream of 14-bit data could be a lot of things. I assumed for this answer you meant a (2D) video stream.
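As a rough CPU reference for what the fragment shader would compute per pixel, here is a sketch under a few assumptions: samples are 14-bit values stored in 16-bit integers, gain and offset are passed in as plain floats, and the palette has one entry per possible sample value. `Rgb`, `shadePixel` and `makeGreyPalette` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

constexpr int kSampleBits = 14;
constexpr int kLutSize = 1 << kSampleBits;  // 16384 palette entries

// Compensate one raw sample, clamp it to the valid range, then look it up
// in the palette. A fragment shader would do the same with two texture reads.
inline Rgb shadePixel(std::uint16_t raw, float gain, float offset,
                      const std::vector<Rgb>& palette) {
    assert(palette.size() == static_cast<std::size_t>(kLutSize));
    float v = static_cast<float>(raw) * gain + offset;
    v = std::min(std::max(v, 0.0f), static_cast<float>(kLutSize - 1));
    return palette[static_cast<std::size_t>(v)];
}

// Example palette (an assumption): a plain greyscale ramp.
inline std::vector<Rgb> makeGreyPalette() {
    std::vector<Rgb> palette(kLutSize);
    for (int i = 0; i < kLutSize; ++i) {
        const auto g = static_cast<std::uint8_t>(i >> (kSampleBits - 8));
        palette[i] = {g, g, g};
    }
    return palette;
}
```

With this split, changing the palette at run time only means re-uploading the palette texture; the data texture and the shader stay untouched.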

Infinite cube world engine (like Minecraft) optimization suggestions?

Voxel engine (like Minecraft) optimization suggestions?
As a fun project (and to get my Minecraft-addicted son excited about programming) I am building a 3D Minecraft-like voxel engine using C# .NET 4.5.1, OpenGL and GLSL 4.x.
Right now my world is built from chunks. Chunks are stored in a dictionary, where I can select them based on a 64-bit X | Z<<32 key. This allows me to create an 'infinite' world that can cache chunks in and out.
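A small sketch of such a key, written in C++ rather than the project's C#, and assuming chunk coordinates are signed 32-bit integers. The point it illustrates is a pitfall that may or may not apply to the original code: if a negative X is sign-extended to 64 bits before the OR, its upper bits overwrite the Z half and different chunks collide. `chunkKey`, `chunkX` and `chunkZ` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Pack two signed chunk coordinates into one 64-bit dictionary key.
// Each coordinate goes through uint32_t first so that a negative X
// does not sign-extend into the half of the key reserved for Z.
inline std::uint64_t chunkKey(std::int32_t x, std::int32_t z) {
    const std::uint64_t low  = static_cast<std::uint32_t>(x);
    const std::uint64_t high = static_cast<std::uint32_t>(z);
    return low | (high << 32);
}

// Recover the coordinates. The unsigned-to-signed conversion wraps on all
// mainstream compilers and is guaranteed to do so from C++20 onwards.
inline std::int32_t chunkX(std::uint64_t key) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(key));
}

inline std::int32_t chunkZ(std::uint64_t key) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(key >> 32));
}
```

A key built this way round-trips for negative coordinates, which matters as soon as the player walks west or north of the origin.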
Every chunk consists of an array of 16x16x16 block segments. Starting from level 0, bedrock, it can go as high as you want (unlike Minecraft, where the limit is 256, I think).
Chunks are queued for generation on a separate thread when they come in view and need to be rendered. This means that chunks might not show right away. In practice you will not notice this. NOTE: I am not waiting for them to be generated, they will just not be visible immediately.
When a chunk needs to be rendered for the first time, a VBO (glGenBuffers, GL_STREAM_DRAW, etc.) for that chunk is generated containing the possibly visible/outside faces (neighboring chunks are checked as well). [This means that a chunk potentially needs to be re-tessellated when a neighbor has been modified]. When tessellating, the opaque faces are tessellated first for every segment and then the transparent ones. Every segment knows where it starts within that vertex array and how many vertices it has, both for opaque faces and transparent faces.
Textures are taken from an array texture.
When rendering:
I first take the bounding box of the frustum and map that onto the chunk grid. Using that knowledge I pick every chunk that is within the frustum and within a certain distance of the camera.
Now I do a distance sort on the chunks.
After that I determine the ranges (index, length) of the chunk segments that are actually visible. Now I know exactly which segments (and which vertex ranges) are 'at least partially' in view. The only excess segments I have are the ones hidden behind mountains or 'sometimes' deep underground.
Then I start rendering ... first I render the opaque faces [culling and depth test enabled, alpha test and blend disabled] front to back using the known vertex ranges. Then I render the transparent faces back to front [blend enabled]
Now... does anyone know a way of improving this while still allowing dynamic generation of an infinite world? I am currently reaching ~80fps@1920x1080, ~120fps@1024x768 (screenshots: http://i.stack.imgur.com/t4k30.jpg, http://i.stack.imgur.com/prV8X.jpg) on an average 2.2GHz i7 laptop with an ATI HD8600M graphics card. I think it must be possible to increase the frame rate. And I think I have to, as I want to add entity AI and sound, and do bump and specular mapping. Could using occlusion queries help me out? ... which I can't really imagine, given the nature of the segments. I have already minimized the creation of objects, so there is no 'new Object' all over the place. Also, as the performance doesn't really change between Debug and Release mode, I don't think it's the code but rather the approach to the problem.
edit: I have been thinking of using GL_SAMPLE_ALPHA_TO_COVERAGE but it doesn't seem to be working?
gl.Enable(GL.DEPTH_TEST);
gl.Enable(GL.BLEND); // gl.Disable(GL.BLEND);
gl.Enable(GL.MULTI_SAMPLE);
gl.Enable(GL.SAMPLE_ALPHA_TO_COVERAGE);
To render a lot of similar objects, I strongly suggest you take a look at instanced drawing: glDrawArraysInstanced and/or glDrawElementsInstanced.
It made a huge difference for me: from 2 fps to over 60 fps when rendering 100000 similar icosahedrons.
You can parametrize your cubes with per-instance attributes (glVertexAttribDivisor and friends) to make them different. Hope this helps.
It's at ~200fps currently, which should be OK. The 3 main things that I've done are:
1) Generating the chunks on a separate thread.
2) Tessellating the chunks on a separate thread.
3) Using a deferred rendering pipeline.
I don't really think the last one contributed much to the overall performance, but I had to start using it because of some of the shaders. Now the CPU is sort of falling asleep @ ~11%.
This question is pretty old, but I'm working on a similar project. I approached it almost exactly the same way as you, however I added in one additional optimization that helped out a lot.
For each chunk, I determine which sides are completely opaque. I then use that information to do a flood fill through the chunks to cull out the ones that are underground. Note, I'm not checking individual blocks when I do the flood fill, only a precomputed bitmask for each chunk.
When I'm computing the bitmask, I also check to see if the chunk is entirely empty, since empty chunks can obviously be ignored.
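One possible reading of that flood fill, sketched in C++ on a small fixed grid of chunks. The details are assumptions rather than the code described above: each chunk stores a 6-bit mask of fully opaque faces, the fill starts at the camera's chunk, a chunk is marked visible when it is reached through a non-opaque face of its neighbour, and the fill only continues through a chunk if the face it was entered by is not opaque. `ChunkGrid` and `visibleChunks` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Bit f set = face f of the chunk is completely opaque.
// Face order: -X, +X, -Y, +Y, -Z, +Z, so the opposite of face f is f ^ 1.
using FaceMask = std::uint8_t;

struct ChunkGrid {
    int sx, sy, sz;
    std::vector<FaceMask> opaque;  // one precomputed mask per chunk

    ChunkGrid(int x, int y, int z)
        : sx(x), sy(y), sz(z), opaque(static_cast<std::size_t>(x) * y * z, 0) {}

    int index(int x, int y, int z) const { return (z * sy + y) * sx + x; }
    bool inside(int x, int y, int z) const {
        return x >= 0 && x < sx && y >= 0 && y < sy && z >= 0 && z < sz;
    }
};

// Returns one flag per chunk: true if the chunk may be visible from the camera chunk.
std::vector<bool> visibleChunks(const ChunkGrid& g, int cx, int cy, int cz) {
    assert(g.inside(cx, cy, cz));
    static const int dir[6][3] = {{-1, 0, 0}, {1, 0, 0}, {0, -1, 0},
                                  {0, 1, 0},  {0, 0, -1}, {0, 0, 1}};
    struct Pos { int x, y, z; };

    std::vector<bool> visible(g.opaque.size(), false);
    std::vector<bool> expanded(g.opaque.size(), false);
    std::queue<Pos> pending;

    const int start = g.index(cx, cy, cz);
    visible[start] = true;
    expanded[start] = true;
    pending.push({cx, cy, cz});

    while (!pending.empty()) {
        const Pos p = pending.front();
        pending.pop();
        const FaceMask mask = g.opaque[g.index(p.x, p.y, p.z)];
        for (int f = 0; f < 6; ++f) {
            if (mask & (1u << f)) continue;  // cannot see out through this face
            const int nx = p.x + dir[f][0];
            const int ny = p.y + dir[f][1];
            const int nz = p.z + dir[f][2];
            if (!g.inside(nx, ny, nz)) continue;
            const int n = g.index(nx, ny, nz);
            visible[n] = true;  // the side facing us can be seen
            const bool entryBlocked = (g.opaque[n] & (1u << (f ^ 1))) != 0;
            if (!entryBlocked && !expanded[n]) {
                expanded[n] = true;
                pending.push({nx, ny, nz});
            }
        }
    }
    return visible;
}
```

Only the per-chunk masks are touched, never individual blocks, so the fill stays cheap even with many chunks in range. Entirely empty chunks could be flagged at the same time and skipped when drawing.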

When using Direct 3D, what should be processed in code and what should be processed in HLSL?

I am very new to 3D programming, particularly with DirectX. I have been trying to follow tutorials on how to do basic things, and I have been looking at the samples provided by Microsoft. One of the big questions I have had is how to tell which calculations should be done in the actual game code and which should be done in HLSL. I have not been able to understand what should be done where, because it looks to me like you could have almost all code pertaining to calculations in your shader file, or you could have it all in the executable code and only send the bare minimum to the pixel and vertex shaders. How can one tell what code should go where? If you need an example, I'll try to find one.
"Code" - CPU code
"HLSL" - GPU code
Basically, you want everything that is pure graphics to happen on the GPU. That is, when the information about what you want to render has been sent to the GPU, it should take over and use that information to generate the final image.
You want the CPU to say to the GPU "this is what I want to render, and here is everything you need to make it happen" and then make sure to tell the GPU "this is how you render it".
Some examples (not a complete or final list in any way):
CPU:
Anything dealing with window opening/closing/resizing
User input from mouse, keyboard
Reading and setting configuration
Generating and updating view matrices
Application logic
Setting up and initializing rendering (textures, buffers etc)
Generating vertex data (position, texture coordinates etc)
Creating graphic entities (triangles, textures, colors etc)
Handling animation (timestepping, swapping buffers)
Sending updated data to the GPU for each frame
GPU:
Use the view matrices to put things on the right place on the screen
Interpolate from vertex data to fragment data
Shading (usually, this is the most complicated part)
Calculate and write final pixel color

OpenGL height-map painting using CUDA VBO

I've asked several questions regarding VBOs here previously, and from the comments I received I decided that a new approach must be taken.
To put it simply: I'm trying to draw the Mandelbrot set, which is defined on a large float array of around 512x512 points. The purpose of my program is to let the user control the zoom and the world's orientation (it's a 3D model).
So far I've painted the entire thing using GL_TRIANGLE_STRIP, which turned out to be a bad choice because it is slow to paint, and also because my painting style (the order of the glVertex calls) became impossible to express with VBOs.
So I've got several questions.
Even after this description I'm not sure whether a VBO is the best choice, because the user controls the calculations. For each calculation the user triggers, I have to recompute the Mandelbrot set (~60ms) and recopy the points to the buffer, a process which takes some time (?ms).
The program also allows the user to move around the world; no calculations are done in that case, so a VBO is an excellent choice there.
1. What's the best way to paint a height map (when each cell in the array holds only the height)?
2. How can I apply it to a VBO and transfer it to CUDA (cudaRegisterBuffer or something like that)?
3. Is there a way to distinguish between the modes and decide when VBOs are needed (in a no-calculations mode) and when they aren't (calculations mode)?
You don't need to copy the CUDA data each frame if you bind the CUDA array/VBO to the DirectX/OpenGL VB (refer to the CUDA Programming Guide for details). One way to render data as a height-field is to use the Geometry Shader to emit the tris based on the height-field. Another way is to use the height field as a parallax-map (ref DirectX SDK). My personal fave would be to make your height-field an array of positions (X/Y/Z) and use CUDA to modify only the Y-Values, then use an index buffer to define the polygons that compose the surface. Note that you'll also need to update the vertex normals, and you may also want to use XYZ/UV if you want to texture the surface. If 512x512 is too big, use raster-ops (texture sampling) to populate a lower-resolution height-field of the region of interest. You can do this stage in CUDA or OpenGL/DirectX (I'd recommend doing it in CUDA where you can easily write your own sampling kernel to lookup pixels when down-sampling).
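The "array of positions plus index buffer" layout can be sketched on the CPU as follows. This assumes a regular grid with one vertex per height sample and two triangles per cell; the CUDA interop step that would rewrite only the Y values is not shown. `Vertex`, `makeGridVertices` and `makeGridIndices` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

// One vertex per height sample. X and Z come from the grid position and
// never change; only Y would need rewriting when the set is recomputed.
std::vector<Vertex> makeGridVertices(const std::vector<float>& heights, int w, int h) {
    assert(w > 0 && h > 0);
    assert(heights.size() == static_cast<std::size_t>(w) * h);
    std::vector<Vertex> vertices;
    vertices.reserve(heights.size());
    for (int z = 0; z < h; ++z) {
        for (int x = 0; x < w; ++x) {
            vertices.push_back({static_cast<float>(x),
                                heights[static_cast<std::size_t>(z) * w + x],
                                static_cast<float>(z)});
        }
    }
    return vertices;
}

// Two triangles per grid cell, suitable for drawing as an indexed triangle list.
// The winding order is an arbitrary choice; flip it if face culling hides the surface.
std::vector<std::uint32_t> makeGridIndices(int w, int h) {
    assert(w > 1 && h > 1);
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(w - 1) * (h - 1) * 6);
    for (int z = 0; z < h - 1; ++z) {
        for (int x = 0; x < w - 1; ++x) {
            const std::uint32_t i0 = static_cast<std::uint32_t>(z * w + x);
            const std::uint32_t i1 = i0 + 1;
            const std::uint32_t i2 = i0 + static_cast<std::uint32_t>(w);
            const std::uint32_t i3 = i2 + 1;
            indices.insert(indices.end(), {i0, i2, i1, i1, i2, i3});
        }
    }
    return indices;
}
```

The index buffer depends only on the grid size, so it can be built once and reused; only the vertex heights change when the user triggers a recalculation.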