OpenGL height-map painting using CUDA VBO

I've asked several questions about VBOs here before, and from the comments I received I decided that a new approach is needed.
To put it simply: I'm trying to draw the Mandelbrot set, which is defined on a large float array of around 512x512 points. The purpose of my program is to let the user control the zoom and the world's orientation (it's a 3D model).
So far I've painted the entire thing using GL_TRIANGLE_STRIP, which turned out to be a bad choice because painting is slow, and also because my painting style (the order in which I call glVertex) became impossible to code for VBOs.
So I've got several questions.
Even after this description I'm not sure whether a VBO is the best choice, because the user controls the calculations. For each calculation the user triggers, I have to recompute the Mandelbrot set (~60 ms) and recopy the points to the buffer, a process which takes some time (? ms).
The program also lets the user move around the world; no calculations are done in that case, so a VBO is an excellent choice there.
1. What's the best way to paint a height map (when each cell in the array holds only the height)?
2. How can I apply it to a VBO and transfer it to CUDA (cudaRegisterBuffer or something like that)?
3. Is there a way to distinguish between the modes and decide when VBOs are needed (in no-calculations mode) and when they aren't (in calculations mode)?

You don't need to copy the CUDA data each frame if you bind the CUDA array/VBO to the DirectX/OpenGL VB (refer to the CUDA Programming Guide for details). One way to render data as a height-field is to use the Geometry Shader to emit the tris based on the height-field. Another way is to use the height field as a parallax-map (ref DirectX SDK). My personal fave would be to make your height-field an array of positions (X/Y/Z) and use CUDA to modify only the Y-Values, then use an index buffer to define the polygons that compose the surface. Note that you'll also need to update the vertex normals, and you may also want to use XYZ/UV if you want to texture the surface. If 512x512 is too big, use raster-ops (texture sampling) to populate a lower-resolution height-field of the region of interest. You can do this stage in CUDA or OpenGL/DirectX (I'd recommend doing it in CUDA where you can easily write your own sampling kernel to lookup pixels when down-sampling).
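The "array of positions plus index buffer" layout described above can be sketched on the CPU as follows. This is only an illustration under assumptions: the names Vec3, buildVertices, buildIndices and normalAt, and the spacing parameter, are hypothetical and not from the original answer. In the CUDA setup the Y values and normals would be rewritten by a kernel instead of by this host code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Used both for vertex positions and for normals.
struct Vec3 {
    float x, y, z;
};

// One vertex per height sample. `heights` is row-major with `width` samples per row.
// X and Z come from the grid cell, Y is the height; only Y changes on a recompute.
std::vector<Vec3> buildVertices(const std::vector<float>& heights,
                                std::size_t width, std::size_t depth, float spacing) {
    assert(heights.size() == width * depth);
    std::vector<Vec3> vertices;
    vertices.reserve(width * depth);
    for (std::size_t row = 0; row < depth; ++row) {
        for (std::size_t col = 0; col < width; ++col) {
            vertices.push_back({static_cast<float>(col) * spacing,
                                heights[row * width + col],
                                static_cast<float>(row) * spacing});
        }
    }
    return vertices;
}

// Index buffer with two triangles per grid cell (for GL_TRIANGLES).
// It depends only on the grid size, so it is built once and never updated.
// Winding is counter-clockwise when viewed from above (+Y).
std::vector<std::uint32_t> buildIndices(std::size_t width, std::size_t depth) {
    assert(width >= 2 && depth >= 2);
    std::vector<std::uint32_t> indices;
    indices.reserve((width - 1) * (depth - 1) * 6);
    for (std::size_t row = 0; row + 1 < depth; ++row) {
        for (std::size_t col = 0; col + 1 < width; ++col) {
            const std::uint32_t topLeft = static_cast<std::uint32_t>(row * width + col);
            const std::uint32_t topRight = topLeft + 1;
            const std::uint32_t bottomLeft = static_cast<std::uint32_t>((row + 1) * width + col);
            const std::uint32_t bottomRight = bottomLeft + 1;
            indices.push_back(topLeft);
            indices.push_back(bottomLeft);
            indices.push_back(topRight);
            indices.push_back(topRight);
            indices.push_back(bottomLeft);
            indices.push_back(bottomRight);
        }
    }
    return indices;
}

// Vertex normal from finite differences of the neighbouring heights.
// For a surface y = h(x, z) the normal is proportional to (-dh/dx, 1, -dh/dz).
// Border samples fall back to a one-sided difference.
Vec3 normalAt(const std::vector<float>& heights, std::size_t width, std::size_t depth,
              std::size_t col, std::size_t row, float spacing) {
    assert(width >= 2 && depth >= 2 && col < width && row < depth);
    const std::size_t left = col > 0 ? col - 1 : col;
    const std::size_t right = col + 1 < width ? col + 1 : col;
    const std::size_t up = row > 0 ? row - 1 : row;
    const std::size_t down = row + 1 < depth ? row + 1 : row;
    const float dhdx = (heights[row * width + right] - heights[row * width + left]) /
                       (static_cast<float>(right - left) * spacing);
    const float dhdz = (heights[down * width + col] - heights[up * width + col]) /
                       (static_cast<float>(down - up) * spacing);
    const float length = std::sqrt(dhdx * dhdx + 1.0f + dhdz * dhdz);
    return {-dhdx / length, 1.0f / length, -dhdz / length};
}
```

For a 512x512 height map this gives 262,144 vertices and 511 * 511 * 6 indices. After each Mandelbrot recompute only the Y components and the normals need to be touched; the indices stay as they are.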

Related

How do we display pixel data calculated in an OpenCL kernel to the screen using OpenGL?

I am interested in writing a real-time ray tracing application in c++ and I heard that using OpenCL-OpenGL interoperability is a good way to do this (to make good use of the GPU), so I have started writing a c++ project using this interoperability and using GLFW for window management. I should mention that although I have some coding experience, I do not have so much in c++ and have not worked with OpenCL or OpenGL before attempting this project, so I would appreciate it if answers are given with this in mind (that is, beginner-friendly terminology is preferred).
So far I have been able to get OpenCL-OpenGL interoperability working with an example using a vertex buffer object. I have also demonstrated that I can create image data with an RGBA array (at least on the CPU), send this to an OpenGL texture with glTexImage2D() and display it using glBlitFramebuffer().
My problem is that I don't know how to create an OpenCL kernel that is able to calculate pixel data such that it can be given as the data parameter in glTexImage2D(). I understand that to use the interoperability, we must first create OpenGL objects and then create OpenCL objects from these to write the data on as these objects share memory, so I am assuming I must first create an empty OpenGL array object then create an OpenCL array object from this to apply an appropriate kernel to which would write the pixel data before using the OpenGL array object as the data parameter in glTexImage2D(), but I am not sure what kind of object to use and have not seen any examples demonstrating this. A simple example showing how OpenCL can create pixel data for an OpenGL texture image (assuming a valid OpenCL-OpenGL context) would be much appreciated. Please do not leave any line out as I might not be able to fill in the blanks!
It's also very possible that the method I described above for implementing a ray tracer is not possible or at least not recommended, so if this is the case please outline an advised alternate method for sending OpenCL kernel calculated pixel data to OpenGL and subsequently drawing this to the screen. The answer to this similar question does not go into enough detail for me and the CL/GL interop link is not working. The answer mentions that this can be achieved using a renderbuffer rather than a texture, but it says at the bottom of the Khronos OpenGL wiki for Renderbuffer Objects that the only way to send pixel data to them is via pixel transfer operations but I can not find any straightforward explanation for how to initialize data this way.
Note that I am using OpenCL c (no c++ bindings).

From your second para you are creating an OpenCL context with a platform specific combination of GLX_DISPLAY / WGL_HDC and GL_CONTEXT properties to interoperate with OpenGL, and you can create a vertex buffer object that can be read/written as necessary by both OpenGL and OpenCL.
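That's most of the work. In OpenGL you can copy any buffer object, including your VBO, into a texture by binding it as the pixel unpack buffer:
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, myVBO);
glTexSubImage2D(GL_TEXTURE_2D, level, x, y, width, height, format, type, NULL);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
While a buffer is bound to GL_PIXEL_UNPACK_BUFFER, the last argument is treated as a byte offset into that buffer rather than a pointer to CPU memory, so NULL means "start at the beginning of the buffer".
As with copying from regular CPU memory, you might also need to change the unpack alignment (glPixelStorei with GL_UNPACK_ALIGNMENT) if your rows aren't aligned to 4 bytes (32 bits).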
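To make the alignment remark concrete, here is a small sketch of the row-stride arithmetic OpenGL applies when it unpacks pixel rows. The function names unpackRowStride and tightUnpackAlignment are made up for this example; the rounding rule itself is the usual "round each row up to a multiple of GL_UNPACK_ALIGNMENT" behaviour, which defaults to 4.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes OpenGL steps forward per row when reading pixel data,
// given the row width, bytes per pixel, and the current GL_UNPACK_ALIGNMENT.
std::size_t unpackRowStride(std::size_t width, std::size_t bytesPerPixel,
                            std::size_t alignment) {
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    const std::size_t rowBytes = width * bytesPerPixel;
    return (rowBytes + alignment - 1) / alignment * alignment;
}

// Largest legal alignment (8, 4, 2 or 1) for which tightly packed rows
// are read without any padding being assumed.
int tightUnpackAlignment(std::size_t width, std::size_t bytesPerPixel) {
    const std::size_t rowBytes = width * bytesPerPixel;
    const std::size_t candidates[] = {8, 4, 2};
    for (std::size_t candidate : candidates) {
        if (rowBytes % candidate == 0) {
            return static_cast<int>(candidate);
        }
    }
    return 1;
}
```

For example, a 3-pixel-wide RGB8 image has 9-byte rows, but with the default alignment of 4 OpenGL would step 12 bytes per row. If the kernel writes the rows tightly packed, calling glPixelStorei(GL_UNPACK_ALIGNMENT, 1) before glTexSubImage2D avoids a skewed image. A 512-pixel-wide RGBA8 image has 2048-byte rows and needs no change.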

Applying a 2D heatmap to a 3D view

I currently have implemented an OpenGL 3.3 3D environment renderer rendering a (static) block of terrain, and I've been tasked with adding an overlay of statistical data to it; setting specific pixel colours on the terrain based on data values at each point.
The data in question is effectively supplied in the form of a black box in my C++ code base; I can input an X,Y pair of doubles (in worldspace), and it'll output a data value for that location (the terrain does have a third dimension, but the data is not concerned about that). The data in question is time-varying; on changing the time co-ordinate, the scene is expected to update with the data corresponding to the new co-ordinate.
I have a first implementation; the obvious one, where on creating each vertex the appropriate data value for that location is looked up in the black box and encoded in a dynamic buffer accompanying it, with the buffer updated as the time co-ordinate changes. This works perfectly in itself; it's fast to update, and the data is rendered as expected.
However, it's only got data points per-vertex, with simple interpolation across the polygon, and the question's been raised as to whether it's possible to instead render the data per-pixel.
I'm struggling with this. I can't realistically implement the black box behaviour directly in the shaders; it's a large, complex function that I don't fully understand myself (hence representing it here as a black box!), and it requires referencing multiple data sources. There was a version early on - before I looked into the project - that rendered the entire scene in our (separate, non-OpenGL, 2D), top-down environment renderer at an extremely high resolution and applied that as a texture to the mesh - but that's both cripplingly slow and still not true per-pixel data, you can still zoom to a point where the resolution breaks down.
I'm not currently using deferred rendering, but I'm wondering if I can use similar principles to that. One thing I'm considering currently is whether - during the render process - there's a way I can store worldspace X and Y data per-pixel in a buffer (stencil? G-? Arbitrary render target?), and then - back in the C++ environment - generate an overlay texture per frame based on those accumulated X and Y values - but I'm somewhat put off by the notion that that'd require double-precision, and lots of what I've seen suggests steering clear of any double calculations in GLSL; again, I'm worried about speed (although is a simple passthrough and interpolation of double-precision data less impactful?)... plus I'm not entirely sure that what I'm suggesting is even possible!
I may be overcomplicating this somewhat, though, there may be far simpler solutions that aren't in my frame of reference yet, so I'm curious to hear if there's any suggestions for better solutions, or if it's unrealistic.
(While I'm currently using 3.3, a solution requiring 4+ is not off the table)

OpenGL constructing and using data on the GPU

I am not a graphics programmer, I use C++ and C mainly, and every time I try to go into OpenGL, every book, and every resource starts like this:
GLfloat Vertices[] = {
some, numbers, here,
some, more, numbers,
numbers, numbers, numbers
};
Or they may even be vec4.
But then you do something like this:
for(int i = 0; i < 10000; i++)
for(int j = 0; j < 10000; j++)
make_vertex();
And you get a problem. That loop is going to take a significant amount of time to finish- and if the make_vertex() function is anything like a saxpy or something of the sort, it is not just a problem... it is a big problem. For example, let us assume I wish to create fractal terrain. For any modern graphic card this would be trivial.
I understand the paradigm goes like this: Write the vertices manually -> Send them over to the GPU -> GPU does vertex processing, geometry, rasterization all the good stuff. I am sure it all makes sense. But why do I have to do the entire 'Send it over' step? Is there no way to skip that entire intermediary step, and just create vertices on the GPU, and draw them, without the obvious bottleneck?
I would very much appreciate at least a point in the right direction.
I also wonder if there is a possible solution without delving into compute shaders or CUDA. Does OpenGL or GLSL not provide a suitable random function which can be executed in parallel?

I think what you're asking for could work by generating height maps with a compute shader, and mapping that onto a grid with fixed spacing which can be generated trivially. That's a possible solution off the top of my head. You can use GL Compute shaders, OpenCL, or CUDA. Details can be generated with geometry and tessellation shaders.
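One reason the fixed-spacing grid is "trivial" is that no vertex positions have to be uploaded at all: each vertex's X and Z follow from its index, and only the height has to be looked up. In a vertex shader that index would be gl_VertexID. The sketch below shows the same arithmetic in plain C++; GridVertex and gridVertexFromId are hypothetical names, and the heightMap vector stands in for whatever buffer or texture the compute stage writes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct GridVertex {
    float x, y, z;
};

// Derive a grid vertex purely from its index, the way a vertex shader
// could do it from gl_VertexID. `width` is the number of samples per row.
GridVertex gridVertexFromId(std::uint32_t vertexId, std::uint32_t width, float spacing,
                            const std::vector<float>& heightMap) {
    assert(width > 0 && vertexId < heightMap.size());
    const std::uint32_t col = vertexId % width;  // position within the row
    const std::uint32_t row = vertexId / width;  // which row
    return {static_cast<float>(col) * spacing,
            heightMap[vertexId],
            static_cast<float>(row) * spacing};
}
```

With this approach the only data that ever changes is the height map itself, which can be produced and kept on the GPU.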
As for preventing the camera from clipping, you'd probably have to use transform feedback and do a check per frame to see if the direction you're moving in will intersect the geometry.

Your entire question seems to be built on a huge misconception, that vertices are the only things which need to be "crunched" by the GPU.
First, you should understand that GPUs are far superior to CPUs when it comes to parallelism (heck, GPUs sacrifice conditional control jumping for the sake of parallelism). Second, shaders and the buffers you make are all stored on the GPU after being uploaded by the CPU. The reason you don't just create all vertices on the GPU? It's the same reason you load an image from the hard drive instead of creating a raw 2D array and filling it with your pixel data inline. Even then, your image would be stored in the executable file, which is stored on the hard disk and only loaded into memory when you run it. In an actual application, you'll want to load your graphics from assets stored somewhere (usually the hard drive). Why not let the GPU load the assets from the hard drive by itself? The GPU isn't connected to the machine's storage directly, only to the system's main memory via a bus. To read storage directly, the GPU would have to deal with the file system, which is managed by the OS. That's one of the things the CPU is faster at, since we're dealing with serialized data.
Now what shaders deal with is this data you upload to the GPU (vertices, texture coordinates, textures..etc). In ancient OpenGL, no one had to write any shaders. Graphics drivers came with a builtin pipeline which handles regular rendering requests for you. You'd provide it with 4 vertices, 4 texture coordinates and a texture among other things (transformation matrices..etc), and it'd draw your graphics for you on the screen. You could go a bit farther and add some lights to your scene and maybe customize a few things about it, but things were still pretty tight. New OpenGL specifications gave more freedom to the developer by allowing them to rewrite parts of the pipeline with shaders. The developer becomes responsible for transforming vertices into place and doing all sort of other calculations related to lighting etc.
You wrote: "I would very much appreciate at least a point in the right direction. I am guessing it has something to do with uniforms, but really, with me skipping pages, I really cannot understand how a shader program runs or what the lifetime of the variables is."
Uniforms are variables you can send to the shaders from the CPU every frame before you use them to render graphics. When you use the saturation slider in Photoshop or Gimp, it (probably) sends the saturation factor to the shader as a uniform of type float. Uniforms are what you use to communicate little settings like these to your shaders from your application.
To use a shader program, you first have to set it up. A shader program consists of at least 2 types of shaders linked together, a fragment shader and a vertex shader. You use some OpenGL functions to upload your shader sources to the GPU, issue an order of compilation followed by linking, and it'll give you the program's ID. To use this program, you simply glUseProgram(programId) and everything following this call will use it for drawing. The vertex shader is the code that runs on the vertices you send to position them on the screen correctly. This is where you can do transformations on your geometry like scaling, rotation etc. A fragment shader runs at some stage afterwards using interpolated (transitioned) values outputted from the vertex shader to define the color and the depth of every unit fragment on what you're drawing. This is where you can do post-processing effects on your pixels.
Anyway, I hope I've helped make a few things clearer to you, but I can only tell you that there are no shortcuts. OpenGL has quite a steep learning curve, but it all connects and things start to make sense after a while. If you're getting bored of books and such, then consider taking the code snippets from every lesson, compiling them, and messing around with them while trying to rationalize as you go. You'll have to resort to written documents eventually, but hopefully things will fit into your head more easily once you have some experience with the implementation components. Good luck.
Edit:
If you're trying to generate vertices on the fly using some algorithm, then try looking into Geometry Shaders. They may give you what you want.

You probably want to use CUDA for the things you are used to doing in C or C++, and let OpenGL access the rasterizer and other graphics stuff.
OpenGL and CUDA interoperate fairly nicely. A good entry point for customizing the contents of a buffer object is here: http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__OPENGL.html#group__CUDART__OPENGL_1g0fd33bea77ca7b1e69d1619caf44214b , with the cudaGraphicsGLRegisterBuffer method.
You may also want to have a look at the nbody sample from the NVIDIA GPU SDK samples that come with current CUDA installs.

best way to wrap opengl models

In short: What is the "preferred" way to wrap OpenGL's buffers, shaders and/or matrices required for a more high level "model" object?
I am trying to write this tiny graphics engine in C++ built on core OpenGL 3.3 and I would like to implement an as clean as possible solution to wrapping a higher level "model" object, which would contain its vertex buffer, global position/rotation, textures (and also a shader maybe?) and potentially other information.
I have looked into this open source engine, called GamePlay3D and don't quite agree with many aspects of its solution to this problem. Is there any good resource that discusses this topic for modern OpenGL? Or is there some simple and clean way to do this?

That depends a lot on what you want to be able to do with your engine. Also note that these concepts are the same in DirectX (or any other graphics API), so don't focus your search too narrowly on OpenGL. Here are a few concepts that are very common in a 3D engine (names can differ):
Mesh:
A mesh contains submeshes, each submesh contains a vertex buffer and an index buffer. The idea being that each submesh will use a different material (for example, in the mesh of a character, there could be a submesh for the body and one for the clothes.)
Instance:
An instance (or mesh instance) references a mesh, a list of materials (one for each submesh in the mesh), and contains the "per instance" shader uniforms (world matrix etc.), usually grouped in a uniform buffer.
Material: (This part changes a lot depending on the complexity of the engine). A basic version would contain some textures, some render states (blend state, depth state), a shader program, and some shader uniforms that are common to all instances (for example a color, but that could also be in the instance depending on what you want to do.)
More complex versions usually separate the materials into passes (or sometimes techniques that contain passes) that contain everything that's in the previous paragraph. You can check the Ogre3D documentation for more info about that and to take a look at one possible implementation. There's also a very good article called Designing a Data-Driven Renderer in GPU PRO 3 that describes an even more flexible system based on the same idea (but also more complex).
Scene: (I call it a scene here, but it could really be called anything). It provides the shader parameters and textures from the environment (lighting values, environment maps, that kind of thing).
And I think that's it for the basics. With that in mind, you should be able to find your way around the code of any open-source 3D engine if you want the implementation details.
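As a rough illustration of how the Mesh / Instance / Material split above might look in C++, here is a minimal sketch. Everything in it is an assumption made for the example: the Handle alias stands in for OpenGL object names (GLuint), the field names are invented, and collectDrawItems is a hypothetical helper showing how an instance pairs each submesh with its material. A real engine would add ownership, uniform buffers and render-state objects.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Handle = std::uint32_t;  // stand-in for a GLuint object name

struct SubMesh {
    Handle vertexBuffer = 0;
    Handle indexBuffer = 0;
    std::uint32_t indexCount = 0;
};

struct Mesh {
    std::vector<SubMesh> subMeshes;  // e.g. body and clothes of a character
};

struct Material {
    Handle shaderProgram = 0;
    std::vector<Handle> textures;
    bool blendingEnabled = false;  // simplified render state
    bool depthTestEnabled = true;
    std::array<float, 4> baseColor{{1.0f, 1.0f, 1.0f, 1.0f}};  // shared uniform
};

struct Instance {
    const Mesh* mesh = nullptr;
    std::vector<const Material*> materials;  // one per submesh
    std::array<float, 16> worldMatrix{{1.0f, 0.0f, 0.0f, 0.0f,
                                       0.0f, 1.0f, 0.0f, 0.0f,
                                       0.0f, 0.0f, 1.0f, 0.0f,
                                       0.0f, 0.0f, 0.0f, 1.0f}};  // per-instance uniform
};

// Everything the renderer needs to issue one draw call.
struct DrawItem {
    const SubMesh* subMesh;
    const Material* material;
    const Instance* instance;
};

// Pair each submesh of the instance's mesh with the matching material.
std::vector<DrawItem> collectDrawItems(const Instance& instance) {
    assert(instance.mesh != nullptr);
    assert(instance.materials.size() == instance.mesh->subMeshes.size());
    std::vector<DrawItem> items;
    items.reserve(instance.materials.size());
    for (std::size_t i = 0; i < instance.materials.size(); ++i) {
        items.push_back(DrawItem{&instance.mesh->subMeshes[i], instance.materials[i], &instance});
    }
    return items;
}
```

Keeping the draw items as plain pointers makes it easy to sort them by material or shader before drawing, which is one common reason for separating these concepts in the first place.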

This is in addition to Jerem's excellent answer.
At a low level, there is no such thing as a "model", there is only buffer data and the code used to process it. At a high level, the concept of a "model" will differ from application to application. A chess game would have a static mesh for each chess piece, with shared textures and materials, but a first-person shooter could have complicated models with multiple parts, swappable skins, hit boxes, rigging, animations, et cetera.
Case study: chess
For chess, there are six pieces and two colors. Let's over-engineer the graphics engine to show how it could be done if you needed to draw, say, thousands of simultaneous chess games on the same screen, instead of just one game. Here is how you might do it.
Store all models in one big buffer. This buffer has all of the vertex and index data for all six models clumped together. This means that you never have to switch buffers / VAOs when you're drawing pieces. Also, this buffer never changes, except when the user goes into settings and chooses a different style for the chess pieces.
Create another buffer containing the current location of each piece in the game, the color of each piece, and a reference to the model for that piece. This buffer is updated every frame.
Load the necessary textures. Maybe the normals would be in one texture, and the diffuse map would be an array texture with one layer for white and another for black. The textures are designed so you don't have to change them while you're drawing chess pieces.
To draw all the pieces, you just have to update one buffer, and then call glMultiDrawElementsIndirect()... once per frame, and it draws all of the chess pieces. If that's not available, you can fall back to glDrawElements() or something else.
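The steps above can be sketched on the CPU side as building one indirect draw command per model. The DrawElementsIndirectCommand layout below is the one OpenGL reads from the indirect buffer; ModelRange and buildCommands are hypothetical names for this example. The sketch also assumes the per-piece buffer is sorted by model, so that all instances of one model are contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout consumed by glMultiDrawElementsIndirect for each draw.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices for this model
    std::uint32_t instanceCount;  // how many pieces use this model
    std::uint32_t firstIndex;     // offset into the shared index buffer
    std::int32_t baseVertex;      // offset into the shared vertex buffer
    std::uint32_t baseInstance;   // first entry in the per-piece buffer
};

// Where one model lives inside the big shared vertex/index buffer.
struct ModelRange {
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
    std::int32_t baseVertex;
};

// Build one command per model that is actually in use.
// pieceModelIds[i] is the model index of piece i.
std::vector<DrawElementsIndirectCommand> buildCommands(
        const std::vector<ModelRange>& models,
        const std::vector<std::uint32_t>& pieceModelIds) {
    std::vector<std::uint32_t> instancesPerModel(models.size(), 0);
    for (std::uint32_t id : pieceModelIds) {
        assert(id < models.size());
        ++instancesPerModel[id];
    }
    std::vector<DrawElementsIndirectCommand> commands;
    std::uint32_t baseInstance = 0;
    for (std::size_t m = 0; m < models.size(); ++m) {
        if (instancesPerModel[m] == 0) {
            continue;  // no piece uses this model this frame
        }
        commands.push_back({models[m].indexCount, instancesPerModel[m],
                            models[m].firstIndex, models[m].baseVertex, baseInstance});
        baseInstance += instancesPerModel[m];
    }
    return commands;
}
```

The resulting vector would be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER and drawn with a single glMultiDrawElementsIndirect call whose drawcount is commands.size(). With six chess models that is at most six commands per frame, however many games are on screen.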
Analysis
You can see how this kind of design won't work for everything.
What if you have to stream new models into memory, and remove old ones?
What if the models have different size textures?
What if the models are more complex, with animations or forward kinematics?
What about translucent models?
What about hit boxes and physics data?
What about different LODs?
The problem here is that your solution, and even the very concept of what a "model" is, will be very different depending on what your needs are.

When using Direct 3D, what should be processed in code and what should be processed in HLSL?

I am very new to 3D programming, namely with DirectX. I have been trying to follow tutorials on how to do basic things, and I have been looking at the samples provided by Microsoft. One of the big questions I have had is how to tell what calculations should be done in the actual game code and what calculations should be done in HLSL. I have not been able to understand what should be done where, because it looks to me like you could have almost all code pertaining to calculations in your shader file, or you could have it all in the executable code and only send the bare minimum to the pixel and vertex shaders. How can one tell what code should go where? If you need an example, I'll try to find one.

"Code" - CPU code
"HLSL" - GPU code
Basically, you want everything that is pure graphics to happen on the GPU. That is, when the information about what you want to render has been sent to the GPU, it should take over and use that information to generate the final image.
You want the CPU to say to the GPU "this is what I want to render, and here is everything you need to make it happen" and then make sure to tell the GPU "this is how you render it".
Some examples (not a complete or final list in any way):
CPU:
Anything dealing with window opening/closing/resizing
User input from mouse, keyboard
Reading and setting configuration
Generating and updating view matrices
Application logic
Setting up and initializing rendering (textures, buffers etc)
Generating vertex data (position, texture coordinates etc)
Creating graphic entities (triangles, textures, colors etc)
Handling animation (timestepping, swapping buffers)
Sending updated data to the GPU for each frame
GPU:
Use the view matrices to put things on the right place on the screen
Interpolate from vertex data to fragment data
Shading (usually, this is the most complicated part)
Calculate and write final pixel color
