Recently I have been trying to implement an algorithm that grows vines in real time. I roughly know how to do it on the CPU, but I would like to use the GPU for this. I was thinking of a geometry shader, but geometry shaders run per primitive, performing the exact same operation on every primitive, which is not what I need.
Here is, conceptually, how my vine-growing algorithm works: pick any point on an object mesh as the root point; the algorithm then generates a series of points (representing the vine), each one based on the points produced before it. Point positions are influenced by factors such as gravity, adhesion and distance to triangle faces. Every point must lie on the same side of its triangle face as the face normal.
How can I do this on the GPU? Thanks a lot.
If you want to do something like this, which doesn't map well to the regular rendering pipeline, in GLSL, your best bet is compute shaders. (If you don't need to implement this in GLSL, you may also want to look at OpenCL or CUDA as alternatives, though note that CUDA is vendor-locked to NVIDIA GPUs.) You can use a compute shader to generate the vine geometry using whatever method you had planned, then render the vines as normal in a second pass.
Note that this is only a good idea if your vine generation algorithm maps well to the massively parallel nature of a GPU. If your algorithm is inherently serial, then using the CPU to generate the geometry will likely yield better results.
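To make that concrete, here is a rough sketch of how the compute pass could be structured, with one invocation growing one whole vine: the points within a vine are produced serially, so the parallelism comes from growing many vines at once. The buffer layout, the constants and the toy growth rule below are illustrative assumptions only (this is not the asker's actual algorithm), and the host code assumes a GL 4.3+ context and function loader are already set up:

// The compute shader source, kept as a string for brevity. One invocation = one vine.
static const char* kVineComputeSrc = R"(
#version 430
layout(local_size_x = 64) in;
const uint POINTS_PER_VINE = 128u;
layout(std430, binding = 0) buffer Roots      { vec4 roots[];  };   // xyz = root position per vine
layout(std430, binding = 1) buffer VinePoints { vec4 points[]; };   // POINTS_PER_VINE entries per vine
void main()
{
    uint vine = gl_GlobalInvocationID.x;
    vec3 p    = roots[vine].xyz;
    vec3 dir  = vec3(0.0, 1.0, 0.0);
    for (uint i = 0u; i < POINTS_PER_VINE; ++i)
    {
        points[vine * POINTS_PER_VINE + i] = vec4(p, 1.0);
        // Placeholder growth rule: this is where the gravity, adhesion and
        // distance-to-surface terms would update the direction and position.
        dir = normalize(dir + vec3(0.0, -0.05, 0.0));
        p  += 0.1 * dir;
    }
}
)";

void growVines(GLuint program, GLuint rootBuffer, GLuint vinePointBuffer, int numVines)
{
    glUseProgram(program);                                     // program compiled from kVineComputeSrc
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, rootBuffer);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, vinePointBuffer);
    glDispatchCompute((numVines + 63) / 64, 1, 1);             // 64 vines per work group
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
    // Second pass: bind vinePointBuffer as a vertex buffer and draw the vines as line strips.
}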
I have been looking for methods to register (align) organized point clouds with normal information.
I could only find generic point cloud registration methods (for example in PCL).
I am using a Microsoft Kinect to get my point clouds, but the problem is that they are quite big.
What I would like to know:
Are there fast ways to register organized point clouds?
Are there down-sampling methods that are very fast (perhaps ones that also exploit the fact that the point clouds are organized)?
I was also thinking about using OpenCV filters, since an organized point cloud can be thought of as an image with gray values (a 2D matrix of depth values): for example using the OpenCV resize method on the matrix, plus some derivative-type filters (because edges are important for me in the scene). Is that a good idea?
Also, down-sampling looks like a data-parallel problem, which could be a great candidate for GPU implementation. Do you know about any such implementation?
What I have done so far is the following.
- Several down-sampling methods (random, voxel-based, uniform), but the problem is that they all took a lot of time (in PCL). The voxel-based one was the best.
- Then I ran ICP, which was reasonably fast and accurate enough for me on the down-sampled point clouds.
So for me, currently, a good solution would be a fast way of down-sampling my point clouds, for example a GPU-based implementation of it.
Thinking of an organized point cloud as an image with grey values (a simple 2D matrix) turns out to be a good idea.
1. Downsampling methods for 2D matrices implemented on the GPU are available in, for example, OpenCV's CUDA module.
2. It is also easy to implement your own fast downsampling on a 2D matrix, depending on how important accuracy is. For example, simply take every kth element. If needed, you can average around those elements to blur, or apply derivative-type filters to sharpen (edge enhancement); see the sketch after this list.
3. You can come up with special picking schemes based on what you know about the frames (e.g. if your objects tend to be in the centre, pick more points around that area).
All three of the above will give faster results and are probably "more tuned" to your problem (especially #3). "More tuned" implies less robust.
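As a rough illustration of #1 and #2, here is a small sketch in C++ with OpenCV. It assumes the organized cloud is available as a single-channel 16-bit depth image and that OpenCV was built with its CUDA modules for the GPU path; the function names and parameters are just examples, not PCL code:

#include <cstdint>
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudawarping.hpp>   // cv::cuda::resize (needs OpenCV built with CUDA)

// #2: keep every k-th depth sample of the organized cloud (CPU, trivial).
cv::Mat decimate(const cv::Mat& depth, int k)
{
    cv::Mat out(depth.rows / k, depth.cols / k, depth.type());
    for (int y = 0; y < out.rows; ++y)
        for (int x = 0; x < out.cols; ++x)
            out.at<uint16_t>(y, x) = depth.at<uint16_t>(y * k, x * k);
    return out;
}

// #1: the same reduction on the GPU via OpenCV's CUDA module.
cv::Mat decimateGpu(const cv::Mat& depth, int k)
{
    cv::cuda::GpuMat d_in(depth), d_out;
    cv::cuda::resize(d_in, d_out, cv::Size(depth.cols / k, depth.rows / k),
                     0, 0, cv::INTER_NEAREST);   // INTER_AREA would average instead of picking
    cv::Mat out;
    d_out.download(out);
    return out;
}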
I need to generate an isosurface from chunks of voxels in an octree or array, supporting both rounded and sharp geometry. I have searched for algorithms that seem capable of this and found several, including Dual Contouring, Extended Marching Cubes and Dual Marching Cubes. The first two, however, require Hermite data, which seems like a massive memory drain. In addition, I can't find the actual algorithm for any of these, only equations from journal papers and vague descriptions. Any help finding an algorithm that will solve my problem would be much appreciated.
The ones you have mentioned are the most prominent ones.
However, keep in mind that they have some limitations too:
Extended Marching Cubes (EMC) - preserves sharp features by taking into account the sample normals (and thus the gradients). This method, however, is still not topologically consistent (homeomorphic), it doesn't allow adaptive refinement (simplification of the mesh), and it has inter-cell dependency (due to the edge-flipping process, which won't allow for eventual GPU acceleration).
Dual Contouring (DC) - preserves sharp features and can be adaptively refined, but has inter-cell dependency and also produces non-manifold meshes.
Dual Marching Cubes (DMC) - preserves sharp features, produces manifold meshes (dealing with the ambiguities), and allows adaptive refinement. However, it still suffers from inter-cell dependency (due to its dual nature) and won't be as accurate, because its sliver-elimination process rounds off vertices (the error may be negligible).
There are other possible combinations of these, as well as completely different techniques, I believe. Nevertheless, I suggest you have a look at Cubical Marching Squares (CMS). I am currently trying to get my head around it, as I was hoping to implement it; there are not many implementations of it online. Note, however, that it still works with Hermite data (which was concerning you, as far as I could tell).
I searched and found some tutorials on how to do terrain collision, but they were using .raw files and I'm using .x. Still, I think I can do the same thing they did: they took the x, y, z values of an object and checked them against every single triangle in the terrain. It makes sense, but it looks like it will be slow, just as picking by checking against every single triangle is slow.
Is there a faster way to do it that is still good?
UPDATE
My terrain is not flat; if it were, I would use bounding boxes.
Last time I did this, I used the Bullet library, and it worked great. It has various collision shapes to choose from, optimised for different scenarios, including general triangle meshes and heightfields. You can use the library's collision routines without the physics.
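For reference, a minimal sketch of such a collision-only Bullet setup (no dynamics world). The single triangle fed to the mesh and the sphere used for the player are placeholders; in practice you would fill the btTriangleMesh with the triangles parsed from your .x terrain:

#include <btBulletCollisionCommon.h>

void terrainCollisionExample()
{
    // Static triangle-mesh shape built from the terrain triangles.
    btTriangleMesh* terrainTris = new btTriangleMesh();
    terrainTris->addTriangle(btVector3(0, 0, 0), btVector3(1, 0, 0), btVector3(0, 0, 1)); // ...one call per triangle
    btBvhTriangleMeshShape terrainShape(terrainTris, true);

    // Collision-only world: broadphase + dispatcher, no physics.
    btDefaultCollisionConfiguration config;
    btCollisionDispatcher dispatcher(&config);
    btDbvtBroadphase broadphase;
    btCollisionWorld world(&dispatcher, &broadphase, &config);

    btCollisionObject terrainObj;
    terrainObj.setCollisionShape(&terrainShape);
    world.addCollisionObject(&terrainObj);

    // The player (or whatever you test against the terrain) as a simple sphere.
    btSphereShape playerShape(0.5f);
    btCollisionObject playerObj;
    playerObj.setCollisionShape(&playerShape);
    playerObj.getWorldTransform().setOrigin(btVector3(10, 5, 10));
    world.addCollisionObject(&playerObj);

    // Per frame: update the player transform, run collision detection, inspect the contacts.
    world.performDiscreteCollisionDetection();
    for (int i = 0; i < dispatcher.getNumManifolds(); ++i)
        if (dispatcher.getManifoldByIndexInternal(i)->getNumContacts() > 0)
        {
            // the player is touching the terrain
        }
}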
One common way to significantly reduce the time it takes to detect collisions is to organize the space into an octree, which will allow you to very quickly determine whether or not a collision could occur in a particular node. Generally speaking, it's easier to accomplish these sorts of tasks with a game engine.
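A bare-bones sketch of that idea, using an axis-aligned bounding box per octree node so whole branches of the terrain can be rejected at once; the node layout and leaf contents here are just one possible arrangement:

#include <vector>

struct AABB { float min[3], max[3]; };

static bool overlaps(const AABB& a, const AABB& b)
{
    for (int i = 0; i < 3; ++i)
        if (a.max[i] < b.min[i] || a.min[i] > b.max[i]) return false;
    return true;
}

struct OctreeNode
{
    AABB bounds;
    OctreeNode* children[8] = { nullptr };   // all null for leaf nodes
    std::vector<int> triangleIndices;        // terrain triangles stored in the leaves
};

// Collect only the triangles in nodes that overlap the query box (e.g. the player's AABB);
// everything else is skipped without ever being tested per triangle.
void query(const OctreeNode* node, const AABB& box, std::vector<int>& out)
{
    if (!node || !overlaps(node->bounds, box)) return;
    out.insert(out.end(), node->triangleIndices.begin(), node->triangleIndices.end());
    for (const OctreeNode* child : node->children)
        query(child, box, out);
}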
I have a device that acquires X-ray images. Due to some technical constraints, the detector is made up of multiple tilted, partially overlapping tiles with heterogeneous pixel sizes, so the image is distorted. The detector geometry is known precisely.
I need a function converting these distorted images into a flat image with a homogeneous pixel size. I have already done this on the CPU, but I would like to try OpenGL so I can use the GPU in a portable way.
I have no experience with OpenGL programming, and most of the information I could find on the web was useless for this use case. How should I proceed? How do I do this?
Images are 560x860 pixels and we have batches of 720 images to process. I'm on Ubuntu.
OpenGL is for rendering polygons. You might be able to do multiple passes and use shaders to get what you want, but you are better off rewriting the algorithm in OpenCL. The bonus then is that you have something portable that will even use multi-core CPUs if no graphics accelerator is available.
Rather than OpenGL, this sounds like a CUDA or, more generally, a GPGPU problem.
If you have C or C++ code to do it already, CUDA should be little more than figuring out the types you want to use on the GPU and how the algorithm can be tiled.
If you want to do this with OpenGL, you'd normally do it by supplying the current data as a texture, writing a fragment shader that processes that data, and setting it up to render to a texture. Once the output texture is fully rendered, you can retrieve it back to the CPU and write it out as a file.
I'm afraid it's hard to do much more than a very general sketch of the overall flow without knowing more about what you're doing, but if (as you said) you've already done this on the CPU, you apparently already have a pretty fair idea of most of the details.
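One common way to phrase the correction in that texture-to-texture pass, since the detector geometry is known up front, is as a lookup: precompute, for every pixel of the flat output image, the (sub-pixel) position in the distorted input it should come from, store those positions in a second floating-point texture, and let a tiny fragment shader do the sampling. The snippet below is a generic sketch of that idea, not code tailored to this particular detector:

// Fragment shader for the render-to-texture pass: 'remapTable' holds, per output pixel,
// the normalized source coordinate in the distorted detector image (precomputed once
// on the CPU from the known detector geometry).
static const char* kUndistortFrag = R"(
#version 330
uniform sampler2D distortedImage;   // raw 560x860 detector frame
uniform sampler2D remapTable;       // RG32F texture: where to sample for each output pixel
in  vec2 uv;                        // interpolated 0..1 coords of a full-screen quad
out vec4 corrected;
void main()
{
    vec2 src  = texture(remapTable, uv).rg;      // source position, 0..1
    corrected = texture(distortedImage, src);    // bilinear sample of the distorted frame
}
)";
// Host side (per batch): draw one full-screen quad into an FBO-attached texture with this
// shader bound, then read the result back (glReadPixels / glGetTexImage) for each of the 720 images.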
At heart what you are asking here is "how can I use a GPU to solve this problem?"
Modern GPUs are essentially linear algebra engines, so your first step would be to define your problem as a matrix that transforms an input coordinate (x, y), written in homogeneous form as (x, y, 1), to its output.
For example, you would represent a transformation of scaling x by ½, scaling y by 1.2, and translating up and left by two units as:
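| 0.5   0    -2 |   | x |
|  0    1.2   2 | . | y |
|  0    0     1 |   | 1 |
(taking +x as right and +y as up, so "left" is -2 in x and "up" is +2 in y; flip the signs if your convention differs)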
and you can work out analogous transforms for rotation, shear, etc, as well.
Once you've got your transform represented as a matrix-vector multiplication, all you need to do is load your source data into a texture, specify your transform as the projection matrix, and render it to the result. The GPU performs the multiplication per pixel. (You can also write shaders, etc, that do more complicated math, factor in multiple vectors and matrices and what-not, but this is the basic idea.)
That said, once you have your problem expressed as a linear transform, you can make it run a lot faster on the CPU too by leveraging e.g. SIMD or one of the many linear algebra libraries out there. Unless you need real-time performance or have a truly immense amount of data to process, using CUDA/GL/shaders etc. may be more trouble than it's strictly worth, as there's a bit of clumsy machinery involved in initializing the libraries, setting up render targets, learning the details of graphics development, and so on.
Simply converting your inner loop from ad-hoc math to a well-optimized linear algebra subroutine may give you enough of a performance boost on the CPU that you're done right there.
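For instance, here is a sketch of what that inner loop might reduce to with an off-the-shelf linear algebra library (Eigen, picked purely as an example); the image dimensions and the transform are placeholders:

#include <Eigen/Dense>
#include <vector>

// Map every output pixel of the flat image back to a source coordinate in the
// distorted image using one shared 3x3 homogeneous transform.
std::vector<Eigen::Vector2f> buildSampleCoords(int width, int height, const Eigen::Matrix3f& T)
{
    std::vector<Eigen::Vector2f> coords;
    coords.reserve(static_cast<size_t>(width) * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            Eigen::Vector3f p = T * Eigen::Vector3f(float(x), float(y), 1.0f);
            coords.emplace_back(p.x() / p.z(), p.y() / p.z());   // back to Cartesian
        }
    return coords;
}

// Example transform matching the matrix above: scale x by 0.5, y by 1.2, translate (-2, +2).
// Eigen::Matrix3f T;
// T << 0.5f, 0.0f, -2.0f,
//      0.0f, 1.2f,  2.0f,
//      0.0f, 0.0f,  1.0f;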
You might find this tutorial useful (it's a bit old, but note that it does contain some OpenGL 2.x GLSL after the Cg section). I don't believe there are any shortcuts to image processing in GLSL, if that's what you're looking for... you do need to understand a lot of the 3D rasterization aspect and historical baggage to use it effectively, although once you do have a framework for inputs and outputs set up you can forget about that and play around with your own algorithms in shader code relatively easily.
Having been doing this sort of thing for years (initially using Direct3D shaders, but more recently with CUDA), I have to say that I entirely agree with the posts here recommending CUDA/OpenCL. It makes life much simpler, and generally runs faster. I'd have to be pretty desperate to go back to a graphics API implementation of non-graphics algorithms now.
I was wondering if the quality of texture mipmaps would be better if I used my own algorithm for pre-generating them, instead of the built-in automatic one. I'd probably use a slow but pretty algorithm, like Lanczos resampling.
Does it make sense? Will I get any quality gain on modern graphics cards?
There are good reasons to generate your own mipmaps. However, the quality of the downsampling is not one of them.
Game and graphics programmers have experimented with all kinds of downsampling algorithms in the past. In the end it turned out that the very simple "average four pixels" method gives the best results. Although more advanced methods are in theory mathematically more correct, they tend to take a lot of sharpness out of the mipmaps, which gives a flat look (try it!).
For some reason I don't fully understand, the simple averaging method seems to have the best trade-off between antialiasing and keeping the mipmaps sharp.
However, you may want to calculate your mipmaps with gamma correction. OpenGL does not do this on its own. This can make a real visual difference, especially for darker textures.
Doing so is simple. Instead of averaging four values together like this:
float average (float a, float b, float c, float d)
{
    return (a + b + c + d) / 4.0;
}
Do this:
float GammaCorrectedAverage (float a, float b, float c, float d)
{
    // Assume a gamma of 2.0. In this case we can just square
    // the components, average them, and take the square root.
    return sqrt ((a*a + b*b + c*c + d*d) / 4.0);
}
This code assumes your color components are normalized to be in the range of 0 to 1.
What is motivating you to try? Are the mipmaps you currently have being generated poorly? (i.e. have you looked?) Bear in mind your results will often still be (tri)linearly interpolated anyway, so between that and motion there are often steeply diminishing returns to improved resampling.
It depends on the kind of assets you display. The Lanczos filter gets closer to an ideal low-pass filter, and the results are noticeable if you compare the mip maps side by side. Most people will mistake aliasing for sharpness; again, it depends on whether your assets tend to contain high frequencies, but I've definitely seen cases where a box filter was not a good option. Since the mip map is then linearly interpolated anyway, the gain might not be that noticeable. There is another thing to mention: most people use a box filter and pass the output of each level as the input into the next stage, and this way you lose both precision and visual energy (although gamma correction helps with the latter). If you can come up with code that uses an arbitrary filter (note that most of them are separable into two passes), you would typically scale the filter kernel itself and produce each mip map level directly from the base texture, which is a good thing.
As an addition to this question, I have found that some completely different mipmapping algorithms (rather than ones simply trying to achieve the best down-scaling quality, like Lanczos filtering) have good effects on certain textures.
For instance, on some textures that are supposed to represent high-frequency information, I have tried an algorithm that simply takes one random pixel out of the four being considered in each iteration. The results depend very much on the texture and what it is supposed to convey, but I have found that it gives a great effect on some, not least ground textures.
Another one I've tried is taking the most deviating of the four pixels to preserve contrasts. It has even fewer uses, but they do exist.
As such, I've implemented the option to choose mipmapping algorithm per texture.
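For what it's worth, a tiny sketch of that randomized reduction for one mip level, operating on a single-channel image stored as a flat array (the RNG and data layout are arbitrary choices):

#include <cstdint>
#include <random>
#include <vector>

// Produce the next mip level by keeping one randomly chosen pixel out of each
// 2x2 block instead of averaging them (width and height assumed even).
std::vector<uint8_t> randomizedDownsample(const std::vector<uint8_t>& src,
                                          int width, int height, std::mt19937& rng)
{
    std::uniform_int_distribution<int> pick(0, 3);
    std::vector<uint8_t> dst((width / 2) * (height / 2));
    for (int y = 0; y < height / 2; ++y)
        for (int x = 0; x < width / 2; ++x)
        {
            int choice = pick(rng);          // which of the four source pixels to keep
            int dx = choice & 1;
            int dy = choice >> 1;
            dst[y * (width / 2) + x] = src[(2 * y + dy) * width + (2 * x + dx)];
        }
    return dst;
}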
EDIT: I thought I might provide some examples of the differences in practice. Here's a piece of grass texture on the ground, the leftmost picture being with standard average mipmapping, and the rightmost being with randomized mipmapping:
I hope the viewer can appreciate how much "apparent detail" is lost in the averaged mipmap, and how much flatter it looks for this kind of texture.
Also for reference, here are the same samples with 4× anisotropic filtering turned on (the above being tri-linear):
Anisotropic filtering makes the difference less pronounced, but it's still there.