In computer graphics it's a common technique to apply jittering to sampling positions in order to avoid visible sampling patterns.
What's the proper way to apply jittering to sampl-positions in a fragment shader? One way I could think of would be to feed a noise-texture into the shader, and then depending on the texlvalue of this noise texture alter the sampling-positions of whatever one wants to sample.
Is there a better way of implementing jittering?
The various hardware driver AA schemes usually are jitter-based already -- each vendor has their own favored patterns. And unfortunately the user/player can often mess with the settings and turn things on and off.
This might make you think that there's no "correct" way -- and you'd be right! There are ways, some of which are correct for some tasks. Learn and use what you like.
For surface-texture jittering, you might try something like so:
In photoshop, make a 256x512 rgb image (or 256x511, for some versions of the NVIDIA DDS exporter), fill it ALL with random noise, save as DDS with "use existing mip maps." this will give you roughly-uniform noise regardless of how much the texture is scaled (at least through many useful size ranges).
in your shader, read the noise texture, then apply it to the UVs, and then read your other texture(s), e.g.
float4 noise = tex2D(noiseSampler,OriginalUV);
float2 newUV = OriginalUV + someSmallScale*(noise.xy-0.5);
float4 jitteredColor = tex2D(colorSampler,newUV);
Try other variations as you like. Be aware that strong texture jittering can cause a lot of texture cache misses, which may have a performance cost, and be careful around normal maps, since you are introducing high-frequency components into a signal that's already been filtered.
Related
I am not a graphics programmer, I use C++ and C mainly, and every time I try to go into OpenGL, every book, and every resource starts like this:
GLfloat Vertices[] = {
some, numbers, here,
some, more, numbers,
numbers, numbers, numbers
};
Or they may even be vec4.
But then you do something like this:
for(int i = 0; i < 10000; i++)
for(int j = 0; j < 10000; j++)
make_vertex();
And you get a problem. That loop is going to take a significant amount of time to finish- and if the make_vertex() function is anything like a saxpy or something of the sort, it is not just a problem... it is a big problem. For example, let us assume I wish to create fractal terrain. For any modern graphic card this would be trivial.
I understand the paradigm goes like this: Write the vertices manually -> Send them over to the GPU -> GPU does vertex processing, geometry, rasterization all the good stuff. I am sure it all makes sense. But why do I have to do the entire 'Send it over' step? Is there no way to skip that entire intermediary step, and just create vertices on the GPU, and draw them, without the obvious bottleneck?
I would very much appreciate at least a point in the right direction.
I also wonder if there is a possible solution without delving into compute shaders or CUDA? Does openGL or GLSL not provide a suitable random function which can be executed in parallel?
I think what you're asking for could work by generating height maps with a compute shader, and mapping that onto a grid with fixed spacing which can be generated trivially. That's a possible solution off the top of my head. You can use GL Compute shaders, OpenCL, or CUDA. Details can be generated with geometry and tessellation shaders.
As for preventing the camera from clipping, you'd probably have to use transform feedback and do a check per frame to see if the direction you're moving in will intersect the geometry.
Your entire question seems to be built on a huge misconception, that vertices are the only things which need to be "crunched" by the GPU.
First, you should understand that GPUs are far more superior than CPUs when it comes to parallelism (heck, GPUs sacrifice conditional control jumping for the sake of parallelism). Second, shaders and these buffers you make are all stored on the GPU after being uploaded by the CPU. The reason you don't just create all vertices on the GPU? It's the same reason for why you load an image from the hard drive instead of creating a raw 2D array and start filling it up with your pixel data inline. Even then, your image would be stored in the executable program file, which is stored on the hard disk and only loaded to memory when you run it. In an actual application, you'll want to load your graphics off assets stored somewhere (usually the hard drive). Why not let the GPU load the assets from the hard drive by itself? The GPU isn't connected to a hardware's storage directly, but barely to the system's main memory via some BUS. That's because to connect to any storage directly, the GPU will have to deal with the file system which is managed by the OS. That's one of the things the CPU would be faster at doing since we're dealing with serialized data.
Now what shaders deal with is this data you upload to the GPU (vertices, texture coordinates, textures..etc). In ancient OpenGL, no one had to write any shaders. Graphics drivers came with a builtin pipeline which handles regular rendering requests for you. You'd provide it with 4 vertices, 4 texture coordinates and a texture among other things (transformation matrices..etc), and it'd draw your graphics for you on the screen. You could go a bit farther and add some lights to your scene and maybe customize a few things about it, but things were still pretty tight. New OpenGL specifications gave more freedom to the developer by allowing them to rewrite parts of the pipeline with shaders. The developer becomes responsible for transforming vertices into place and doing all sort of other calculations related to lighting etc.
I would very much appreciate at least a point in the right direction.
I am guessing it has something to do with uniforms, but really, with
me skipping pages, I really cannot understand how a shader program
runs or what the lifetime of the variables is.
uniforms are variables you can send to the shaders from the CPU every frame before you use it to render graphics. When you use the saturation slider in Photoshop or Gimp, it (probably) sends the saturation factor value to the shader as a uniform of type float. uniforms are what you use to communicate little settings like these to your shaders from your application.
To use a shader program, you first have to set it up. A shader program consists of at least 2 types of shaders linked together, a fragment shader and a vertex shader. You use some OpenGL functions to upload your shader sources to the GPU, issue an order of compilation followed by linking, and it'll give you the program's ID. To use this program, you simply glUseProgram(programId) and everything following this call will use it for drawing. The vertex shader is the code that runs on the vertices you send to position them on the screen correctly. This is where you can do transformations on your geometry like scaling, rotation etc. A fragment shader runs at some stage afterwards using interpolated (transitioned) values outputted from the vertex shader to define the color and the depth of every unit fragment on what you're drawing. This is where you can do post-processing effects on your pixels.
Anyway, I hope I've helped making a few things clearer to you, but I can only tell you that there are no shortcuts. OpenGL has quite a steep learning curve, but it all connects and things start to make sense after a while. If you're getting so bored of books and such, then consider maybe taking code snippets of every lesson, compile them, and start messing around with them while trying to rationalize as you go. You'll have to resort to written documents eventually, but hopefully then things will fit easier into your head when you have some experience with the implementation components. Good luck.
Edit:
If you're trying to generate vertices on the fly using some algorithm, then try looking into Geometry Shaders. They may give you what you want.
You probably want to use CUDA for the things you are used to do in C or C++, and let OpenGL access the rasterizer and other graphics stuff.
OpenGL an CUDA interact somehow nicely. A good entry point to customize the contents of a buffer object is here: http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__OPENGL.html#group__CUDART__OPENGL_1g0fd33bea77ca7b1e69d1619caf44214b , with cudaGraphicsGLRegisterBuffer method.
You may also want to have a look at the nbody sample from NVIDIA GPU SDK samples the come with current CUDA installs.
I think at least some old graphics drivers used to crash if glClear wasn't used and that glClear is probably faster in a lot of cases but why? How are 3-d graphics drivers usually implemented such that these uses would have different results?
On a high level, it can be faster because the OpenGL implementation knows ahead of time that the whole buffer needs to be set to the same color/value. The more you know about what exactly needs to be done, the more you can take advantage of possible accelerations.
Let's say setting a whole buffer to the same value is more efficient than setting the same pixels to variable values. With a glClear(), you know already that all pixels will have the same value. If you draw a screen sized quad with a fragment shader that emits a constant color, the driver would either have to recognize that situation by analyzing the shaders, or the system would have to compare the values coming out of the shader, to know that all pixels have the same value.
The reason why setting everything to the same value can be more efficient has to do with framebuffer compression and related technologies. GPUs often don't actually write each pixel out to the framebuffer, but use various kinds of compression schemes to reduce the memory bandwidth needed for framebuffer writes. If you imagine almost any kind of compression, all pixels having the same value is very favorable.
To give you some ideas about the published vendor specific technologies, here are a few sources. You can probably find more with a search.
Article talking about new framebuffer compression method in relatively recent AMD cards: http://techreport.com/review/26997/amd-radeon-r9-285-graphics-card-reviewed/2.
NVIDIA patent on zero bandwidth clears: http://www.google.com/patents/US8330766.
Blurb on ARM web site about Mali framebuffer compression: http://www.arm.com/products/multimedia/mali-technologies/arm-frame-buffer-compression.php.
Why is it faster? Because it is a function that bypasses most calculations that other types of drawings have to go through.
Alpha function, blend function, logical operation, stenciling, texture mapping, and depth-buffering are ignored by glClear
Source
Why do some drivers crash without it? It's hard to say, but it should have something to do with the implementation details of OpenGL. The functions does what it's supposed to do, but might do more that you don't know about.
OpenGL might infer from this function call other tasks that it needs to perform.
I am emulating an old client and most of the textures are 128x128. As I know switching textures and render calls are expensive, could I get away with at runtime, creating a very large texture atlas of all the tiny textures.
Then from there, could I bind the large texture and render via shaders with the texture offset in the atlas? What kind of performance hit would this have?
My second question is involving sorting. The level files are broken up into little chunks of a BSP tree. They are very small and there are often thousands per level. The current way, which is definitely slower is that I render each group of textures in each BSP leaf that are in my frustum and PVS (Quake 3 style). What would be the idea fix for this?
Would I want to run through each region (back to front) I could see, and group all of the visible triangles by texture and then render all at once. For some reason, I always feel this may be slower. Does it make more sense to sort first and render all at once or skip sorting and render each region at a time?
Yes, it should be possible to stuff textures into a texture atlas. Careful consideration would need to be given to issues like whether the textures are being tiled, and what you want to happen with mip-maps and filtering.
I'd probably suggest holding off on doing that until you've got the geometry sorted out - reducing the number of draw calls to a sane number may well be all you need to do. The cost of changing a texture isn't necessarily that bad on modern hardware.
As for rendering the level, on modern hardware it's more usual to render front to back, rather than back to front, in order to take advantage of the z-buffer and early culling (i.e. if you have a wall right in front of the camera, and you draw that first with z-buffering enabled, the hardware is fairly good at rejecting stuff you then try to draw behind it).
One possible approach would be to reprocess the BSP into a rougher structure, such as a simple grid (or maybe a quad or oct tree). You don't even need to split polygons neatly, just have a bunch of sectors with a bounding-box around them, and frustum-cull then loosely sort (front to back) the boxes. You can keep a PVS with this approach too, but the usefulness of that might drop when your rendering chunks become larger and coarser.
However before doing any of this, I'd definitely recommend setting up some benchmark tests and recording performance information. You won't know for sure if you're doing the right thing unless you actually analyse the performance. If you can pinpoint exactly what the worst performant thing you are doing is, you just need to fix that.
I'm working with an environmental reflection in OpenGL+GLSL.
I want to reflect the environment around an object in the most accurate way possible.
I found basically two way to do this, one is called SphericalMapping and the other is CubeMapping.
They differ in the shader code but really don't understand what is the difference between them.
Obviously for the cubemapping shader I have 6 images printed on a cube that are needed for the fragment shader to look the right pixel, and for my Spheric mapping shader a single image which is distorted with a photo-retouch software or obtained by taking a photo of a specular reflective sphere.
The drawbacks of spherical mapping seems to be that the camera (and the person which holds it) is always showed in the image and the sampling is non-uniform. What is meant by this latest statement? What is meant by "black-hole" effect in spherical mapping?
I would like to find an interactive demonstration of the differences and drawbacks of these two approaches, it seems like cubemapping is the best, but don't know why.
What is the best of the two especially for a realtime simulation with head tracking in your opinion?
Spheremaps are usually for small, low quality stuff.
The drawbacks of spherical mapping seems to be that the camera (and the person which holds it) is always showed in the image
We're talking about computer graphics here; there is no real camera, or no real person. Try imagegoogling "spheremap", you won't see anybody in the pictures.
the sampling is non-uniform
This means that the center of the spheremap has many pixels for a relatively small area, while near the border, you have few pixels for a relatively large area.
Cubemaps are almost always better : you can generate them at runtime easily, it's faster to sample for the hardware, and even though you have 6 textures instead of 1, you can use a lower resolution and still get the same quality.
I am rewriting an opengl-based gis/mapping program. Among other things, the program allows you to load raster images of nautical charts, fix them to lon/lat coordinates and zoom and pan around on them.
The previous version of the program uses a custom tiling system, where in essence it manually creates mipmaps of the original image, in the form of 256x256-pixel tiles at various power-of-two zoom levels. A tile for zoom level n - 1 is constructed from four tiles from zoom level n, using a simple average-of-four-points algorithm. So, it turns off opengl mipmapping, and instead when it comes time to draw some part of the chart at some zoom level, it uses the tiles from the nearest-match zoom level (i.e., the tiles are in power-of-two zoom levels but the program allows arbitrary zoom levels) and then scales the tiles to match the actual zoom level. And of course it has to manage a cache of all these tiles at various levels.
It seemed to me that this tiling system was overly complex. It seemed like I should be able to let the graphics hardware do all of this mipmapping work for me. So in the new program, when I read in an image, I chop it into textures of 1024x1024 pixels each. Then I fix each texture to its lon/lat coordinates, and then I let opengl handle the rest as I zoom and pan around.
It works, but the problem is: My results are a bit blurrier than the original program, which matters for this application because you want to be able to read text on the charts as early as possible, zoom-wise. So it's seeming like the simple average-of-four-points algorithm the original program uses gives better results than opengl + my GPU, in terms of sharpness.
I know there are several glTexParameter settings to control some aspects of how mipmaps work. I've tried various combinations of GL_TEXTURE_MAX_LEVEL (anywhere from 0 to 10) with various settings for GL_TEXTURE_MIN_FILTER. When I set GL_TEXTURE_MAX_LEVEL to 0 (no mipmaps), I certainly get "sharp" results, but they are too sharp, in the sense that pixels just get dropped here and there, so the numbers are unreadable at intermediate zooms. When I set GL_TEXTURE_MAX_LEVEL to a higher value, the image looks quite good when you are zoomed far out (e.g., when the whole chart fits on the screen), but as you zoom in to intermediate zooms, you notice the blurriness especially when looking at text on the charts. (I.e., if it weren't for the text you might think "wow, opengl is doing a nice job of smoothly scaling my image." but with the text you think "why is this chart out of focus?")
My understanding is that basically you tell opengl to generate mipmaps, and then as you zoom in it picks the appropriate mipmaps to use, and there are some limited options for interpolating between the two closest mipmap levels, and either using the closest pixels or averaging the nearby pixels. However, as I say, none of these combinations seem to give quite as clear results, at the same zoom level on the chart (i.e., a zoom level where text is small but not minuscule, like the equivalent of "7 point" or "8 point" size), as the previous tile-based version.
My conclusion is that the mipmaps that opengl creates are simply blurrier than the ones the previous program created with the average-four-point algorithm, and no amount of choosing the right mipmap or LINEAR vs NEAREST is going to get the sharpness I need.
Specific questions:
(1) Does it seem right that opengl is in fact making blurrier mipmaps than the average-four-points algorithm from the original program?
(2) Is there something I might have overlooked in my use of glTexParameter that could give sharper results using the mipmaps opengl is making?
(3) Is there some way I can get opengl to make sharper mipmaps in the first place, such as by using a "cubic" filter or otherwise controlling the mipmap creation process? Or for that matter it seems like I could use the same average-four-points code to manually generate the mipmaps and hand them off to opengl. But I don't know how to do that...
(1) it seems unlikely; I'd expect it just to use a box filter, which is average four points in effect. Possibly it's just switching from one texture to a higher resolution one at a different moment — e.g. it "Chooses the mipmap that most closely matches the size of the pixel being textured", so a 256x256 map will be used to texture a 383x383 area, whereas the manual system it replaces may always have scaled down from 512x512 until the target size was 256x256 or less.
(2) not that I'm aware of in base GL, but if you were to switch to GLSL and the programmable pipeline then you could use the 'bias' parameter to texture2D if the problem is that the lower resolution map is being used when you don't want it to be. Similarly, the GL_EXT_texture_lod_bias extension can do the same in the fixed pipeline. It's an NVidia extension from a decade ago and is something all programmable cards could do, so it's reasonably likely you'll have it.
(EDIT: reading the extension more thoroughly, texture bias migrated into the core spec of OpenGL in version 1.4; clearly my man pages are very out of date. Checking the 1.4 spec, page 279, you can supply a GL_TEXTURE_LOD_BIAS)
(3) yes — if you disable GL_GENERATE_MIPMAP then you can use glTexImage2D to supply whatever image you like for every level of scale, that being what the 'level' parameter dictates. So you can supply completely unrelated mip maps if you want.
To answer your specific points, the four-point filtering you mention is equivalent to box-filtering. This is less blurry than higher-order filters, but can result in aliasing patterns. One of the best filters is the Lanczos filter. I suggest you calculate all of your mipmap levels from the base texture using a Lanczos filter and crank up the anisotropic filtering settings on your graphics card.
I assume that the original code managed textures itself because it was designed to view data sets that are too large to fit into graphics memory. This was probably a bigger problem in the past, but is still a concern.