GLSL shaders crashing ATI/AMD GPU - opengl

I've been working on a project and the lab computers are equipped with ATI Radeon 6700 series cards. The project is in C++ with OpenGL, so for the shaders I use GLSL. It is basically a planet with level of detail, so the number of polygons being sent to the shaders depends on the camera position: the closer you are, the more polygons there are, for better definition. Bear in mind that I have indeed put a limit on the level of detail, otherwise everything would crash when the camera got too close, due to the insane amount of subdivisions.
However, when running the program on the lab computers, there is weird twitching of the mesh (some vertices appear at other positions) and, most importantly, a drop in framerate when I get too close. Eventually the whole thing freezes, both my screens turn black, and when Windows becomes visible again there is a message saying the drivers encountered a problem and had to recover.
I tried running my project on my personal laptop, which has an NVIDIA GT 555M, and none of the above problems appear: no twitching, no drop in framerate and no crashing. Also, when I disable the shaders on the lab computers, the program doesn't crash and there is no twitching, but there is still the drop in framerate. All of this makes me think it has something to do with the way NVIDIA and ATI interpret and compile GLSL.
First of all, I should say I'm only creating a VBO with 17x17 vertices, where each vertex is only 16 bytes: 3 floats for position plus 4 bytes of padding for performance. So when creating the VBO and drawing it, I do not enable the COLOR, TEXTURE or NORMAL arrays, since there is only information about the vertex position.
On the GPU side, however, I might be using non-standard GLSL or something like that, since I've read that ATI drivers are much stricter about standard-conforming GLSL. So here are my vertex shader and fragment shader:
http://pastebin.com/DE6ijidq
http://pastebin.com/Q8xUAguw
Bear in mind that what the C++ code does is create 6 grids of 16 by 16 squares, one for each side of the cube, all centered on the origin. So in the first line of main() I displace them accordingly to create a cube, then I use the exploding-cube technique to make it a sphere and work on that. I don't specify a GLSL version and I'm using varying variables, which might be where the problem is. Is there a way to make this compatible with the ATI cards as well? Should I specify a version, and if so, which one? And should I use in and out variables instead, but then how? I've been testing my shaders in RenderMonkey and for some reason it still uses varying.
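For reference, here is roughly the style I think is being suggested: an explicit #version directive plus in/out qualifiers instead of varying. This is only an illustrative sketch with made-up names, not my actual shaders from the pastebin links above.

    // Vertex shader (GLSL 3.30 style; names are illustrative)
    #version 330 core
    layout(location = 0) in vec3 position;   // replaces "attribute"
    out vec3 spherePos;                      // replaces "varying"
    uniform mat4 mvp;

    void main() {
        spherePos   = position;
        gl_Position = mvp * vec4(position, 1.0);
    }

    // Fragment shader (GLSL 3.30 style)
    #version 330 core
    in vec3 spherePos;     // matches the vertex shader's "out"
    out vec4 fragColor;    // replaces gl_FragColor

    void main() {
        fragColor = vec4(normalize(spherePos) * 0.5 + 0.5, 1.0);
    }

(As I understand it, #version 330 core needs an OpenGL 3.3 context; #version 120 with attribute/varying would be the conservative choice for older hardware.)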

Related

What is the difference between OpenGL and GLSL?

I recently started programming with OpenGL. I've written code creating basic primitives and have used shaders in WebGL. I've googled the subject extensively but it's still not that clear to me. Basically, here's what I want to know: is there anything that can be done in GLSL that can't be done in plain OpenGL, or does GLSL just do things more efficiently?
The short version is: OpenGL is an API for rendering graphics, while GLSL (which stands for OpenGL Shading Language) is the language used to program the pipeline's shader stages. To put it another way, GLSL is a (small) part of the overall OpenGL framework.
To understand where GLSL fits into the big picture, consider a very simplified graphics pipeline.
Vertexes specified ---(vertex shader)---> transformed vertexes ---(primitive assembly)---> primitives ---(rasterization)---> fragments ---(fragment shader)---> output pixels
The shaders (here, just the vertex and fragment shaders) are programmable. You can do all sorts of things with them. You could simply swap the red and green channels, or you could implement bump mapping to make your surfaces appear much more detailed. Writing these shaders is an important part of graphics programming. Here's a link with some nice examples that should help you see what you can accomplish with custom shaders: http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaderExamples.html.
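For instance, the red/green swap mentioned above is only a few lines of fragment shader. This is just a sketch; the sampler and texture-coordinate names are made up:

    #version 330 core
    uniform sampler2D tex;
    in  vec2 uv;
    out vec4 fragColor;

    void main() {
        vec4 c = texture(tex, uv);
        fragColor = vec4(c.g, c.r, c.b, c.a);   // swap the red and green channels
    }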
In the not-too-distant past, the only way to program them was to use GPU assembler. In OpenGL's case, the language is known as ARB assembler. Because of the difficulty of this, the OpenGL folks gave us GLSL. GLSL is a higher-level language that can be compiled and run on graphics hardware. So to sum it all up, programmable shaders are an integral part of the OpenGL framework (or any modern graphics API), and GLSL makes it vastly easier to program them.
As also covered by Mattsills' answer, the GL Shading Language, or GLSL, is the part of OpenGL that enables the creation of algorithms called shaders in/for OpenGL. Shaders run on the GPU.
Shaders make decisions about factors such as the colour of parts of surfaces and the way surfaces share information such as reflected light. Vertex Shaders, Geometry Shaders, Tessellation Shaders and Pixel (Fragment) Shaders are types of shader that can be written in GLSL.
Q1:
Is there anything that can be done in GLSL that can't be done in plain OpenGL?
A:
You may be able to use just OpenGL without the GLSL parts, but if you want your own surface properties you'll probably want a shader to make this reasonably simple and performant, written in something like GLSL. Some examples are linked further down.
Q2:
Or does GLSL just do things more efficiently?
A:
Pixel shaders specifically are very parallel, calculating values independently for every cell of a 2D grid, but they also come with significant caveats, such as not being able to handle "if"-style conditions very performantly. So it's a case of using the different kinds of shaders to their strengths, on surfaces described and dealt with in the rest of OpenGL.
Q3:
I suspect you want to know whether using just GLSL is an option, and I can only answer this with my knowledge of one kind of shader, pixel shaders. The rest of this answer covers "just" using GLSL as a possible option:
A:
While GLSL is a part of OpenGL, you can use the rest of OpenGL to set up the environment and write your program almost entirely as a pixel shader, where each invocation of the pixel shader colours one pixel of the whole screen.
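A minimal Shadertoy-style sketch of that idea, where each pixel's colour is computed independently from nothing but its own coordinate (mainImage and iResolution are Shadertoy's entry point and resolution uniform):

    void mainImage(out vec4 fragColor, in vec2 fragCoord) {
        vec2 uv = fragCoord / iResolution.xy;    // normalised 0..1 screen position
        fragColor = vec4(uv, 1.0 - uv.x, 1.0);   // a simple per-pixel gradient
    }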
For example:
(Note that WebGL has a tendency to hog the CPU to the point of stalling the whole system, and Windows 8.1 lets it do so; Chrome seems better at viewing these links than Firefox.)
No, this is not a video clip of real water:
https://www.shadertoy.com/view/Ms2SD1
The only external resources fed to this snail are some easily generated textures:
https://www.shadertoy.com/view/ld3Gz2
Rendering using noisy fractal clouds of points:
https://www.shadertoy.com/view/Xtc3RS
https://www.shadertoy.com/view/MsdGzl
A perfect sphere: 1 polygon, 1 surface, no edges or vertices:
https://www.shadertoy.com/view/ldS3DW
A particle-system-like simulation with cars on a racetrack, using a second narrow but long pixel shader as a table of data about car positions:
https://www.shadertoy.com/view/Md3Szj
Random values are fairly straightforward:
fract(sin(p)*10000.)
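Wrapped into the reusable hash that many of these demos use (the constants are essentially arbitrary and vary between implementations):

    // Classic GLSL pseudo-random hash: returns a value in [0, 1)
    float hash(vec2 p) {
        return fract(sin(dot(p, vec2(12.9898, 78.233))) * 43758.5453);
    }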
I've found the language in some respects hard to work with, and it may or may not be practical to use GLSL this way for a large project such as a game or simulation. However, as these demos show, a computer game does not have to look like a computer game, and this sort of approach should be an option, perhaps used with generated content and/or external data.
As I understand it, to perform reasonably, pixel shaders in OpenGL:
Have to fit into a small piece of memory.
Handle branching poorly: "if"-style conditions are supported but are slow when neighbouring pixels take different paths, and recursion and unbounded while-style loops are not really an option.
Are restricted to a small pool of valid instructions and data types, including sin, mod, vector multiplication, floats and half-precision floats.
Lack high-level features like objects or lambdas.
And effectively calculate values all at once, in parallel.
A consequence of all this is that the code looks more like lines of closed-form equations than algorithms with higher-level structure, using tricks such as modular arithmetic for something akin to conditions.
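In practice that usually means built-ins like step() and mix() in place of an if/else, as in this small sketch (the colours and threshold are made up):

    // Branchless "condition": pick a water or land colour without an if.
    vec3 shade(float height) {
        float isWater = step(height, 0.0);    // 1.0 when height <= 0.0, else 0.0
        return mix(vec3(0.2, 0.6, 0.2),       // land colour
                   vec3(0.0, 0.3, 0.8),       // water colour
                   isWater);
    }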

Deferred Rendering with Tile-Based Culling: Concept Problems

EDIT: I'm still looking for some help with the use of OpenCL or compute shaders. I would prefer to keep using OpenGL 3.3 and not have to deal with the bad driver support for OpenGL 4.3 and OpenCL 1.2, but I can't think of any way to do this type of shading without using one of the two (to match lights with tiles). Is it possible to implement tile-based culling without using GPGPU?
I wrote a deferred renderer in OpenGL 3.3. Right now I don't do any culling for the light pass (I just render a full-screen quad for every light). This (obviously) has a ton of overdraw (sometimes ~100%). Because of this I've been looking into ways to improve performance during the light pass. It seems like the best way in (almost) everyone's opinion is to cull the scene using screen-space tiles. This was the method used in Frostbite 2. I read the presentation from Andrew Lauritzen at SIGGRAPH 2010 (http://download-software.intel.com/sites/default/files/m/d/4/1/d/8/lauritzen_deferred_shading_siggraph_2010.pdf), and I'm not sure I fully understand the concept (and, for that matter, why it's better than anything else, and whether it is better for me).
In the presentation, Lauritzen goes over deferred shading with light volumes, quads, and tiles for culling the scene. According to his data, the tile-based deferred renderer was the fastest (by far). I don't understand why it is, though. I'm guessing it has something to do with the fact that for each tile, all the lights are batched together. In the presentation it says to read the G-Buffer once and then compute the lighting, but this doesn't make sense to me. In my mind, I would implement it like this:
    for each tile {
        for each light affecting the tile {
            render quad (the tile) and compute lighting
            blend with previous tiles (GL_ONE, GL_ONE)
        }
    }
This would still involve sampling the G-Buffer a lot. I would think that doing this would have the same (if not worse) performance as rendering a screen-aligned quad for every light. From how it's worded, though, it seems like this is what's happening:
    for each tile {
        render quad (the tile) and compute all lights
    }
But I don't see how one could do this without exceeding the instruction limit for the fragment shader on some GPUs. Can anyone help me with this? It also seems like almost every tile-based deferred renderer uses compute shaders or OpenCL (to batch the lights); why is this, and what would happen if I didn't use them?
But I don't see how one could do this without exceeding the instruction limit for the fragment shader on some GPUs.
It rather depends on how many lights you have. The "instruction limits" are pretty high; it's generally not something you need to worry about outside of degenerate cases. Even if 100+ lights affect a tile, odds are fairly good that your lighting computations aren't going to exceed the instruction limits.
Modern GL 3.3 hardware can run at least 65536 dynamic instructions in a fragment shader, and likely more. For 100 lights, that's still 655 instructions per light. Even if you take 2000 instructions to compute the camera-space position, that still leaves 635 instructions per light. Even if you were doing Cook-Torrance directly in the GPU, that's probably still sufficient.
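To make the "read the G-Buffer once, then loop over the tile's lights" idea concrete, a per-tile fragment shader might look roughly like the sketch below. Passing the light list as a uniform array is purely for illustration; real tiled renderers typically build per-tile light lists with compute shaders, OpenCL or buffer objects, which is exactly the part the question is asking about.

    #version 330 core
    // Sketch of a per-tile lighting pass: one G-Buffer read, then a loop
    // over only the lights assigned to this tile (names are illustrative).
    uniform sampler2D gPosition;
    uniform sampler2D gNormal;
    uniform sampler2D gAlbedo;

    struct Light { vec3 position; vec3 color; float radius; };
    const int MAX_TILE_LIGHTS = 64;
    uniform Light tileLights[MAX_TILE_LIGHTS];
    uniform int   tileLightCount;

    in  vec2 uv;
    out vec4 fragColor;

    void main() {
        // Sample the G-Buffer once, not once per light.
        vec3 P      = texture(gPosition, uv).xyz;
        vec3 N      = normalize(texture(gNormal, uv).xyz);
        vec3 albedo = texture(gAlbedo, uv).rgb;

        vec3 result = vec3(0.0);
        for (int i = 0; i < tileLightCount; ++i) {
            vec3  toLight = tileLights[i].position - P;
            float dist    = length(toLight);
            if (dist > tileLights[i].radius)
                continue;
            float atten = 1.0 - dist / tileLights[i].radius;
            result += albedo * tileLights[i].color
                    * max(dot(N, toLight / dist), 0.0) * atten;
        }
        fragColor = vec4(result, 1.0);
    }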

OpenGL: Is it more efficient to use GL_QUADS or GL_TRIANGLES?

I know that OpenGL deprecated and then removed GL_QUADS in the newer releases. I have heard this is because modern GPUs only render triangles, so submitting a quad just makes the driver/GPU work harder to break it into two triangles (that's what I have heard, anyway; I am not much of an expert on this topic).
I was wondering whether it is better (assuming the average person's CPU is faster, relatively speaking, than their GPU) to manually break each quad into two triangles yourself, or to just let the driver do it. Again, I have absolutely no real experience with OpenGL as I am just starting out. I would rather know which is better for most machines these days so I can focus my attention on one rendering method*. Thanks.
*Yet I will probably utilize the 'triangle method' for the sake of it.
Even if you feed OpenGL quads, the triangularization is done by the driver on the CPU side before it even hits the GPU. Modern GPUs these days eat nothing except triangles. (Well, and lines and points.) So something will be triangulating, whether it's you or the driver -- it doesn't matter too much where it happens.
This would be less efficient if, say, you don't reuse your vertex buffers and instead refill them anew every time with quads (in which case the driver has to re-triangulate every vertex buffer) rather than refilling them with pre-triangulated triangles, but that's pretty contrived (and the problem you should be fixing in that case is the fact that you're refilling your vertex buffers at all).
I would say, if you have the choice, stick with triangles, since that's what most content pipelines put out anyways, and you're less likely to run into problems with non-planar quads and the like. If you get to choose what format your content comes in, then use triangles for sure, and the triangulation step gets skipped altogether.
Any geometry can be represented with triangles, and that is why it was decided to use triangles instead of quads. Another reason is that a triangle is always planar, whereas the four vertices of a quad do not have to be co-planar.
Yes, you can still choose to render quads, but the driver will convert each quad into two triangles.
Therefore, choosing to render quads will not make the GPU work less; it will make your CPU work more, because it has to do the conversion.

Quad textured with big texture doesn't show up

I've run into an issue with drawing a texture. The situation is as follows:
I've got a Linux box with ATI hardware and a proprietary ATI driver that is two or three years old (because ATI ditched the old hardware). I've got a custom application with a dedicated, (mostly) 2D engine based on OpenGL. (It was built over the years, is quite mature, and has never had problems like this.)
The problem happens when VRAM (which is taken from system memory, 2 GB in this particular case) is filled almost to the maximum with textures. When the scene contains a quad textured with a texture larger than 2048x2048, that quad is not drawn. When I time particular surfaces, the surface that takes the most time to draw is not the one textured with the big texture (about 87 µs), but the next one drawn after it (~900 ms!).
The scene being drawn doesn't use all the textures in VRAM, only, let's say, 8%. Unfortunately, I cannot free even a small part of it. The application usually works under this kind of VRAM-stressed condition and has never behaved like this.
glGetError() returns nothing.
All other textures are drawn normally.

How big can a WebGL fragment shader be?

I'm raytracing in the WebGL fragment shader, planning on using a dynamically generated fragment shader that contains the objects in my scene. As I add an object to my scene, I will be adding some lines to the fragment shader, so it could get pretty large. How big can it get and still work? Is it dependent on the graphics card?
The "safe" answer is that it depends on your hardware and drivers, but in practical terms they can be pretty crazy big. What JustSid said about performance does apply (bigger shader == slower shader) but it sounds like you're not exactly aiming for 60FPS here.
For a more in-depth breakdown of shader limits, check out http://en.wikipedia.org/wiki/High_Level_Shader_Language. The page is about DirectX shaders, but similar constraints apply to GLSL as well.
There are all kinds of limits on how big shaders can be. It's up to the driver/GPU, but it's also up to the machine and the WebGL implementation. Some WebGL implementations (Chrome) run an internal timer; if a single GL command takes too long they'll kill WebGL ("Rats, WebGL hit a snag"). This can happen when a shader takes too long to compile.
Here's an example of a silly 100k shader where I ran a model through a script to generate the mesh in the shader itself. It runs for me on macOS on my MacBook Pro. It also runs on my iPhone 6+. But when I try it on Windows, the DirectX driver takes too long trying to optimize it and Chrome kills it.
It's fun to goof off and put geometry in your shader, and it's fun to goof off with signed distance fields and ray-marching type stuff, but those techniques are more of a puzzle/plaything. They are not in any way performant. As a typical example, this SDF-based shader runs at about 1 frame per second fullscreen on my laptop, and yet my laptop is capable of displaying all of Los Santos from GTA5 at a reasonable framerate.
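(For anyone unfamiliar, ray marching an SDF boils down to something like this toy Shadertoy-style sketch; mainImage and iResolution are Shadertoy built-ins, and the sphere and shading here are made up.)

    // Toy sketch: ray march a single signed-distance-field sphere.
    float sphereSDF(vec3 p) { return length(p) - 1.0; }

    void mainImage(out vec4 fragColor, in vec2 fragCoord) {
        vec2 uv = (fragCoord - 0.5 * iResolution.xy) / iResolution.y;
        vec3 ro = vec3(0.0, 0.0, -3.0);          // ray origin (camera)
        vec3 rd = normalize(vec3(uv, 1.5));      // ray direction through the pixel
        float t = 0.0, hit = 0.0;
        for (int i = 0; i < 64; ++i) {           // step along the ray by the SDF value
            float d = sphereSDF(ro + rd * t);
            if (d < 0.001) { hit = 1.0; break; }
            t += d;
            if (t > 20.0) break;
        }
        fragColor = vec4(vec3(hit * (1.0 - t * 0.15)), 1.0);   // crude depth shading
    }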
If you want performance you should really be using more standard techniques, putting your geometry in buffers and doing forward or deferred rendering.
You don't want to create massive fragment shaders, since they tend to drain performance very quickly. You should, if possible, do calculations on the CPU or in the vertex shader rather than for every pixel.
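For example, a per-pixel lighting term can often be computed once per vertex and interpolated instead. A sketch in WebGL 1 style GLSL ES, with made-up names:

    // Vertex shader: compute a simple diffuse term once per vertex;
    // the rasterizer interpolates the "diffuse" varying for free.
    attribute vec3 position;
    attribute vec3 normal;
    uniform mat4 mvp;
    uniform vec3 lightDir;          // assumed to be normalized
    varying float diffuse;

    void main() {
        diffuse     = max(dot(normalize(normal), lightDir), 0.0);
        gl_Position = mvp * vec4(position, 1.0);
    }

    // Fragment shader: now trivial, with no per-pixel lighting math.
    precision mediump float;
    varying float diffuse;

    void main() {
        gl_FragColor = vec4(vec3(diffuse), 1.0);
    }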