I've been looking everywhere and all I can find are tutorials on writing the shaders. None of them showed me how to incorporate them into my scene.
So essentially:
Given an hlsl shader, if I were to have a function called drawTexturedQuad() and I wanted the shader to be applied to the result, how exactly could I do this?
Thanks
ID3DXEffect provides Begin() and BeginPass() methods. Simply call drawTexturedQuad() between those calls. Any basic tutorial on shaders should show such a sample.
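As a minimal sketch (assuming an ID3DXEffect* named g_pEffect has already been created from your .fx file; "MyTechnique" is just a placeholder for a technique name in that file):

UINT numPasses = 0;
g_pEffect->SetTechnique("MyTechnique");  // hypothetical technique name from the .fx file
g_pEffect->Begin(&numPasses, 0);
for (UINT pass = 0; pass < numPasses; ++pass)
{
    g_pEffect->BeginPass(pass);
    drawTexturedQuad();                  // your existing draw call now runs with the shader bound
    g_pEffect->EndPass();
}
g_pEffect->End();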
Just an additional note: if in doubt, ask MSDN.
The answer to this is surprisingly complex, and it has only grown more difficult as GPU hardware has become more and more powerful. The D3DX FX system is an example of all the work that needs to be done, so using it is a good way to just get things working for short-term usage.
Shaders are code, but they effectively live on another machine from the CPU, so all of their data needs to be marshalled over. The fixed parts (basic render states like depth state, stencil state, blend modes, and draw commands) are extremely easy to implement. The hard part is making a bridge for the programmable parts: shaders, buffers, samplers, and textures.
Index buffers just work, since you can only have one, or none in the case of rendering un-indexed geometry.
Vertex buffers are fairly easy to deal with, since the hardware can be programmed to read them procedurally. Your vertex buffer only needs to provide at least as much information as the vertex shader wants to access. Modifying the shader's vertex input or the vertex format means editing both sides at the same time, but that is still reasonably easy to work with.
Samplers and textures are the next 'easier' of the hard parts to hook up: they have a variable name and a rather rigid type. For instance, when compiling shader 'foo', texture 'myNormalMap' is assigned texture slot 3. You need to look up (via the reflection APIs) which slot the texture was assigned, bind the texture your engine considers to be 'myNormalMap' to slot 3 at runtime, and of course also use the API to determine whether the texture is even needed in the first place. This is where naming conventions for shader variables start to matter, so that multiple shaders can be made compatible with the same C++ code.
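A hedged sketch of that lookup in D3D11 (the compiled shader blob, the device context, and the 'myNormalMapSRV' shader resource view are assumed to already exist; 'myNormalMap' is just the example name from above):

#include <d3dcompiler.h>
#include <d3d11shader.h>

ID3D11ShaderReflection* reflector = nullptr;
D3DReflect(blob->GetBufferPointer(), blob->GetBufferSize(),
           IID_ID3D11ShaderReflection, reinterpret_cast<void**>(&reflector));

D3D11_SHADER_INPUT_BIND_DESC bindDesc = {};
if (SUCCEEDED(reflector->GetResourceBindingDescByName("myNormalMap", &bindDesc)))
{
    // bindDesc.BindPoint is the slot (e.g. 3) the compiler assigned to this texture.
    context->PSSetShaderResources(bindDesc.BindPoint, 1, &myNormalMapSRV);
}
// If the lookup fails, the shader never references 'myNormalMap', so nothing needs binding.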
Constant buffers (or raw shader constants in D3D9) are a lot trickier, especially with a programmable shader framework like the one you can find in engines such as Unreal. The constants any given shader uses are a subset of the full list, but the C++ side must generally be written as if all of them are needed. The reflection APIs are again needed to determine not only which variables are actually referenced in a shader, but where they are located. This became a bit more manageable in D3D10 and newer, since cbuffers are structs and less fluid than the D3D9 system, which was heavily limited by register count; but it adds the step of also using the reflection APIs to determine the order of cbuffer bindings (and which cbuffers are themselves referenced).
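A similarly hedged sketch for the cbuffer side, continuing with the same reflector ('PerObject' and 'worldMatrix' are hypothetical names):

// Where is the cbuffer bound, and is this variable actually used by the shader?
D3D11_SHADER_INPUT_BIND_DESC cbBind = {};
reflector->GetResourceBindingDescByName("PerObject", &cbBind);  // cbBind.BindPoint = b# register

ID3D11ShaderReflectionConstantBuffer* cb = reflector->GetConstantBufferByName("PerObject");
ID3D11ShaderReflectionVariable* var = cb->GetVariableByName("worldMatrix");

D3D11_SHADER_VARIABLE_DESC varDesc = {};
if (SUCCEEDED(var->GetDesc(&varDesc)) && (varDesc.uFlags & D3D_SVF_USED))
{
    // varDesc.StartOffset and varDesc.Size say where to copy the matrix into the
    // CPU-side staging memory before updating the constant buffer resource.
}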
In the end there is one design that makes it all work:
Make a class that drives a specific shader archetype. For each variable this class exposes to a shader (be it a texture, constant buffer, etc.), look up in the reflection info whether it is used, find its location, and set it. Keeping this fast, flexible, and extensible is a difficult challenge.
I did a bit of searching, but the only results I found regarded anti-aliasing and the mechanics behind it.
I want to use a multi-sample buffer for things other than anti-aliasing. Specifically, what I have in mind is a form of order-independent transparency, storing a different color and depth value in each sample (ideally the N fragments closest to the camera).
Is this even possible in the first place? Has it been done before, and if so how? Even if it were possible, is it any more memory-efficient than just allocating another N framebuffers?
If something like this would require a compute shader or OpenCL, that's fine, I'm just curious to see if it's possible in the first place.
This question isn't specific to OpenGL or DirectX, since the hardware would be the same in either case.
PS. Please don't just point me towards other methods of order-independent transparency; this question is specifically about whether the multisample buffer can be used for atypical purposes.
The big problem is that the system is hard-coded to use multisampling for, well, multisampling, not for the storage of arbitrary data.
Through the use of the gl_SampleMask output, you can direct the results of a fragment shader to a specific sample of the multisample buffer. You could then use a shader to perform a custom multisample resolve in order to do your transparency sorting or whatever it is you plan to do here.
In theory, of course. In practice however, gl_SampleMask will be logically and-ed with the sample mask computed by the rasterizer for that fragment. So if you set the gl_SampleMask to write to a particular sample that isn't covered by your fragment, nothing gets written.
There's nothing you can do about this. Even explicitly activating per-sample fragment shader evaluation will not help you, since the system will still compute and use that mask, based on the geometry you render with.
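For reference, activating it looks roughly like this on the C side (a hedged sketch assuming GL 4.0 / ARB_sample_shading), and it still doesn't bypass the coverage mask:

glEnable(GL_SAMPLE_SHADING);
glMinSampleShading(1.0f);  // request one fragment shader invocation per covered sample
// The rasterizer's coverage mask is still ANDed with gl_SampleMask, so samples
// outside the rendered geometry can never be written.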
So generally speaking, no, there's not much you can use multisampling for besides antialiasing.
I've been using OpenGL in a pretty basic way to draw textured quads for various 2D graphics projects. I've been using glBegin() and glEnd() to draw the two triangles that make up each textured quad, but I know that it's also possible to draw shapes with a vertex array.
However, the tutorials I've found seem geared towards 3D graphics and involve loading shaders and such. All I need to do (for now at least) is draw textured quads, so this seems like overkill.
First of all, how much advantage is there to using vertex arrays in 2d? If there is advantage, what is the simplest way to use it?
how much advantage is there to using vertex arrays in 2d?
The advantage is huge in terms of performance. Vertex arrays greatly reduce the number of API calls, which increases your rendering performance.
A disadvantage is that it is more complicated overall.
If there is advantage, what is the simplest way to use it?
Vertex arrays can be used for many different forms of rendering. For basic, straightforward rendering the code usually has the following structure:
//initializing (done once; 'vbo', 'vertices' and 'vertexCount' are your own variables)
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
//Rendering (done every frame)
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, nullptr);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);
There are lots of tutorials online.
The glBegin()/glEnd() approach is typically referred to as "immediate mode". Vertex arrays can be massively more efficient than immediate mode, particularly when they are stored in buffers (VBO = Vertex Buffer Object). This may not be a big deal as long as your drawing involves very few vertices, but becomes critical when you're dealing with geometry with a large number of vertices.
The main reasons why arrays are more efficient than immediate mode are:
They require a lot fewer OpenGL API calls, since you specify entire arrays with a single call, instead of making a call for each vertex. This adds up when you have millions of vertices.
The data can be specified once, and then reused in each frame as long as it does not change. In combination with VBOs, the vertex data can also be stored in memory that the GPU can access very efficiently.
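As a hedged sketch of what that looks like for one textured quad (two triangles), assuming an already-linked program 'prog' whose vertex shader has attributes named 'aPos' and 'aUV' (a VAO is additionally required in the core profile):

// Interleaved x, y, u, v; uploaded once at startup.
const float quad[] = {
    -1.f, -1.f,  0.f, 0.f,
     1.f, -1.f,  1.f, 0.f,
     1.f,  1.f,  1.f, 1.f,
    -1.f, -1.f,  0.f, 0.f,
     1.f,  1.f,  1.f, 1.f,
    -1.f,  1.f,  0.f, 1.f,
};
GLuint quadVbo;
glGenBuffers(1, &quadVbo);
glBindBuffer(GL_ARRAY_BUFFER, quadVbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(quad), quad, GL_STATIC_DRAW);

// Each frame: bind the buffer, describe the layout, draw all six vertices with one call.
GLint pos = glGetAttribLocation(prog, "aPos");
GLint uv  = glGetAttribLocation(prog, "aUV");
glBindBuffer(GL_ARRAY_BUFFER, quadVbo);
glEnableVertexAttribArray(pos);
glVertexAttribPointer(pos, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(float), (void*)0);
glEnableVertexAttribArray(uv);
glVertexAttribPointer(uv, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(float), (void*)(2 * sizeof(float)));
glDrawArrays(GL_TRIANGLES, 0, 6);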
There is another aspect. Immediate mode is not available anymore in most OpenGL versions:
It was never available in OpenGL ES, which is used on mobile devices.
It has been marked as deprecated in desktop OpenGL starting in version 3.0, which was released in 2008.
It is not available in the core profile, which was introduced with version 3.2.
The only way to still use immediate mode is with old OpenGL versions, or by using the compatibility profile that retains the deprecated features. The same is true for plain (client-side) vertex arrays; only arrays in VBOs are supported in these newer versions.
The fixed function pipeline is equally deprecated, so writing your own shaders is required in all these newer OpenGL versions.
Some of this certainly adds complexity when starting out. It does require more code to get simple examples up and running. But once you crossed the initial hurdle, you gain a lot of flexibility, and much of the functionality actually becomes easier to use and understand.
An alternative is to use a higher level toolkit. There are plenty of options for graphics toolkits and game engines.
The OpenGL graphics pipeline changes every year, and its programmable parts keep growing. In the end, as OpenGL programmers we create many little programs (vertex, fragment, geometry, tessellation, ...).
Why is there such a high degree of specialization between the stages? Are they all running on different parts of the hardware? Why not just write one block of code describing what should come out at the end, instead of juggling between the stages?
http://www.g-truc.net/doc/OpenGL%204.3%20Pipeline%20Map.pdf
In this Pipeline PDF we see the beast.
In the days of "Quake" (the game), developers had the freedom to do anything with their CPU rendering implementations; they were in control of everything in the "pipeline".
With the introduction of the fixed pipeline and GPUs, you get "better" performance, but lose a lot of that freedom. Graphics developers are pushing to get that freedom back, hence the pipeline becomes more customizable every day. GPUs are even "fully" programmable now using tech such as CUDA/OpenCL, even if that's not strictly about graphics.
On the other hand, GPU vendors cannot replace the whole pipeline with a fully programmable one overnight. In my opinion, this boils down to multiple reasons:
GPU capabilities and cost: GPUs evolve with each iteration, and it makes no sense to throw away the whole architecture you have and replace it overnight. Instead, new features and enhancements are added with each iteration, especially when developers ask for them (example: the tessellation stage). Think of CPUs: Intel tried to replace the x86 architecture with Itanium, losing backward compatibility; having failed, they eventually copied what AMD did with the AMD64 architecture.
They also can't fully replace it because of legacy application support; such applications are more widely used than one might expect.
Historically, there were actually different processing units for the different programmable parts - there were Vertex Shader processors and Fragment Shader processors, for example. Nowadays, GPUs employ a "unified shader architecture" where all types of shaders are executed on the same processing units. That's why non-graphic use of GPUs such as CUDA or OpenCL is possible (or at least easy).
Notice that the different shaders have different inputs/outputs - a vertex shader is executed for each vertex, a geometry shader for each primitive, a fragment shader for each fragment. I don't think this could be easily captured in one big block of code.
And last but definitely far from least, performance. There are still fixed-function stages between the programmable parts (such as rasterisation). And for some of these, it's simply impossible to make them programmable (or callable outside of their specific time in the pipeline) without reducing performance to a crawl.
Because each stage has a different purpose:
The vertex shader transforms the points to where they should be on the screen.
The fragment shader runs for each fragment (read: pixel of the triangles) and applies lighting and color.
Geometry and Tessellation both do things the classic vertex and fragment shaders cannot (replacing the drawn primitives with other primitives) and are both optional.
If you look carefully at that PDF you'll see different inputs and outputs for each shader.
Separating each shader stage also allows you to mix and match shaders beginning with OpenGL 4.1. For example, you can use one vertex shader with multiple different fragment shaders, and swap out the fragment shaders as needed. Doing that when shaders are specified as a single code block would be tricky, if not impossible.
More info on the feature: http://www.opengl.org/wiki/GLSL_Object#Program_separation
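A hedged sketch of that mix-and-match setup using separate shader objects (GL 4.1+); the three GLSL source strings are assumed to exist already:

GLuint vs  = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vsSource);
GLuint fsA = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fsSourceA);
GLuint fsB = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fsSourceB);

GLuint pipeline;
glGenProgramPipelines(1, &pipeline);
glBindProgramPipeline(pipeline);
glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT, vs);

glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fsA);  // draw with one fragment shader...
// ...then swap in another without touching the vertex stage:
glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fsB);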
Mostly because nobody wants to re-invent the wheel if they do not have to.
Many of the specialized things that are still fixed-function would simply make life more difficult for developers if they had to be programmed from scratch to draw a single triangle. Rasterization, for instance, would truly suck if you had to implement primitive coverage yourself or handle attribute interpolation. It might add some novel flexibility, but the vast majority of software does not require that flexibility and developers benefit tremendously from never thinking about this sort of stuff unless they have some specialized application in mind.
Truth be told, you can implement the entire graphics pipeline yourself using compute shaders if you are so inclined. Performance generally will not be competitive with pushing vertices through the traditional render pipeline and the amount of work necessary would be quite daunting, but it is doable on existing hardware. Realistically, this approach does not offer a lot of benefits for rasterized graphics, but implementing a ray-tracing based pipeline using compute shaders could be a worthwhile use of time.
I wanted to use OpenGL for 2D graphics because of its hardware acceleration. Would you recommend using modern OpenGL or the fixed pipeline for this? This whole shader-writing business seems like too much overhead to me. I just want to draw some 2D primitives.
Even for trivial 2D graphics, using the programmable pipeline as opposed to the fixed-function pipeline is what you want. In the end, the programmable pipeline gives you more freedom in expressing your graphics. How you decide to program the pipeline is up to you and is driven by your graphical needs. It could be that you only need a single shader. There is no written rule that you need hundreds of shaders for it to be 'modern OpenGL'.
In that respect it's debatable whether modern OpenGL really is that much effort at all. It's a shader, a vertex/index buffer, and a few textures. Compared to the fixed-function pipeline, has it really changed so much that you even have to consider sticking with the fixed-function pipeline?
A more compelling reason why you should prefer the programmable pipeline is that the fixed-function pipeline is deprecated; in other words, pending removal. In principle, an IHV could decide to drop support for it at any moment.
Modern OpenGL is better. Don't be afraid of shaders. Without them, you can't do much besides draw the image with some blending... You can't get very far...
With shaders, you can do pretty much everything (effects like Photoshop, for example).
Why do people tend to mix deprecated fixed-function pipeline features like the matrix stack, gluPerspective(), glMatrixMode() and so on, when this is meant to be done manually and passed into GLSL as a uniform?
Are there any benefits to this approach?
There is a legitimate reason to do this, in terms of user sanity. Fixed-function matrices (and other fixed-function state tracked in GLSL) are global state, shared among all programs. If you want to change the projection matrix in every shader, you can do that by simply changing it in one place.
Doing this in GLSL without fixed function requires the use of uniform buffers. Either that, or you have to build some system that will farm state information to every shader that you want to use. The latter is perfectly doable, but a huge hassle. The former is relatively new, only introduced in 2009, and it requires DX10-class hardware.
It's much simpler to just use fixed-function and GLSL state tracking.
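For comparison, a hedged sketch of both routes (assuming a linked program 'prog' containing a 'uProjection' uniform, 'proj' holding a column-major 4x4 float matrix, and 'aspect' being the window aspect ratio):

// Fixed-function / compatibility profile: one global projection matrix, tracked for you.
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(60.0, aspect, 0.1, 100.0);

// Core-profile GLSL: the same matrix has to be handed to every program that needs it.
GLint loc = glGetUniformLocation(prog, "uProjection");
glUseProgram(prog);
glUniformMatrix4fv(loc, 1, GL_FALSE, proj);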
No benefits as far as I'm aware of (unless you consider not having to recode the functionality a benefit).
Most likely just laziness, or a lack of knowledge of the alternative method.
Essentially because those applications require shaders to run, but programmers are too lazy/stressed to re-implement features that are already available through the OpenGL compatibility profile.
Notable features that are "difficult" to replace are line widths greater than 1, line stipple, and separate front and back polygon modes.
Most tutorials teach deprecated OpenGL, so maybe people don't know better.
The benefit is that you are using well-known, thoroughly tested, and reliable code. Whether it's on MS Windows or the proprietary Linux drivers, it was written by the people who built your GPU, who can therefore be assumed to know how to make it really fast.
An additional benefit for group projects is that There Is Only One Way To Do It. No arguments about whether you should be writing your own C++ matrix class, what it should be called, which operators to overload, or whether the internal implementation should be a 1D or 2D array...