I've read some papers and it says like they can detect silhouette,edge,ridge and draw a line to it using GLSL shader. But in the implementation they says that they 'accessed' neighbouring pixel and do something. How can that even possible?
This is the paper in question
http://www.google.co.th/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CDEQFjAC&url=http%3A%2F%2Fcg.postech.ac.kr%2Fresearch%2Fline_drawings_via_abstracted_shading%2Fline-drawing-s07.pdf&ei=I5GeUO-pMcKzrAf8_4DICw&usg=AFQjCNE7D9nMVKWvYwvNUaHo5S1ZfrG10A&sig2=CmnD6hbD6-0EYkvv-Bj3LQ
It's on section 3, Rendering Lines. They initially said about GLSL shader but then they suddenly talk about sample the groups of pixel.
I'm studying about non-photorealistic rendering without image processing after I render it. So GPU usage can be optimum if it was done in GLSL shaders.
Without reading the paper, I assume they probably mean gather reads from a texture, not scatter writes to the framebuffer, which has always been possible with shaders. Since OpenGL-4 it's even possible to do scatter writes from a shader, called image writing, but it's rather slow. Anyway, for line detection you only need gather reads, so this is not a problem.
Related
I would like to retrieve the z height of each pixels of a rendered object in a scene.
I will need to retrieve the color rendered too.
What are the opengl technics to implement ?
glReadPixels and CPU side code
use glReadPixels to obtain both RGB and Depth buffers. Here examples for both:
depth buffer got by glReadPixels is always 1
OpenGL Scale Single Pixel Line
That will read the buffers into CPU accessible memory. This way is slow (due to sync) but should work on any platform.
FBO render to texture and GPU shader
Faster method is to use FBO and render to texture and use that output in next rendering pass as input texture for computing your stuff inside shaders. This however will not run properly on Intel and might need additional tweaking of code between nVidia and AMD.
If you have per pixel output use single QUAD covering your screen as the second rendering pass.
If you got single output for the whole screen instead use single POINT render and compute all in the fragment shader (scann the whole texture inside) something like this:
How to implement 2D raycasting light effect in GLSL
The difference is that by usnig shaders and FBO you are not transferring data between GPU/CPU so its way faster.
The content of the targeted textures can be still readed by CPU using texture related GL functions
compute GPU shaders
There are also compute shaders out there but I did not use them yet so I am just guessing however with them it might be possible to do your stuff in single pass and also the form of the result and computation should not be as limiting.
My bet is that you are doing some post processing similar to Deferred Shading so googling such topic/tutorials might help.
Has anyone familiar with some sort of OpenGL magic to get rid of calculating bunch of pixels in fragment shader instead of only 1? Especially this issue is hot for OpenGL ES in fact meanwile flaws mobile platforms and necessary of doing things in more accurate (in performance meaning) way on it.
Are any conclusions or ideas out there?
P.S. it's known shader due to GPU architecture organisation is run in parallel for each texture monad. But maybe there techniques to raise it from one pixel to a group of ones or to implement your own glTexture organisation. A lot of work could be done faster this way within GPU.
OpenGL does not support writing to multiple fragments (meaning with distinct coordinates) in a shader, for good reason, it would obstruct the GPUs ability to compute each fragment in parallel, which is its greatest strength.
The structure of shaders may appear weird at first because an entire program is written for only one vertex or fragment. You might wonder why can't you "see" what is going on in neighboring parts?
The reason is an instance of the shader program runs for each output fragment, on each core/thread simultaneously, so they must all be independent of one another.
Parallel, independent, processing allows GPUs to render quickly, because the total time to process a batch of pixels is only as long as the single most intensive pixel.
Adding outputs with differing coordinates greatly complicates this.
Suppose a single fragment was written to by two or more instances of a shader.
To ensure correct results, the GPU can either assign one to be an authority and ignore the other (how does it know which will write?)
Or you can add a mutex, and have one wait around for the other to finish.
The other option is to allow a race condition regarding whichever one finishes first.
Either way this would immensely slows down the process, make the shaders ugly, and introduces incorrect and unpredictable behaviour.
Well firstly you can calculate multiple outputs from a single fragment shader in OpenGL 3 and up. A framebuffer object can have more than one RGBA surfaces (Renderbuffer Objects) attached and generate an RGBA for each of them by using gl_FragData[n] instead of gl_FragColor. See chapter 8 of the 5th edition OpenGL SuperBible.
However, the multiple outputs can only be generated for the same X,Y pixel coordinates in each buffer. This is for the same reason that an older style fragment shader can only generate one output, and can't change gl_FragCoord. OpenGL guarantees that in rendering any primitive, one and only one fragment shader will write to any X,Y pixel in the destination framebuffer(s).
If a fragment shader could generate multiple pixel values at different X,Y coords, it might try to write to the same destination pixel as another execution of the same fragment shader. Same if the fragment shader could change the pixel X or Y. This is the classic multiple threads trying to update shared memory problem.
One way to solve it would be to say "if this happens, the results are unpredictable" which sucks from the programmer point of view because it's completely out of your control. Or fragment shaders would have to lock the pixels they are updating, which would make GPUs far more complicated and expensive, and the performance would suck. Or fragment shaders would execute in some defined order (eg top left to bottom right) instead of in parallel, which wouldn't need locks but the performance would suck even more.
Is there a way to read fragment from the framebuffer currently rendered?
So, I'm looking for a way to read color information from the fragment that's on the place that current fragment will probably overwrite. So, exact position of the fragment that previously rendered.
I found gl_FragData and gl_LastFragData to be added with certain EXT_ extensions to shaders, but if they are what I need, could somebody explain how to use those?
I am looking either for a OpenGL or OpenGL ES 2.0 solution.
EDIT:
All the time I was searching for the solution that would allow me to have some kind of read&write "uniform" accessible from shaders. For anyone out there searching for similar thing, OpenGL version 4.3+ support image and buffer storage types. They do allow both reading and writing to them simultaneously, and in combination with compute shaders they proved to be very powerful tool.
Your question seems rather confused.
Part of your question (the first sentence) asks if you can read from the framebuffer in the fragment shader. The answer is, generally no. There is an OpenGL ES 2.0 extension that lets you do so, but it's only supported on some hardware. In desktop GL 4.2+, you can use arbitrary image load/store to get the same effect. But you can't render to that image anymore; you have to write your data using image storing functions.
gl_LastFragData is pretty simple: it's the color from the sample in the framebuffer that will be overwritten by this fragment shader. You can do with it what you wish, if it is available.
The second part of your question (the second paragraph) is a completely different question. There, you're asking about fragments that were potentially never written to the framebuffer. You can't read from a fragment shader; you can only read images. And if a fragment fails the depth test, then it's data was never rendered to an image. So you can't read it.
With most nVidia hardware you can use the GL_NV_texture_barrier extension to read from a texture that's currently bound to a framebuffer. But bear in mind that you won't be able to read data any more recent than produced in the previous draw call
I am very interested in understanding how multisampling works. I have found a large literature on how to enable or use it, but very little information concerning what it really does in order to achieve an antialiased rendering. What I have found, in many places, is conflicting information that only confused me more.
Please note that I know how to enable and use multisampling (I actually already use it), what I don't know is what kind of data really gets into the multisampled renderbuffers/textures, and how this data is used in the rendering pipeline.
I can understand very well how supersampling works, but multisampling still has some obscure areas that I would like to understand.
here is what the specs say: (OpenGL 4.2)
Pixel sample values, including color, depth, and stencil values, are stored in this
buffer (the multisample buffer). Samples contain separate color values for each fragment color.
...
During multisample rendering the contents of a pixel fragment are changed
in two ways. First, each fragment includes a coverage value with SAMPLES bits.
...
Second, each fragment includes SAMPLES depth values and sets of associated
data, instead of the single depth value and set of associated data that is maintained
in single-sample rendering mode.
So, each sample contains a distinct color, coverage bit, and depth. What's the difference from a normal supersampling? Seems like a "weighted" supersampling to me, where each final pixel value is determined by the coverage value of its samples instead of a simple average, but I am very unsure about this. And what about texture coordinates at sample level?
If I store, say, normals in a RGBF multisampled texture, will I read them back "antialiased" (that is, approaching 0) on the edges of a polygon?
A fragment shader is called once per fragment, unless it uses gl_SampleID, glSampleIn or has a 'sample' storage qualifier. How can a fragment shader be invoked once per fragment and get an antialiased rendering?
OpenGL on Silicon Graphics Systems:
http://www-f9.ijs.si/~matevz/docs/007-2392-003/sgi_html/ch09.html#LE68984-PARENT
mentions: When you use multisampling and read back color, you get the resolved color value (that is, the average of the samples). When you read back stencil or depth, you typically get back a single sample value rather than the average. This sample value is typically the one closest to the center of the pixel.
And there's this technical spec (1994) from the OpenGL site. It explains in full detail what is done If MULTISAMPLE_SGIS is enabled: http://opengl.org/registry/specs/SGIS/multisample.txt
See also this related question: How are depth values resolved in OpenGL textures when multisampling?
And the answers to this question, where GL_MULTISAMPLE_ARB is recommended: where is GL_MULTISAMPLE defined?. The specs for GL_MULTISAMPLE_ARB (2002) are here: http://www.opengl.org/registry/specs/ARB/multisample.txt
I was wondering if there is support in the newer shader models to read-back a pixel value from the target framebuffer. I assume that this is alrdy done in later (non-programmable) stages in the drawing pipeline which made me hope that this feature might have been added into the programmable pipeline.
I am aware that it is possible to draw to a texture bound framebuffer and then send this texture to the shader, I was just hoping for a more elegant way to achieve the same functionality.
As Andrew notes, the framebuffer access is logically a separate stage from the fragment shader, so reading the framebuffer in the fragment shader is impossible. The reason for this (to answer Andrew's question) is a combination of performance and the ordering requirements of the graphics pipeline. The way the rendering pipeline is defined, framebuffer blending operations MUST occur in the same order as the triangles/primitives that went into the beginning of the pipeline. The fragment shaders, on the other hand, can happen in any order. So by having them be separate stages, the GPU is free to run fragment shaders as fast as it can, as their inputs become available, without having to synchronize between them. As long as it maintains enough bufffer space to hold on to the outputs of the fragment shaders, so that they can be accumulated and allow the framebuffer blends and writes to occur in order, all is well, as the results of any given fragment shader are not visible until after the blending stage.
If there was a way for the fragment shader to read the framebuffer, it would require some sort of synchronization to ensure that those reads happen in order, thus greatly slowing things down.
No. As you mention, rendering to a texture is the way to achieve that functionality.
If you take a look at a block diagram of a GPU pipeline, you'll see that the blending stage - which is what combines fragment shader output with the framebuffer - is separate from the fragment shader and is fixed-function.
I'm not a GPU designer - so I can only speculate the reason for this. Presumably it is to keep framebuffer access fast and insulate the fragment shader stage from the frame buffer so that it can be better parallelised. There are probably also issues regarding multi-sampling, and so on.
(Not to mention that fixed-function blending is "good enough" in most cases.)
Actually I think this is now doable with Direct3D 11 SM 5.0 (I didn't test it though).
You can bind an UAV to a PS 5.0, for allowing read and write operations on it using method OMSetRenderTargetsAndUnorderedAccessViews.
In that case the backbuffer of the swap chain in which you render has to be created with flag DXGI_USAGE_UNORDERED_ACCESS (I guess).
This is used in DXSDK OIT11 sample.
It is possible to read back the contents of the frame buffer in the fragment shader with Shader_framebuffer_fetch extension. The support can be added to the GPU with some performance loss. In fact, these days I'm working on to add the support of this extension in the OpenGL ES2.0 driver of a well known GPU brand in the consumer electronics market.
You can draw to a texture TEX (using a render target view) and then bind that as an input to another shader (using a shader resource view). TEX is then a pseduo-framebuffer.