SceneKit SCNShadable accessing _surface.diffuseTexcoord paints the object white - glsl

I'm experiencing odd behaviour when trying to extend a PBR material from SceneKit. All I want to do is read a texture and map it using the first UV channel (same as normal). As soon as I mention _surface.diffuseTexcoord, _surface.diffuse seems to turn white. It doesn't seem to be a constant output (_output.rgb = vec3(1.)); rather, the color white is passed through the lighting pipeline.
let myShader = "#pragma arguments\n" +
"sampler uMaskTex;\n" +
"uniform sampler2D uMaskTex;\n" +
"#pragma body\n" +
//"vec2 myDuv = _surface.diffuseTexcoord;\n" //paints the object white
OR
//"vec3 mask = texture2D( uMaskTex, _surface.diffuseTexcoord ).xyz;\n" //paints the object white
OR
"vec3 mask = texture2D( uMaskTex, vec2(1.) ).xyz;\n" //does not paint the object white
I do not understand at all why these channels are tied to "ambient", "diffuse", "specular" etc., when they exist in the vertex shader at _geometry.texcoords[MY_CHANNEL].
My attempt to transfer _geometry.texcoords[0] via a varying vec2 v_my_abstract_uv_channel_not_diffuse_necesserily ended in failure.
edit
After doing some voodoo magic I managed to get this thing working the way I want, but I still have no idea what is causing this behaviour.
Note that I do not have a texture in the diffuse slot for the material; it is set to a constant color. I thought that something might be getting filled with garbage, but oddly, _surface.diffuseTexcoord does contain values and they are correct. It's just that any mention of it causes everything to turn white.
I am, however, using a normal texture, assigned through the GUI.
I guessed that there is a corresponding _surface.normalTexcoord even though it's not documented, and this seems to have worked. Even though it contains the same values, it doesn't cause the material to go white.
"#pragma arguments\n" +
"sampler uMaskTex;\n" +
"uniform sampler2D uMaskTex;\n" +
//"varying vec2 vTexCoord0;\n" + //varyings don't work :( ,
"#pragma body\n" +
"vec3 mask = texture2D(uMaskTex,_surface.diffuseTexcoord).xyz;\n" + //does not work, even though i don't do anything with this variable, it affects _surface.diffuse
"_output.color.rgb = vec3(_surface.diffuseTexcoord,1.);\n" + //actually contains values, not garbage as i thought
"_output.color.rgb = _surface.diffuse.rgb;\n"; //does not work, this outputs white, even though a different color is set
"_output.color.rgb = mask;\n"; //i do however get the texture to map correctly, i do see the texture with _surface.diffuseTexcoord lookup... ?!
On the other hand:
"#pragma body\n" +
"vec3 mask = texture2D(uMaskTex,_surface.normalTexcoord).xyz;\n" + //works!
//works! even though there is no mention of this in the documentation.
//Gives exactly the same values as _surface.diffuseTexcoord
"_output.color.rgb = vec3( _surface.normalTexcoord , 1.);\n" +
"_output.color.rgb = _surface.diffuse.rgb;\n"; //works! no gremlins, doesn't turn to white, shows the color from gui!

Stuck trying to optimize complex GLSL fragment shader

So first off, let me say that while the code works perfectly well from a visual point of view, it runs into very steep performance issues that get progressively worse as you add more lights. In its current form it's good as a proof of concept, or a tech demo, but is otherwise unusable.
Long story short, I'm writing a RimWorld-style game with real-time top-down 2D lighting. The way I implemented rendering is with a 3 layered technique as follows:
First I render occlusions to a single-channel R8 occlusion texture mapped to a framebuffer. This part is lightning fast and doesn't slow down with more lights, so it's not part of the problem:
Then I invoke my lighting shader by drawing a huge rectangle over my lightmap texture mapped to another framebuffer. The light data is stored in an array in an UBO and it uses the occlusion mapping in its calculations. This is where the slowdown happens:
And lastly, the lightmap texture is multiplied and added to the regular world renderer, this also isn't affected by the number of lights, so it's not part of the problem:
The problem is thus in the lightmap shader. The first iteration had many branches which froze my graphics driver right away when I first tried it, but after removing most of them I get a solid 144 fps at 1440p with 3 lights, and ~58 fps at 1440p with 20 lights. An improvement, but it scales very poorly. The shader code is as follows, with additional annotations:
#version 460 core
// per-light data
struct Light
{
    vec4 location;
    vec4 rangeAndstartColor;
};
const int MaxLightsCount = 16; // I've also tried 8 and 32, there was no real difference
layout(std140) uniform ubo_lights
{
    Light lights[MaxLightsCount];
};
uniform sampler2D occlusionSampler; // the occlusion texture sampler
in vec2 fs_tex0; // the uv position in the large rectangle
in vec2 fs_window_size; // the window size to transform world coords to view coords and back
out vec4 color;
void main()
{
    vec3 resultColor = vec3(0.0);
    const vec2 size = fs_window_size;
    const vec2 pos = (size - vec2(1.0)) * fs_tex0;
    // process every light individually and add the resulting colors together
    // this should be branchless, is there any way to check?
    for(int idx = 0; idx < MaxLightsCount; ++idx)
    {
        const float range = lights[idx].rangeAndstartColor.x;
        const vec2 lightPosition = lights[idx].location.xy;
        const float dist = length(lightPosition - pos); // distance from current fragment to current light
        // early abort, the next part is expensive
        // this branch HAS to be important, right? otherwise it will check crazy long lines against occlusions
        if(dist > range)
            continue;
        const vec3 startColor = lights[idx].rangeAndstartColor.yzw;
        // walk between pos and lightPosition to find occlusions
        // standard line DDA algorithm
        vec2 tempPos = pos;
        int lineSteps = int(ceil(abs(lightPosition.x - pos.x) > abs(lightPosition.y - pos.y) ? abs(lightPosition.x - pos.x) : abs(lightPosition.y - pos.y)));
        const vec2 lineInc = (lightPosition - pos) / lineSteps;
        // can I get rid of this loop somehow? I need to check each position between
        // my fragment and the light position for occlusions, and this is the best I
        // came up with
        float lightStrength = 1.0;
        while(lineSteps --> 0)
        {
            const vec2 nextPos = tempPos + lineInc;
            const vec2 occlusionSamplerUV = tempPos / size;
            lightStrength *= 1.0 - texture(occlusionSampler, vec2(occlusionSamplerUV.x, 1 - occlusionSamplerUV.y)).x;
            tempPos = nextPos;
        }
        // the contribution of this light to the fragment color is based on
        // its square distance from the light, and the occlusions between them
        // implemented as multiplications
        const float strength = max(0, range - dist) / range * lightStrength;
        resultColor += startColor * strength * strength;
    }
    color = vec4(resultColor, 1.0);
}
I call this shader as many times as I need, since the results are additive. It works with large batches of lights or one by one. Performance-wise, I didn't notice any real change trying different batch numbers, which is perhaps a bit odd.
So my question is: is there a better way to look for any (boolean) occlusions between my fragment position and light position in the occlusion texture, without iterating through every pixel by hand? Could render buffers perhaps help here (from what I've read they're for reading data back to system memory, but I need it in another shader)?
And perhaps, is there a better algorithm for what I'm doing here?
I can think of a couple routes for optimization:
Exact: apply a distance transform on the occlusion map: this will give you the distance to the nearest occluder at each pixel. After that you can safely step by that distance within the loop, instead of doing baby steps. This will drastically reduce the number of steps in open regions.
There is a very simple CPU-side algorithm to compute a DT, and it may suit you if your occluders are static. If your scene changes every frame, however, you'll need to search the literature for GPU side algorithms, which seem to be more complicated.
Inexact: resort to soft shadows -- it might be a compromise you are willing to make, and even seen as an artistic choice. If you are OK with that, you can create a mipmap from your occlusion map, and then progressively increase the step and sample lower levels as you go farther from the point you are shading.
You can go further and build an emitters map (into the same 4-channel map as the occlusion). Then your entire shading pass will be independent of the number of lights. This is an equivalent of voxel cone tracing GI applied to 2D.
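To make the exact route a bit more concrete, here is a minimal CPU-side sketch in Python (not GLSL), assuming a small static occlusion grid. The names distance_transform and is_occluded are made up for illustration, and the brute-force transform is only there to show the idea, not to be fast. The march steps by the stored distance (minus one pixel as a margin for rounding to cell centres) instead of one pixel at a time:

```python
import math

def distance_transform(occ):
    """Brute-force distance (in pixels) from every cell to the nearest occluder.

    occ is a list of rows; a truthy cell is an occluder. Quadratic cost, so
    this is only suitable for illustrating the idea on small grids.
    """
    h, w = len(occ), len(occ[0])
    blockers = [(x, y) for y in range(h) for x in range(w) if occ[y][x]]
    return [[min((math.hypot(x - bx, y - by) for bx, by in blockers),
                 default=float(w + h))
             for x in range(w)] for y in range(h)]

def is_occluded(dt, start, end):
    """March from start to end using the distance map to skip empty space.

    Returns (blocked, steps). Both endpoints are assumed to lie inside the grid.
    """
    sx, sy = start
    ex, ey = end
    total = math.hypot(ex - sx, ey - sy)
    if total == 0.0:
        return False, 0
    dx, dy = (ex - sx) / total, (ey - sy) / total
    travelled, steps = 0.0, 0
    while travelled < total:
        x = int(round(sx + dx * travelled))
        y = int(round(sy + dy * travelled))
        d = dt[y][x]
        if d == 0.0:  # the current cell is itself an occluder
            return True, steps
        # Jump by the free distance, keeping a one-pixel margin, but never
        # move less than one pixel (the original per-pixel march).
        travelled += max(d - 1.0, 1.0)
        steps += 1
    return False, steps
```

With a vertical wall at x = 10 on a 20x20 grid, a ray from (2, 5) to (18, 5) is reported as blocked after three lookups rather than nine single-pixel ones. In the shader, the same loop would read the distance from a texture, presumably stored alongside the occlusion map.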

Pass array of floats to fragment shader via texture

I am trying to pass an array of floats (in my case an audio wave) to a fragment shader via texture. It works but I get some imperfections as if the value read from the 1px height texture wasn't reliable.
This happens with many combinations of bar widths and amounts.
I get the value from the texture with:
precision mediump float;
...
uniform sampler2D uDisp;
...
void main(){
...
float columnWidth = availableWidth / barsCount;
float barIndex = floor((coord.x-paddingH)/columnWidth);
float textureX = min( 1.0, (barIndex+1.0)/barsCount );
float barValue = texture2D(uDisp, vec2(textureX, 0.0)).r;
...
If instead of the value from the texture I use something else the issue doesn't seem to be there.
barValue = barIndex*0.1;
Any idea what could be the issue? Is using a texture for this purpose a bad idea?
I am using Pixi.JS as WebGL framework, so I don't have access to low level APIs.
With a gradient texture for the data and many bars, the problem becomes pretty evident.
Update: Looks like the issue relates to the consistency of the value of textureX.
Trying different formulas like barIndex/(barsCount-1.0) results in less noise. Wrapping it in a min definitely adds more noise.
Turned out the issue wasn't in reading the values from the texture, but was in the drawing. Instead of using IFs I switched to step and the problem went away.
vec2 topLeft = vec2(
paddingH + (barIndex*columnWidth) + ((columnWidth-barWidthInPixels)*0.5),
top
);
vec2 bottomRight = vec2(
topLeft.x + barWidthInPixels,
bottom
);
vec2 tl = step(topLeft, coord);
vec2 br = 1.0-step(bottomRight, coord);
float blend = tl.x * tl.y * br.x * br.y;
I guess comparisons of floats through IFs are not very reliable in shaders.
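For anyone wondering what the step version computes, here is a small Python sketch of the same arithmetic (Python rather than GLSL so it can be run directly). The helper names step and bar_blend are mine; step mirrors GLSL's step(edge, x), which returns 0.0 when x < edge and 1.0 otherwise:

```python
def step(edge, x):
    """Same contract as GLSL step(edge, x): 0.0 if x < edge, else 1.0."""
    return 0.0 if x < edge else 1.0

def bar_blend(coord, top_left, bottom_right):
    """1.0 when coord is inside the bar rectangle, 0.0 otherwise, with no branches
    in the combining arithmetic (the shader does this per component on vec2s)."""
    tl_x = step(top_left[0], coord[0])
    tl_y = step(top_left[1], coord[1])
    br_x = 1.0 - step(bottom_right[0], coord[0])
    br_y = 1.0 - step(bottom_right[1], coord[1])
    return tl_x * tl_y * br_x * br_y
```

The result is 1.0 inside the rectangle and 0.0 outside (the top-left edge is included, the bottom-right edge is not), so it can be multiplied into the bar color directly.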
Generally mediump is insufficient for texture coordinates for any non-trivial texture, so where possible use highp. This isn't always available on some older GPUs, so depending on the platform this may not solve your problem.
If you know you are doing 1:1 mapping then also use GL_NEAREST rather than GL_LINEAR, as the quantization effect will more likely hide some of the precision side-effects.
Given you probably know the number of columns and bars you can probably pre-compute some of the values on the CPU (e.g. precompute 1/columns and pass that as a uniform) at fp32 precision. Passing in small values between 0 and 1 is always much better at preserving floating point accuracy, rather than passing in big values and then dividing out.
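One more thing that may be worth checking, separate from the precision advice above: with GL_NEAREST, texel i of an N-texel-wide texture covers [i/N, (i+1)/N), so a coordinate like (barIndex+1.0)/barsCount sits exactly on the border between two texels, and tiny rounding errors decide which one is read. The sketch below is Python, not shader code, and it assumes the data texture is exactly barsCount texels wide, which the question does not state. nearest_texel and center_coord are illustrative names:

```python
import math

def nearest_texel(u, width):
    """Index of the texel a nearest-filtered lookup would pick for u in [0, 1]."""
    return min(int(math.floor(u * width)), width - 1)

def center_coord(index, inv_width):
    """Coordinate of the centre of texel `index`.

    inv_width is 1/width, precomputed once on the CPU and passed in,
    in the spirit of the uniform suggested above.
    """
    return (index + 0.5) * inv_width
```

A coordinate at a texel centre can drift by almost half a texel in either direction and still hit the same texel, whereas a coordinate on a texel border flips between neighbours with the smallest error.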

Rendering point cloud data with draw instancing from OSG Cookbook not working

I am rendering a point cloud using OSG. I followed the example in the OSG cookbook titled "Rendering point cloud data with draw instancing" that shows how to make one point with many instances and then transfer the point locations to the graphics card via a texture. It then uses a shader to pull the points out of the texture and move each instance to the right location. There appear to be two problems with what is getting rendered.
First, the points aren't in the right location compared to a more straightforward, working approach to rendering. It looks like they are scaled about zero by the wrong amount, as if there were some kind of multiplicative factor on position.
Second, the imagery is blurry. Points tend to be generally in the right place; there are many points in the place where a large object should be. However, I can't tell what the object is. Data rendered with my working (but slower) rendering method looks sharp.
I have verified that I have the same input data going into the texture and draw list in both methods so it seems it has to be something with the rendering.
Here is the code to set up the Geometry, which is copied nearly directly from the textbook.
osg::Geometry* geo = new osg::Geometry;
osg::ref_ptr<osg::Image> img = new osg::Image;
img->allocateImage(w,h, 1, GL_RGBA, GL_FLOAT);
osg::BoundingBox box;
float* data = (float*)img->data();
for (unsigned long int k=0; k<NPoints; k++)
{
    *(data++) = cloud->x[k];
    *(data++) = cloud->y[k];
    *(data++) = cloud->z[k];
    *(data++) = cloud->meta[0][k];
    box.expandBy(cloud->x[k],cloud->y[k],cloud->z[k]);
}
geo->setUseDisplayList(false);
geo->setUseVertexBufferObjects(true);
geo->setVertexArray( new osg::Vec3Array(1));
geo->addPrimitiveSet( new osg::DrawArrays(GL_POINTS, 0, 1, stop) );
geo->setInitialBound(box);
osg::ref_ptr<osg::Texture2D> tex = new osg::Texture2D;
tex->setImage( img);
tex->setInternalFormat( GL_RGBA32F_ARB );
tex->setFilter( osg::Texture2D::MIN_FILTER, osg::Texture2D::LINEAR);
tex->setFilter( osg::Texture2D::MAG_FILTER, osg::Texture2D::LINEAR);
And here is the shader code.
void main () {
    float row;
    row = float(gl_InstanceID) / float(width);
    vec2 uv = vec2( fract(row), floor(row) / float(height) );
    vec4 texValue = texture2D(defaultTex,uv);
    vec4 pos = gl_Vertex + vec4(texValue.xyz, 1.0);
    gl_Position = gl_ModelViewProjectionMatrix * pos;
}
After a bunch of experimenting, I found that the example code from the OSG Cookbook has some problems.
The scale issue (the first problem) is in the shader.
vec4 pos = gl_Vertex + vec4(texValue.xyz, 1.0);
Should be
vec4 pos = gl_Vertex + vec4(texValue.xyz, 0.0);
This is because gl_Vertex is a homogeneous position: a 3-vector with an extra fourth element of 1 to aid with matrix transformation. That element should stay 1. The example created another 3+1 vector and added it to gl_Vertex, making the fourth element 2. Replace the 1 with a zero and the scale problem goes away.
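A quick way to convince yourself, sketched in Python rather than GLSL: a homogeneous vector (x, y, z, w) stands for the point (x/w, y/w, z/w), so w = 2 halves every coordinate no matter which matrix is applied afterwards. The sketch assumes gl_Vertex is (0, 0, 0, 1), which is what the single default-constructed vertex in the setup code should give; instance_position and to_cartesian are names I made up:

```python
def to_cartesian(v):
    """Point represented by the homogeneous vector (x, y, z, w)."""
    x, y, z, w = v
    return (x / w, y / w, z / w)

def instance_position(gl_vertex, offset_xyz, offset_w):
    """Mirrors `gl_Vertex + vec4(texValue.xyz, offset_w)` from the shader."""
    offset = tuple(offset_xyz) + (offset_w,)
    return tuple(a + b for a, b in zip(gl_vertex, offset))
```

For a texture value of (4, 2, 6), an offset_w of 1.0 gives the homogeneous vector (4, 2, 6, 2), i.e. the point (2, 1, 3), while an offset_w of 0.0 keeps the point at (4, 2, 6). That matches the "multiplicative factor on position" seen in the question.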
The blurriness (the second problem) was caused by texture interpolation.
tex->setFilter( osg::Texture2D::MIN_FILTER, osg::Texture2D::LINEAR);
tex->setFilter( osg::Texture2D::MAG_FILTER, osg::Texture2D::LINEAR);
needs to be
tex->setFilter( osg::Texture2D::MIN_FILTER, osg::Texture2D::NEAREST);
tex->setFilter( osg::Texture2D::MAG_FILTER, osg::Texture2D::NEAREST);
so that the interpolator will just take the values from the texture instead of interpolating them from neighboring texture pixels which may be points on the other side of the point cloud. After fixing these two issues, the example works as advertised and seems to be a bit faster in my limited testing.

cocos2d, Splitting an image into separate R G B channels?

I want to create an effect where, after my character gets killed, the red, green and blue color channels of the character's sprite separate in different directions.
Something similar to this: http://active.tutsplus.com/tutorials/effects/create-a-retro-crt-distortion-effect-using-rgb-shifting/
How would I go about doing this?
You could just add different offsets when looking up the individual colors in the fragment shader. To make this efficient you should probably render to an intermediate buffer first.
Here is an example of how to do it:
vec4 mainOld( vec2 offset ) {
    ... (gl_FragCoord.xy + offset) ...
}
void main( void ) {
    vec4 foo;
    foo.r = mainOld(vec2(-3.0, 0.0)).r;
    foo.g = mainOld(vec2(0.0, 5.0)).g;
    foo.b = mainOld(vec2(0.0, 0.0)).b;
    foo.a = mainOld(vec2(0.0, 0.0)).a;
    gl_FragColor = foo;
}
Basically the original shader is now called three times so that's a bit inefficient, which is why I suggested a buffer but that may be premature optimization.
You can look at the result of the above code in an actual shader here:
http://glsl.heroku.com/e#7971.0 (not sure how persistent these links are, sorry)
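If it helps to see the idea outside a shader, here is a minimal Python sketch of the same per-channel lookup on a plain list-of-rows image of (r, g, b, a) tuples. It assumes the sprite has already been rendered into such a buffer; rgb_split and sample are illustrative names, and clamping at the edges is my choice, not something the shader above specifies:

```python
def sample(image, x, y):
    """Clamp-to-edge lookup of an (r, g, b, a) pixel."""
    h, w = len(image), len(image[0])
    return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def rgb_split(image, r_off, g_off, b_off):
    """Build a new image whose channels are read at different (dx, dy) offsets."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            r = sample(image, x + r_off[0], y + r_off[1])[0]
            g = sample(image, x + g_off[0], y + g_off[1])[1]
            b = sample(image, x + b_off[0], y + b_off[1])[2]
            a = sample(image, x, y)[3]  # alpha stays in place
            row.append((r, g, b, a))
        out.append(row)
    return out
```

Animating the three offsets over time (growing them after the character dies, for example) would give the channels the appearance of drifting apart.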

C++ shader question

I am using Nvidia CG and Direct3D9 and have a question about the following code.
It compiles, but doesn't load (using a cgLoadProgram wrapper), and the resulting failure is described simply as "D3D failure happened".
It's part of a pixel shader compiled with the shader model set to 3.0.
What may be interesting is that this shader loads fine in the following cases:
1) Manually unrolling the while statement (to many if { } statements).
2) Removing the line with the tex2D function in the loop.
3) Switching to shader model 2_X and manually unrolling the loop.
Problem part of the shader code:
float2 tex = float2(1, 1);
float2 dtex = float2(0.01, 0.01);
float h = 1.0 - tex2D(height_texture1, tex);
float height = 1.00;
while ( h < height )
{
    height -= 0.1;
    tex += dtex;
    // Remove the next line and it works (not as expected,
    // of course)
    h = tex2D( height_texture1, tex );
}
If someone knows why this can happen, could test similar code in a non-CG environment, or could help me in some other way, I'm waiting for you ;)
Thanks.
I think you need to determine the gradients before the loop using ddx/ddy on the texture coordinates and then use tex2D(sampler2D samp, float2 s, float2 dx, float2 dy)
The GPU always renders quads, not pixels (even on pixel borders, where superfluous pixels are discarded by the render backend). This is done because it allows the GPU to always calculate the screen-space texture derivatives, even when you use calculated texture coordinates. It just needs to take the difference between the values at the pixel centers.
But this doesn't work when using dynamic branching like in the code in the question, because the shader processors at the individual pixels could diverge in control flow. So you need to calculate the derivatives manually via ddx/ddy before the program flow can diverge.