Random velocities when using a simulation shader - GLSL

I'm trying to implement a Fruchterman-Reingold simulation using shaders. Before implementing the compute portion in a shader, I wrote it in JavaScript. It works exactly as I expect, as seen here:
http://jaredmcqueen.github.io/gpgpu-force-direction/canvas_app.html
When implementing the compute portion in a shader, I get a stable structure that randomly drifts around the screen. I cannot figure out which repulsion / attraction forces are causing my graphs to float around so unpredictably:
http://jaredmcqueen.github.io/gpgpu-force-direction/gpgpu_app.html
The core of the physics is in the repulsion / attraction functions:
// fr(x) = (k*k)/x;
vec3 addRepulsion(vec3 self, vec3 neighbor){
    vec3 diff = self - neighbor;
    float x = length( diff );
    float f = ( k * k ) / x;
    return normalize(diff) * f;
}

// fa(x) = (x*x)/k;
vec3 addAttraction(vec3 self, vec3 neighbor){
    vec3 diff = self - neighbor;
    float x = length( diff );
    float f = ( x * x ) / k;
    return normalize(diff) * f;
}
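A note added here for safety, not part of the original shader: if two nodes ever land on exactly the same position, length(diff) is zero, so both the division and normalize(diff) produce undefined / NaN values that can then propagate through the whole GPGPU simulation. A minimal guard could look like this (the epsilon is an assumed value, and k is the same constant used above):

// Hypothetical variant of addRepulsion that clamps the distance away
// from zero so the division and the normalization stay finite.
vec3 addRepulsionSafe(vec3 self, vec3 neighbor){
    vec3 diff = self - neighbor;
    float x = max( length( diff ), 1e-6 ); // assumed epsilon
    float f = ( k * k ) / x;
    return ( diff / x ) * f;               // diff / x == normalize(diff) when x > 0
}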
Any insight as to why GPGPU simulation-based shaders would behave seemingly randomly would be greatly appreciated.

It doesn't seem random to me: the structure stabilizes into what looks like the right state and then drifts off in a constant direction.
It looks like you apply the forces in the shader and then also update your model's position on the CPU side; that global model position should either stay constant or be updated by a different value.
From what I've seen in the code, I recommend eliminating the floating-point comparisons (compareNodePosition.w == -1.0 || 0.0) and the continue statement. Please say whether it helped. I haven't looked into the algorithm's logic yet.
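To illustrate that suggestion (this sketch is mine, not part of the original answer; the helper name and tolerance are assumptions), an exact float equality test can be replaced with a small-tolerance comparison:

// Hypothetical helper: treat two floats as equal when they differ by less than eps.
bool approxEqual(float a, float b) {
    const float eps = 1e-4; // assumed tolerance
    return abs(a - b) < eps;
}

// e.g. instead of:  if (compareNodePosition.w == -1.0) { ... }
// use:              if (approxEqual(compareNodePosition.w, -1.0)) { ... }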

It turns out I was iterating through the edges incorrectly. Here's my new edge iteration:
float idx = selfEdgeIndices.x;
float idy = selfEdgeIndices.y;
float idz = selfEdgeIndices.z;
float idw = selfEdgeIndices.w;

float start = idx * 4.0 + idy;
float end   = idz * 4.0 + idw;

if( ! ( idx == idz && idy == idw ) ){
    float edgeIndex = 0.0;
    for(float y = 0.0; y < edgesTexWidth; y++){
        for(float x = 0.0; x < edgesTexWidth; x++){
            vec2 ref = vec2( x + 0.5, y + 0.5 ) / vec2( edgesTexWidth, edgesTexWidth );
            vec4 pixel = texture2D( edgeData, ref );

            if (edgeIndex >= start && edgeIndex < end){
                nodePosition = getNeighbor(pixel.x);
                nodeDiff.xyz -= addAttraction(currentNodePosition.xyz, nodePosition);
            }
            edgeIndex++;

            if (edgeIndex >= start && edgeIndex < end){
                nodePosition = getNeighbor(pixel.y);
                nodeDiff.xyz -= addAttraction(currentNodePosition.xyz, nodePosition);
            }
            edgeIndex++;

            if (edgeIndex >= start && edgeIndex < end){
                nodePosition = getNeighbor(pixel.z);
                nodeDiff.xyz -= addAttraction(currentNodePosition.xyz, nodePosition);
            }
            edgeIndex++;

            if (edgeIndex >= start && edgeIndex < end){
                nodePosition = getNeighbor(pixel.w);
                nodeDiff.xyz -= addAttraction(currentNodePosition.xyz, nodePosition);
            }
            edgeIndex++;
        }
    }
}

Related

Checking if vector passes through vertices

I have been struggling with this problem for over a month, so I really need your help.
To further elaborate on the question:
The question is whether a vector called 'direction' that starts at a vertex called 'start' passes through the 'target'.
Both the direction and the distance need to be checked.
After enough debugging I concluded that using the dot product alone was not workable.
My three questions are:
The result is good when calculated directly, so why is the result different when executed in the shader?
The line should be drawn with the same thickness regardless of distance, so why does it become thin when the distance is large?
Do you have any better ideas, even if they don't use the rotation matrix the way I do?
First of all, my situation is:
I am drawing an FSQ (full-screen quad).
I want to check whether the ray from 'start' along 'direction' passes through 'target'.
The check is computed in the pixel shader.
A value of 1 corresponds to one pixel.
The screen size is 1920*1080.
bool intersect(float2 target, float2 direction, float2 start) {
    bool intersecting = false;
    static const float thresholdX = 0.5 / SCREENWIDTH;
    static const float thresholdY = 0.5 / SCREENHEIGHT;
    if (direction.x == 0 && direction.y == 0) {
        // degenerate direction: fall through and return false
    }
    else {
        float2 startToTarget = target - start;
        // rotate startToTarget into the frame where 'direction' lies along +x
        // (the basis vectors are not normalized)
        float changedTargetPositionX = startToTarget.x * direction.x + startToTarget.y * direction.y;
        float changedTargetPositionY = startToTarget.x * (-direction.y) + startToTarget.y * direction.x;
        float rangeOfX = (direction.x * direction.x) + (direction.y * direction.y);
        if (changedTargetPositionX <= rangeOfX + thresholdX && changedTargetPositionX >= -thresholdX &&
            changedTargetPositionY <= thresholdY && changedTargetPositionY >= -thresholdY) {
            intersecting = true;
        }
    }
    return intersecting;
}
We use a rotation matrix to rotate a vector and then check the difference between the two vectors. This works in most cases, but fails when the pixel differences are very small.
For example:
start = (15,0), direction = (10,0), target = (10,0)
In this case the intersect function should return false, but it returns true. If the pixel difference is bigger, it works fine.
Here is the debugging code:
#define MAX 5
float2 points[MAX*MAX];
for (float fi = 1; fi < MAX; fi++)
    for (float fj = 1; fj < MAX; fj++)
        points[(int)(fi * MAX + fj)] = float2(fi / MAX, fj / MAX);
for (uint ni = 0; ni < MAX*MAX; ni++)
    for (uint nj = 3; nj < MAX*MAX; nj++)
        if (intersect(uv, points[nj] - points[ni], points[ni])) {
            color = float4(1, 0, 0, 1);
            return color;
        }
return float4(0, 0, 0, 1);
When I debug like this, the lines become thinner as the distance increases.
All the lines should have the same thickness, but I don't know why they don't.
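One possible reason for the distance-dependent thickness, added here as an editorial observation rather than part of the original question: the change of basis in intersect uses (direction.x, direction.y) and (-direction.y, direction.x) without normalizing, so it scales lengths by the magnitude of direction:

\[
y' = (\mathbf{t}-\mathbf{s})\cdot(-d_y,\; d_x) = \lVert \mathbf{d} \rVert \cdot \operatorname{dist}_\perp\!\left(\mathbf{t},\ \text{line through } \mathbf{s} \text{ along } \mathbf{d}\right)
\]

Comparing y' against the fixed thresholdY therefore accepts perpendicular distances only up to thresholdY / ||d||, a band that narrows as the direction vector gets longer, which would be consistent with far lines appearing thinner.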
This is the result of running the debugging code:
I look forward to your reply.
Thank you.

Adding Ozone to my sky simulation

I implemented a simulation of the colour of the sky a while ago by following the Scratchapixel tutorial: https://www.scratchapixel.com/lessons/procedural-generation-virtual-worlds/simulating-sky
I adapted it for the actual sun position and am able to get realistic sky colours during the day. However, I noticed that after sunset / before sunrise the colours are greyish when they should be deep blue. After researching this, I read that it is due to ozone absorption not being present in my model.
I used the extinction coefficients (3.426, 8.298, 0.356) * 0.06e-5, found in https://media.contentapi.ea.com/content/dam/eacom/frostbite/files/s2016-pbs-frostbite-sky-clouds-new.pdf,
and also read that since ozone does not scatter, it should only be added to the transmittance value.
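The equation image from the original post is not available here; based on the surrounding description and the code below, the transmittance with ozone absorption presumably has the form

\[
T = \exp\!\Big(-\big(\beta_R\, D_R + 1.1\,\beta_M\, D_M + \beta_O\, D_O\big)\Big),
\]

where \(D_R\), \(D_M\) and \(D_O\) are the Rayleigh, Mie and ozone optical depths accumulated along the view and light paths (opticalDepthR + opticalDepthLightR, and so on), and the \(\beta\) values are the corresponding extinction coefficients.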
Therefore, I modified the code from Scratchapixel as follows:
for (uint32_t i = 0; i < numSamples; ++i) {
    vec3 samplePosition = ray_in2.origin() + (tCurrent + segmentLength * 0.5f) * ray_in2.direction();
    float height = samplePosition.length() - atmosphere.earthRadius;
    // compute optical depth for light
    float hr = exp(-height / atmosphere.Hr) * segmentLength;
    float hm = exp(-height / atmosphere.Hm) * segmentLength;
    float ho = exp(-height / atmosphere.Hr) * segmentLength * (6e-7);
    opticalDepthR += hr;
    opticalDepthM += hm;
    opticalDepthO += ho;
    // light optical depth
    float t0Light, t1Light;
    ...
    for (j = 0; j < numSamplesLight; ++j) {
        vec3 samplePositionLight = samplePosition + (tCurrentLight + segmentLengthLight * 0.5f) * sunDir;
        float heightLight = samplePositionLight.length() - atmosphere.earthRadius;
        if (heightLight < 0) break;
        opticalDepthLightR += exp(-heightLight / atmosphere.Hr) * segmentLengthLight;
        opticalDepthLightM += exp(-heightLight / atmosphere.Hm) * segmentLengthLight;
        opticalDepthLightO += exp(-heightLight / atmosphere.Hr) * segmentLengthLight * (6e-7);
        tCurrentLight += segmentLengthLight;
    }
    if (j == numSamplesLight) {
        vec3 tau = betaR * (opticalDepthR + opticalDepthLightR) +
                   betaM * 1.1f * (opticalDepthM + opticalDepthLightM) +
                   betaO * (opticalDepthO + opticalDepthLightO);
        vec3 attenuation(exp(-tau.x()), exp(-tau.y()), exp(-tau.z()));
Summary:
I added the variables opticalDepthO and opticalDepthLightO, which are calculated the same way as the Rayleigh optical depth but multiplied by 6e-7.
Then the sum of opticalDepthLightO and opticalDepthO is multiplied by the extinction coefficient for ozone and added to the variable tau.
The problem is that I see no difference in my sky colour before and after adding ozone. Can someone guide me to what I'm doing wrong?

GLSL texture access synchronization, OpenCL vs GLSL image processing

This might be a trivial question.
I am curious how GLSL synchronizes access to texture data from a fragment shader.
Say I have code like the following in a fragment shader.
void main() {
    vec3 texCoord = in_texCoord;
    vec4 out_voxel_intensity = texture(image, vec3(texCoord.x, texCoord.y, texCoord.z));
    out_voxel = float(out_voxel_intensity);
    if (out_voxel <= threshold)
    {
        out_voxel = 0.0;
        return;
    }
    for (int i = -int(kernalSize); i <= int(kernalSize); ++i)
        for (int j = -int(kernalSize); j <= int(kernalSize); ++j)
            for (int k = -int(kernalSize); k <= int(kernalSize); ++k)
            {
                float x_o = texCoord.x + i / (imageSize.x);
                float y_o = texCoord.y + j / (imageSize.y);
                float z_o = texCoord.z + k / (imageSize.z);
                if (x_o < 0.0 || x_o > 1.0
                    || y_o < 0.0 || y_o > 1.0
                    || z_o < 0.0 || z_o > 1.0)
                    continue;
                if (float(texture(image, vec3(x_o, y_o, z_o))) <= threshold)
                {
                    out_voxel = 0.0;
                    return;
                }
            }
}
Since the code above accesses not only the current texture coordinate but also the values around it within the specified kernel size, how does GLSL ensure that no other parallel invocation accesses the same texture coordinates at the same time?
Related to that: does the code above perform efficiently in a fragment shader given that it accesses neighbouring texture data, or would OpenCL be a better fit for this kind of image processing?
Thanks

Converting 2D Noise to 3D

I've recently started experimenting with noise (simple Perlin noise) and have run into a slight problem with animating it. So far I've come across an awesome-looking 3D noise (https://github.com/ashima/webgl-noise) that I could use in my project but understand nothing of, and a bunch of tutorials that explain how to create simple 2D noise.
For the 2D noise, I originally used the following fragment shader:
uniform sampler2D al_tex;
varying vec4 varying_pos;      // actual coords
varying vec2 varying_texcoord; // normalized coords
uniform float time;

float rand(vec2 co) { return fract(sin(dot(co, vec2(12.9898, 78.233))) * 43758.5453); }
float ease(float p) { return 3*p*p - 2*p*p*p; }

float cnoise(vec2 p, int wavelength)
{
    int ix1 = (int(varying_pos.x) / wavelength) * wavelength;
    int iy1 = (int(varying_pos.y) / wavelength) * wavelength;
    int ix2 = (int(varying_pos.x) / wavelength) * wavelength + wavelength;
    int iy2 = (int(varying_pos.y) / wavelength) * wavelength + wavelength;

    float x1 = ix1 / 1280.0f;
    float y1 = iy1 / 720.0f;
    float x2 = ix2 / 1280.0f;
    float y2 = iy2 / 720.0f;

    float xOffset = (varying_pos.x - ix1) / wavelength;
    float yOffset = (varying_pos.y - iy1) / wavelength;
    xOffset = ease(xOffset);
    yOffset = ease(yOffset);

    float t1 = rand(vec2(x1, y1));
    float t2 = rand(vec2(x2, y1));
    float t3 = rand(vec2(x2, y2));
    float t4 = rand(vec2(x1, y2));

    float tt1 = mix(t1, t2, xOffset);
    float tt2 = mix(t4, t3, xOffset);
    return mix(tt1, tt2, yOffset);
}

void main()
{
    float t = 0;
    int minFreq = 0;
    int noIterations = 8;
    for (int i = 0; i < noIterations; i++)
        t += cnoise(varying_texcoord, int(pow(2, i + minFreq))) / pow(2, noIterations - i);
    gl_FragColor = vec4(vec3(t), 1);
}
The result that I got was this:
Now, I want to animate it with time. My first thought was to change the rand function to take a vec3 instead of vec2, and then change my cnoise function accordingly, to interpolate values in the z direction too. With that goal in mind, I made this:
uniform sampler2D al_tex;
varying vec4 varying_pos;
varying vec2 varying_texcoord;
uniform float time;

float rand(vec3 co) { return fract(sin(dot(co, vec3(12.9898, 78.2332, 58.5065))) * 43758.5453); }
float ease(float p) { return 3*p*p - 2*p*p*p; }

float cnoise(vec3 pos, int wavelength)
{
    ivec3 iPos1 = (ivec3(pos) / wavelength) * wavelength; // the first corner that I'll sample to interpolate
    ivec3 iPos2 = iPos1 + wavelength;                     // the second corner
    // transition percent - a float in [0, 1) indicating how much each corner contributes to the final result
    vec3 transPercent = (pos - iPos1) / wavelength;
    transPercent.x = ease(transPercent.x);
    transPercent.y = ease(transPercent.y);
    transPercent.z = ease(transPercent.z);

    float t1 = rand(vec3(iPos1.x, iPos1.y, iPos1.z));
    float t2 = rand(vec3(iPos2.x, iPos1.y, iPos1.z));
    float t3 = rand(vec3(iPos2.x, iPos2.y, iPos1.z));
    float t4 = rand(vec3(iPos1.x, iPos2.y, iPos1.z));
    float t5 = rand(vec3(iPos1.x, iPos1.y, iPos2.z));
    float t6 = rand(vec3(iPos2.x, iPos1.y, iPos2.z));
    float t7 = rand(vec3(iPos2.x, iPos2.y, iPos2.z));
    float t8 = rand(vec3(iPos1.x, iPos2.y, iPos2.z));

    float tt1 = mix(t1, t2, transPercent.x);
    float tt2 = mix(t4, t3, transPercent.x);
    float tt3 = mix(t5, t6, transPercent.x);
    float tt4 = mix(t8, t7, transPercent.x);
    float tt5 = mix(tt1, tt2, transPercent.y);
    float tt6 = mix(tt3, tt4, transPercent.y);
    return mix(tt5, tt6, transPercent.z);
}

float fbm(vec3 p)
{
    float t = 0;
    int noIterations = 8;
    for (int i = 0; i < noIterations; i++)
        t += cnoise(p, int(pow(2, i))) / pow(2, noIterations - i);
    return t;
}

void main()
{
    vec3 p = vec3(varying_pos.xy, time);
    float t = fbm(p);
    gl_FragColor = vec4(vec3(t), 1);
}
However, after doing this, the animation feels... strange. It's as though I'm watching a slideshow of Perlin noise slides, with the individual slides fading into each other. All the other Perlin noise examples I have tried (like https://github.com/ashima/webgl-noise) are genuinely animated with time - you can see the noise moving, rather than feeling like images are just fading in. I know I could simply use the webgl-noise shader, but I want to make one myself, and for some reason I'm failing miserably. Could anyone tell me where I'm going wrong, or suggest how I can properly animate it with time?
You should probably include z in the sin function:
float rand(vec3 co) { return fract(sin(dot(co.xy ,vec2(12.9898,78.233)) + co.z) * 43758.5453); }
Apparently the somewhat random numbers are prime numbers. This is to avoid patterns in the noise. I found another prime number, 94418953, and included that in the sin/dot function. Try this:
float rand(vec3 co) { return fract(sin(dot(co.xyz ,vec3(12.9898,78.233, 9441.8953))) * 43758.5453); }
EDIT: You don't take into account wavelength on the z axis. This means that all your iterations will have the same interpolation distance. In other words, you will get the fade effect you're describing. Try calculating z the same way you calculate x and y:
int iz1 = (int(p.z) / wavelength) * wavelength;
int iz2 = (int(p.z) / wavelength) * wavelength + wavelength;
float z1 = iz1 / 720.0f;
float z2 = iz2 / 720.0f;
float zOffset = (varying_pos.z - iz1) / wavelength;
However, this means that the z value will vary at the same rate as y. So if you want it to go from 0 to 1, you should probably multiply z by 720 before passing it into the noise function.
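As a concrete illustration of that last point (my own sketch, not from the original answer, reusing the variable names from the question), main() could scale the time value before building the noise coordinate:

// Hypothetical usage: scale the time axis so z advances in the same
// pixel-like units as varying_pos.x / varying_pos.y.
void main()
{
    vec3 p = vec3(varying_pos.xy, time * 720.0); // 720.0 matches the divisor used for y
    float t = fbm(p);
    gl_FragColor = vec4(vec3(t), 1.0);
}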
Check this code. It's a simple version of 3D noise:
// Here are some easy to understand noise gens... the D line is the cubic interpolation (rounding)
function rndng ( n: float ): float
{   // random proportion -1, 1 ... many people use Sin to take
    // linearity out of a pseudo random; n*n is faster on the CPU.
    var e = ( n * 321.9234 ) % 1;
    return ( e * e * 111.07546 ) % 2 - 1;
}

function lerps ( o: float, v: float, alpha: float ): float
{
    o += ( v - o ) * alpha;
    return o;
}

//3d ----------------
function lnz ( vtx: Vector3 ): float // 3d perlin noise code, fast
{
    vtx = Vector3( Mathf.Abs(vtx.x), Mathf.Abs(vtx.y), Mathf.Abs(vtx.z) );
    var I = Vector3( Mathf.Floor(vtx.x), Mathf.Floor(vtx.y), Mathf.Floor(vtx.z) );
    var D = Vector3( vtx.x % 1, vtx.y % 1, vtx.z % 1 );
    D = Vector3( D.x*D.x*(3.0-2.0*D.x), D.y*D.y*(3.0-2.0*D.y), D.z*D.z*(3.0-2.0*D.z) );
    var W = I.x + I.y*71.0 + 125.0*I.z;
    return lerps(
        lerps( lerps(rndng(W+0.0),   rndng(W+1.0),   D.x), lerps(rndng(W+71.0),  rndng(W+72.0),  D.x), D.y ),
        lerps( lerps(rndng(W+125.0), rndng(W+126.0), D.x), lerps(rndng(W+153.0), rndng(W+154.0), D.x), D.y ),
        D.z
    );
}

//1d ----------------
function lnzo ( vtx: Vector3 ): float // perlin noise, same as the Unity function version
{
    var total = 0.0;
    for (var i: int = 1; i < 5; i++)
    {
        total += lnz2( Vector3(vtx.x*(i*i), 0.0, vtx.z*(i*i)) ) / (i*i);
    }
    return total * 5;
}

//2d 3 axis honeycomb noise ----------------
function lnzh ( vtx: Vector3 ): float // perlin noise, 2d, with 3 axes at 60 degrees instead of 2 x/y axes
{
    vtx = Vector3( Mathf.Abs(vtx.z), Mathf.Abs(vtx.z*.5 - vtx.x*.866), Mathf.Abs(vtx.z*.5 + vtx.x*.866) );
    var I = Vector3( Mathf.Floor(vtx.x), Mathf.Floor(vtx.y), Mathf.Floor(vtx.z) );
    var D = Vector3( vtx.x % 1, vtx.y % 1, vtx.z % 1 );
    //D = Vector3( D.x*D.x*(3.0-2.0*D.x), D.y*D.y*(3.0-2.0*D.y), D.z*D.z*(3.0-2.0*D.z) );
    var W = I.x + I.y*71.0 + 125.0*I.z;
    return lerps(
        lerps( lerps(rndng(W+0.0),   rndng(W+1.0),   D.x), lerps(rndng(W+71.0),  rndng(W+72.0),  D.x), D.y ),
        lerps( lerps(rndng(W+125.0), rndng(W+126.0), D.x), lerps(rndng(W+153.0), rndng(W+154.0), D.x), D.y ),
        D.z
    );
}

//2d ----------------
function lnz2 ( vtx: Vector3 ): float // i think this is 2d perlin noise
{
    vtx = Vector3( Mathf.Abs(vtx.x), Mathf.Abs(vtx.y), Mathf.Abs(vtx.z) );
    var I = Vector3( Mathf.Floor(vtx.x), Mathf.Floor(vtx.y), Mathf.Floor(vtx.z) );
    var D = Vector3( vtx.x % 1, vtx.y % 1, vtx.z % 1 );
    D = Vector3( D.x*D.x*(3.0-2.0*D.x), D.y*D.y*(3.0-2.0*D.y), D.z*D.z*(3.0-2.0*D.z) );
    var W = I.x + I.y*71.0 + 125.0*I.z;
    return lerps(
        lerps( lerps(rndng(W+0.0), rndng(W+1.0), D.x), lerps(rndng(W+71.0), rndng(W+72.0), D.x), D.z ),
        lerps( rndng(W+125.0), rndng(W+126.0), D.x ),
        D.z
    );
}

Second iteration crash - order irrelevant

To save on global memory transfers, and because all of the steps of the code work individually, I have tried to combine all of the kernels into a single kernel, with the first 2 (of 3) steps being done as device calls rather than global (kernel) calls.
This is failing in the second half of the first step.
There is a function that I need to call twice, to calculate the 2 halves of an image. Regardless of the order the image is calculated in, it crashes on the second iteration.
After examining the code as well as I could, and running it multiple times with different return points, I have found what makes it crash.
__device__
void IntersectCone( float* ModDistance,
                    float* ModIntensity,
                    float3 ray,
                    int threadID,
                    modParam param )
{
    bool ignore = false;
    float3 normal = make_float3(0.0f, 0.0f, 0.0f);
    float3 result = make_float3(0.0f, 0.0f, 0.0f);
    float normDist = 0.0f;
    float intensity = 0.0f;

    float check = abs( Dot(param.position, Cross(param.direction, ray)) );
    if (check > param.r1 && check > param.r2)
        ignore = true;

    float tran = param.length / (param.r2 / param.r1 - 1);
    float length = tran + param.length;
    float Lsq = length * length;
    float cosSqr = Lsq / (Lsq + param.r2 * param.r2);

    // Changes the centre position?
    float3 position = param.position - tran * param.direction;

    float aDd = Dot(param.direction, ray);
    float3 e = position * -1.0f;
    float aDe = Dot(param.direction, e);
    float dDe = Dot(ray, e);
    float eDe = Dot(e, e);

    float c2 = aDd * aDd - cosSqr;
    float c1 = aDd * aDe - cosSqr * dDe;
    float c0 = aDe * aDe - cosSqr * eDe;
    float discr = c1 * c1 - c0 * c2;

    if (discr <= 0.0f)
        ignore = true;

    if (!ignore)
    {
        float root = sqrt(discr);
        float sign;
        if (c1 > 0.0f)
            sign = 1.0f;
        else
            sign = -1.0f;

        // Try opposite sign....?
        float3 result = (-c1 + sign * root) * ray / c2;

        e = result - position;
        float dot = Dot(e, param.direction);
        float3 s1 = Cross(e, param.direction);
        float3 normal = Cross(e, s1);

        if ( (dot > tran) || (dot < length) )
        {
            if (Dot(normal, ray) <= 0)
            {
                normal = Norm(normal);                // This stuff (1)
                normDist = Magnitude(result);
                intensity = -IntensAt1m * Dot(ray, normal) / (normDist * normDist);
            }
        }
    }

    ModDistance[threadID] = normDist;                 // ...and this stuff (2)
    ModIntensity[threadID] = intensity;
}
There are two things I can do to make this not crash, both of which negate the point of the function: not writing to ModDistance[] and ModIntensity[], or not writing to normDist and intensity.
First-chance exceptions are thrown by the code above, but not if either of the marked blocks is commented out.
Also, the program only crashes the second time this routine is called.
I have been trying to figure this out all day; any help would be fantastic.
The code that calls it is:
int subrow = threadIdx.y + Mod_Height/2;
int threadID = subrow * (Mod_Width+1) + threadIdx.x;
int obsY = windowY + subrow;
float3 ray = CalculateRay(obsX,obsY);
if( !IntersectSphere(ModDistance, ModIntensity, ray, threadID, param) )
{
IntersectCone(ModDistance, ModIntensity, ray, threadID, param);
}
subrow = threadIdx.y;
threadID = subrow * (Mod_Width+1) + threadIdx.x;
obsY = windowY + subrow;
ray = CalculateRay(obsX,obsY);
if( !IntersectSphere(ModDistance, ModIntensity, ray, threadID, param) )
{
IntersectCone(ModDistance, ModIntensity, ray, threadID, param);
}
The kernel is running out of resources. As posted in the comments, it was giving the error cudaErrorLaunchOutOfResources.
To avoid this, you should use a __launch_bounds__ specifier to specify the block dimensions you want for your kernel. This will force the compiler to ensure there are enough resources. See the CUDA programming guide for details on __launch_bounds__.
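For illustration only (this is a sketch, not from the original answer; the kernel name and block size are assumed examples), the qualifier goes on the kernel definition like this:

// Promise the compiler that this kernel will never be launched with more
// than 256 threads per block (and hint at least 2 resident blocks per SM),
// so it can cap register usage and avoid cudaErrorLaunchOutOfResources.
__global__ void __launch_bounds__(256, 2)
myCombinedKernel(float* ModDistance, float* ModIntensity /*, ... */)
{
    // ... kernel body: the IntersectSphere / IntersectCone calls go here ...
}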