Converting 2D Noise to 3D - opengl

I've recently started experimenting with noise (simple perlin noise), and have run into a slight problem with animating it. So far come I've across an awesome looking 3d noise (https://github.com/ashima/webgl-noise) that I could use in my project but that I understood nothing of, and a bunch of tutorials that explain how to create simple 2d noise.
For the 2d noise, I originally used the following fragment shader:
uniform sampler2D al_tex;
varying vec4 varying_pos; //Actual coords
varying vec2 varying_texcoord; //Normalized coords
uniform float time;
float rand(vec2 co) { return fract(sin(dot(co, vec2(12.9898, 78.233))) * 43758.5453); }
float ease(float p) { return 3*p*p - 2*p*p*p; }
float cnoise(vec2 p, int wavelength)
{
int ix1 = (int(varying_pos.x) / wavelength) * wavelength;
int iy1 = (int(varying_pos.y) / wavelength) * wavelength;
int ix2 = (int(varying_pos.x) / wavelength) * wavelength + wavelength;
int iy2 = (int(varying_pos.y) / wavelength) * wavelength + wavelength;
float x1 = ix1 / 1280.0f;
float y1 = iy1 / 720.0f;
float x2 = ix2 / 1280.0f;
float y2 = iy2 / 720.0f;
float xOffset = (varying_pos.x - ix1) / wavelength;
float yOffset = (varying_pos.y - iy1) / wavelength;
xOffset = ease(xOffset);
yOffset = ease(yOffset);
float t1 = rand(vec2(x1, y1));
float t2 = rand(vec2(x2, y1));
float t3 = rand(vec2(x2, y2));
float t4 = rand(vec2(x1, y2));
float tt1 = mix(t1, t2, xOffset);
float tt2 = mix(t4, t3, xOffset);
return mix(tt1, tt2, yOffset);
}
void main()
{
float t = 0;
int minFreq = 0;
int noIterations = 8;
for (int i = 0; i < noIterations; i++)
t += cnoise(varying_texcoord, int(pow(2, i + minFreq))) / pow(2, noIterations - i);
gl_FragColor = vec4(vec3(t), 1);
}
The result that I got was this:
Now, I want to animate it with time. My first thought was to change the rand function to take a vec3 instead of vec2, and then change my cnoise function accordingly, to interpolate values in the z direction too. With that goal in mind, I made this:
sampler2D al_tex;
varying vec4 varying_pos;
varying vec2 varying_texcoord;
uniform float time;
float rand(vec3 co) { return fract(sin(dot(co, vec3(12.9898, 78.2332, 58.5065))) * 43758.5453); }
float ease(float p) { return 3*p*p - 2*p*p*p; }
float cnoise(vec3 pos, int wavelength)
{
ivec3 iPos1 = (ivec3(pos) / wavelength) * wavelength; //The first value that I'll sample to interpolate
ivec3 iPos2 = iPos1 + wavelength; //The second value
vec3 transPercent = (pos - iPos1) / wavelength; //Transition percent - A float in [0-1) indicating how much of each of the above values will contribute to final result
transPercent.x = ease(transPercent.x);
transPercent.y = ease(transPercent.y);
transPercent.z = ease(transPercent.z);
float t1 = rand(vec3(iPos1.x, iPos1.y, iPos1.z));
float t2 = rand(vec3(iPos2.x, iPos1.y, iPos1.z));
float t3 = rand(vec3(iPos2.x, iPos2.y, iPos1.z));
float t4 = rand(vec3(iPos1.x, iPos2.y, iPos1.z));
float t5 = rand(vec3(iPos1.x, iPos1.y, iPos2.z));
float t6 = rand(vec3(iPos2.x, iPos1.y, iPos2.z));
float t7 = rand(vec3(iPos2.x, iPos2.y, iPos2.z));
float t8 = rand(vec3(iPos1.x, iPos2.y, iPos2.z));
float tt1 = mix(t1, t2, transPercent.x);
float tt2 = mix(t4, t3, transPercent.x);
float tt3 = mix(t5, t6, transPercent.x);
float tt4 = mix(t8, t7, transPercent.x);
float tt5 = mix(tt1, tt2, transPercent.y);
float tt6 = mix(tt3, tt4, transPercent.y);
return mix(tt5, tt6, transPercent.z);
}
float fbm(vec3 p)
{
float t = 0;
int noIterations = 8;
for (int i = 0; i < noIterations; i++)
t += cnoise(p, int(pow(2, i))) / pow(2, noIterations - i);
return t;
}
void main()
{
vec3 p = vec3(varying_pos.xy, time);
float t = fbm(p);
gl_FragColor = vec4(vec3(t), 1);
}
However, on doing this, the animation feels... strange. It's as though I'm watching a slideshow of perlin noise slides, with the individual slides fading in. All other perlin noise examples that I have tried (like https://github.com/ashima/webgl-noise) are actually animated with time - you can actually see it being animated, and don't just feel like the images are fading in, and not being actually animated. I know that I could just use the webgl-noise shader, but I want to make one for myself, and for some reason, I'm failing miserably. Could anyone tell me where I am going wrong, or suggest me on how I can actually animate it properly with time?

You should proably include z in the sin function:
float rand(vec3 co) { return fract(sin(dot(co.xy ,vec2(12.9898,78.233)) + co.z) * 43758.5453); }
Apparently the somewhat random numbers are prime numbers. This is to avoid patterns in the noise. I found another prime number, 94418953, and included that in the sin/dot function. Try this:
float rand(vec3 co) { return fract(sin(dot(co.xyz ,vec3(12.9898,78.233, 9441.8953))) * 43758.5453); }
EDIT: You don't take into account wavelength on the z axis. This means that all your iterations will have the same interpolation distance. In other words, you will get the fade effect you're describing. Try calculating z the same way you calculate x and y:
int iz1 = (int(p.z) / wavelength) * wavelength;
int iz2 = (int(p.z) / wavelength) * wavelength + wavelength;
float z1 = iz1 / 720.0f;
float z2 = iz2 / 720.0f;
float zOffset = (varying_pos.z - iz1) / wavelength;
This means however that the z value will variate the same rate that y will. So if you want it to scale from 0 to 1 then you should proably multiply z with 720 before passing it into the noise function.

check this code. it's a simple version of 3d noise:
// Here are some easy to understand noise gens... the D line in cubic interpolation (rounding)
function rndng ( n: float ): float
{//random proportion -1, 1 ... many people use Sin to take
//linearity out of a pseudo random, exp n*n is faster on central processor.
var e = ( n *321.9234)%1;
return (e*e*111.07546)%2-1;
}
function lerps(o:float, v:float, alpha:float):float
{
o += ( v - o ) * alpha;
return o;
}
//3d ----------------
function lnz ( vtx: Vector3 ): float //3d perlin noise code fast
{
vtx= Vector3 ( Mathf.Abs(vtx.x) , Mathf.Abs(vtx.y) , Mathf.Abs(vtx.z) ) ;
var I = Vector3 (Mathf.Floor(vtx.x),Mathf.Floor(vtx.y),Mathf.Floor(vtx.z));
var D = Vector3(vtx.x%1,vtx.y%1,vtx.z%1);
D = Vector3(D.x*D.x*(3.0-2.0*D.x),D.y*D.y*(3.0-2.0*D.y),D.z*D.z*(3.0-2.0*D.z));
var W = I.x + I.y*71.0 + 125.0*I.z;
return lerps(
lerps( lerps(rndng(W+0.0),rndng(W+1.0),D.x) , lerps(rndng(W+71.0),rndng(W+72.0),D.x) , D.y)
,
lerps( lerps(rndng(W+125.0),rndng(W+126.0),D.x) , lerps(rndng(W+153.0),rndng(W+154.0),D.x) , D.y)
,
D.z
);
}
//1d ----------------
function lnzo ( vtx: Vector3 ): float //perlin noise, same as unityfunction version
{
var total = 0.0;
for (var i:int = 1; i < 5; i ++)
{
total+= lnz2(Vector3 (vtx.x*(i*i),0.0,vtx.z*(i*i)))/(i*i);
}
return total*5;
}
//2d 3 axis honeycombe noise ----------------
function lnzh ( vtx: Vector3 ): float // perlin noise, 2d, with 3 axes at 60'instead of 2 x y axes
{
vtx= Vector3 ( Mathf.Abs(vtx.z) , Mathf.Abs(vtx.z*.5-vtx.x*.866) , Mathf.Abs(vtx.z*.5+vtx.x*.866) ) ;
var I = Vector3 (Mathf.Floor(vtx.x),Mathf.Floor(vtx.y),Mathf.Floor(vtx.z));
var D = Vector3(vtx.x%1,vtx.y%1,vtx.z%1);
//D = Vector3(D.x*D.x*(3.0-2.0*D.x),D.y*D.y*(3.0-2.0*D.y),D.z*D.z*(3.0-2.0*D.z));
var W = I.x + I.y*71.0 + 125.0*I.z;
return lerps(
lerps( lerps(rndng(W+0.0),rndng(W+1.0),D.x) , lerps(rndng(W+71.0),rndng(W+72.0),D.x) , D.y)
,
lerps( lerps(rndng(W+125.0),rndng(W+126.0),D.x) , lerps(rndng(W+153.0),rndng(W+154.0),D.x) , D.y)
,
D.z
);
}
//2d ----------------
function lnz2 ( vtx: Vector3 ): float // i think this is 2d perlin noise
{
vtx= Vector3 ( Mathf.Abs(vtx.x) , Mathf.Abs(vtx.y) , Mathf.Abs(vtx.z) ) ;
var I = Vector3 (Mathf.Floor(vtx.x),Mathf.Floor(vtx.y),Mathf.Floor(vtx.z));
var D = Vector3(vtx.x%1,vtx.y%1,vtx.z%1);
D = Vector3(D.x*D.x*(3.0-2.0*D.x),D.y*D.y*(3.0-2.0*D.y),D.z*D.z*(3.0-2.0*D.z));
var W = I.x + I.y*71.0 + 125.0*I.z;
return lerps(
lerps( lerps(rndng(W+0.0),rndng(W+1.0),D.x) , lerps(rndng(W+71.0),rndng(W+72.0),D.x) , D.z)
,
lerps( rndng(W+125.0), rndng(W+126.0),D.x)
,
D.z
);
}

Related

Finding point inside disk defined by a radius

I have come across a shader that I am trying to understand where a point is found within a disk defined by a radius.
How does this function DiskPoint work?
float HASHVALUE = 0.0f;
vec2 RandomHashValue()
{
return fract(sin(vec2(HASHVALUE += 0.1, HASHVALUE += 0.1)) * vec2(43758.5453123, 22578.1459123));
}
vec2 DiskPoint(in float radius, in float x1, in float x2)
{
float P = radius * sqrt(1.0f - x1);
float theta = x2 * 2.0 * PI;
return vec2(P * cos(theta), P * sin(theta));
}
void main()
{
HASHVALUE = (uvCoords.x * uvCoords.y) * 64.0;
HASHVALUE += fract(time) * 64.0f;
for (int i = 0; i < samples; ++i)
{
vec2 hash = RandomHashValue();
vec2 disk = DiskPoint(sampleRadius, hash.x, hash.y);
//....
}
}
I see it like this:
vec2 DiskPoint(in float radius, in float x1, in float x2)
{
// x1,x2 are "uniform randoms" in range <0,+1>
float P = radius * sqrt(1.0f - x1); // this will scale x1 into P with range <0,radius> but change the distribution to uniform number of points inside disc
float theta = x2 * 2.0 * PI; // this will scale x2 to theta in range <0,6.28>
return vec2(P * cos(theta), P * sin(theta)); // and this just use the above as polar coordinates with parametric circle equation ...
// returning "random" point inside disc with uniform density ...
}
but I just guessed and not tested it so take that in mind
Based on #Spektre´s suggestion I drew the outcome of this function on a Canvas.
the code looks like this:
var canvas = document.getElementById("canvas");
var ctx = canvas.getContext("2d");
var canvasWidth = canvas.width;
var canvasHeight = canvas.height;
class vec2{
constructor(x, y){
this.x = x;
this.y = y;
}
}
function DiskPoint(radius, x1, x2){
var p = radius * Math.sqrt(1.0 - x1);
var theta = x2 * 2.0 * 3.14;
return new vec2(p * Math.cos(theta), p * Math.sin(theta));
}
ctx.fillStyle = "#FF0000";
ctx.fillRect(100, 100, 1, 1);
for(var i = 0; i < 100; i++){
for(var j = 0; j < 100; j++){
var point = DiskPoint(0.5, i/100, j/100);
ctx.fillRect(point.x * 100 + 100, point.y * 100 + 100, 1, 1);
console.log(point.x, point.y);
}
}
This is the created Pattern:
It looks like it finds for every (x1,x2) a point in the circle of given radius.
Hope this will help!

Interpreting visual studio profiler, is this subtraction slow? Can I make all this any faster?

I'm using the Visual Studio profiler for the first time and I'm trying to interpret the results. Looking at the percentages on the left, I found this subtraction's time cost a bit strange:
Other parts of the code contain more complex expressions, like:
Even a simple multiplication seems way faster than the subtraction :
Other multiplications take way longer and I really don't get why, like this :
So, I guess my question is if there is anything weird going on here.
Complex expressions take longer than that subtraction and some expressions take way longer than similar other ones. I run the profiler several times and the distribution of the percentages is always like this. Am I just interpreting this wrong?
Update:
I was asked to give the profile for the whole function so here it is, even though it's a bit big. I ran the function inside a for loop for 1 minute and got 50k samples. The function contains a double loop. I include the text first for ease, followed by the pictures of profiling. Note that the code in text is a bit updated.
for (int i = 0; i < NUMBER_OF_CONTOUR_POINTS; i++) {
vec4 contourPointV(contour3DPoints[i], 1);
float phi = angles[i];
float xW = pose[0][0] * contourPointV.x + pose[1][0] * contourPointV.y + contourPointV.z * pose[2][0] + pose[3][0];
float yW = pose[0][1] * contourPointV.x + pose[1][1] * contourPointV.y + contourPointV.z * pose[2][1] + pose[3][1];
float zW = pose[0][2] * contourPointV.x + pose[1][2] * contourPointV.y + contourPointV.z * pose[2][2] + pose[3][2];
float x = -G_FU_STRICT * xW / zW;
float y = -G_FV_STRICT * yW / zW;
x = (x + 1) * G_WIDTHo2;
y = (y + 1) * G_HEIGHTo2;
y = G_HEIGHT - y;
phi -= extraTheta;
if (phi < 0)phi += CV_PI2;
int indexForTable = phi * oneKoverPI;
//vec2 ray(cos(phi), sin(phi));
vec2 ray(cos_pre[indexForTable], sin_pre[indexForTable]);
vec2 ray2(-ray.x, -ray.y);
float outerStepX = ray.x * step;
float outerStepY = ray.y * step;
cv::Point2f outerPoint(x + outerStepX, y + outerStepY);
cv::Point2f innerPoint(x - outerStepX, y - outerStepY);
cv::Point2f contourPointCV(x, y);
cv::Point2f contourPointCVcopy(x, y);
bool cut = false;
if (!isInView(outerPoint.x, outerPoint.y) || !isInView(innerPoint.x, innerPoint.y)) {
cut = true;
}
bool outside2 = true; bool outside1 = true;
if (cut) {
outside2 = myClipLine(contourPointCV.x, contourPointCV.y, outerPoint.x, outerPoint.y, G_WIDTH - 1, G_HEIGHT - 1);
outside1 = myClipLine(contourPointCVcopy.x, contourPointCVcopy.y, innerPoint.x, innerPoint.y, G_WIDTH - 1, G_HEIGHT - 1);
}
myIterator innerRayMine(contourPointCVcopy, innerPoint);
myIterator outerRayMine(contourPointCV, outerPoint);
if (!outside1) {
innerRayMine.end = true;
innerRayMine.prob = true;
}
if (!outside2) {
outerRayMine.end = true;
innerRayMine.prob = true;
}
vec2 normal = -ray;
float dfdxTerm = -normal.x;
float dfdyTerm = normal.y;
vec3 point3D = vec3(xW, yW, zW);
cv::Point contourPoint((int)x, (int)y);
float Xc = point3D.x; float Xc2 = Xc * Xc; float Yc = point3D.y; float Yc2 = Yc * Yc; float Zc = point3D.z; float Zc2 = Zc * Zc;
float XcYc = Xc * Yc; float dfdxFu = dfdxTerm * G_FU; float dfdyFv = dfdyTerm * G_FU; float overZc2 = 1 / Zc2; float overZc = 1 / Zc;
pixelJacobi[0] = (dfdyFv * (Yc2 + Zc2) + dfdxFu * XcYc) * overZc2;
pixelJacobi[1] = (-dfdxFu * (Xc2 + Zc2) - dfdyFv * XcYc) * overZc2;
pixelJacobi[2] = (-dfdyFv * Xc + dfdxFu * Yc) * overZc;
pixelJacobi[3] = -dfdxFu * overZc;
pixelJacobi[4] = -dfdyFv * overZc;
pixelJacobi[5] = (dfdyFv * Yc + dfdxFu * Xc) * overZc2;
float commonFirstTermsSum = 0;
float commonFirstTermsSquaredSum = 0;
int test = 0;
while (!innerRayMine.end) {
test++;
cv::Point xy = innerRayMine.pos(); innerRayMine++;
int x = xy.x;
int y = xy.y;
float dx = x - contourPoint.x;
float dy = y - contourPoint.y;
vec2 dxdy(dx, dy);
float raw = -glm::dot(dxdy, normal);
float heavisideTerm = heaviside_pre[(int)raw * 100 + 1000];
float deltaTerm = delta_pre[(int)raw * 100 + 1000];
const Vec3b rgb = ante[y * 640 + x];
int red = rgb[0]; int green = rgb[1]; int blue = rgb[2];
red = red >> 3; red = red << 10; green = green >> 3; green = green << 5; blue = blue >> 3;
int colorIndex = red + green + blue;
pF = pFPointer[colorIndex];
pB = pBPointer[colorIndex];
float denAsMul = 1 / (pF + pB + 0.000001);
pF = pF * denAsMul;
float pfMinusPb = 2 * pF - 1;
float denominator = heavisideTerm * (pfMinusPb)+pB + 0.000001;
float commonFirstTerm = -pfMinusPb / denominator * deltaTerm;
commonFirstTermsSum += commonFirstTerm;
commonFirstTermsSquaredSum += commonFirstTerm * commonFirstTerm;
}
}
Visual Studio profiles by sampling: it interrupts execution often and records the value of the instruction pointer; it then maps it to the source and calculates the frequency of hitting that line.
There are few issues with that: it's not always possible to figure out which line produced a specific assembly instruction in the optimized code.
One trick I use is to move the code of interest into a separate function and declare it with __declspec(noinline) .
In your example, are you sure the subtraction was performed as many times as multiplication? I would be more puzzled by the difference in subsequent multiplication (0.39% and 0.53%)
Update:
I believe that the following lines:
float phi = angles[i];
and
phi -= extraTheta;
got moved together in assembly and the time spent getting angles[i] was added to that subtraction line.

openCl path tracer creates strange noise patterns

I've made a path tracer using openCl and c++, following the basic structure in this tutorial: http://raytracey.blogspot.com/2016/11/opencl-path-tracing-tutorial-2-path.html. As far as I can tell, nothing is wrong with the path tracing algorithm itself, but I get strange stripe patterns in the image that don't match the regular noise of path tracing. striped image
There are distinct vertical stripes and more narrow horizontal ones that make the image look granular regardless of how many samples I take per pixel. Again, pixel by pixel, the path tracer seems to be working (the outlines of objects are correct even where they appear mid-stripe) as seen here: close-up.
The only difference between my code and the one in the tutorial I link is that Sam Lapere appears to be using the c++ wrapper for openCl, and I've added a couple of features like movement. There also are a few differences in how I'm handling light bounces.
I'm new to openCl. What could be causing this? It seems like it doesn't have to do with my ray tracer itself, but somehow in the way I'm implementing openCl. I'm also using an SDL texture and renderer to show the image to the screen
here is the tracer code if it helps:
kernel:
__kernel void render_kernel
(__constant struct Sphere* spheres, const int width, const int height,
const int sphere_count, __global int * output, __global float3*
pixel_buckets, __global int* counter, __constant struct Ray* camera,
__global bool* reset){
int gid = get_global_id(0);
//for movement
if (*reset){
pixel_buckets[gid] = (float3)(0,0,0);
counter[gid] = 0;
}
int xcoord = gid % width;
int ycoord = gid / width;
struct Ray camray = createCamRay(xcoord, ycoord, width, height, counter[gid], camera);
float3 final_color = trace(spheres, &camray, sphere_count, xcoord, ycoord);
counter[gid] ++;
//average colors
pixel_buckets[gid] += final_color;
output[gid] = colorInt(clampColor(pixel_buckets[gid] / counter[gid]));
}
trace:
float3 trace(__constant struct Sphere* spheres, struct Ray* camray, const int sphere_count,
unsigned int seed0, unsigned int seed1){
struct Ray ray = *camray;
struct Sphere sphere1;
sphere1.center = (float3)(0, 0, 3);
sphere1.radius = 0.7;
sphere1.color = (float3)(1,1,0);
const int bounce_count = 8;
float3 colors[20];
float3 emiss[20];
for (int bounce = 0; bounce < bounce_count; bounce ++){
int sphere_id = 0;
float hit_distance = intersectScene(spheres, &ray, &sphere_id, sphere_count);
struct Sphere hit_sphere = spheres[sphere_id];
float3 hit_point = ray.origin + (ray.direction * hit_distance);
float3 normal = normalize(hit_point - hit_sphere.center);
if (dot(normal, -ray.direction) < 0){
normal = -normal;
}
//random bounce angles
float rand_theta = get_random(seed0, seed1);
float theta = acos(sqrt(rand_theta));
float rand_phi = get_random(seed0, seed1);
float phi = 2 * PI * rand_phi;
//scales the tnb vectors
float x = sin(theta) * sin(phi);
float y = sin(theta) * cos(phi);
float n = cos(theta);
float3 hemx = normalize(cross(ray.direction, normal)) * x;
float3 hemy = normalize(cross(hemx, normal)) * y;
normal = normal * n;
float3 new_ray = normalize(hemx + hemy + normal);
ray.origin = hit_point + (normal * EPSILON);
ray.direction = new_ray;
colors[bounce] = hit_sphere.color;
emiss[bounce] = hit_sphere.emmissive;
}
colors[bounce_count] = (float3)(0,0,0);
emiss[bounce_count] = (float3)(0,0,0);
for (int i = bounce_count - 1; i >= 0; i--){
colors[i] = (colors[i] * emiss[i]) + (colors[i] * colors[i + 1]);
}
return colors[0];
}
random number generator:
float get_random(unsigned int *seed0, unsigned int *seed1) {
/* hash the seeds using bitwise AND operations and bitshifts */
*seed0 = 36969 * ((*seed0) & 65535) + ((*seed0) >> 16);
*seed1 = 18000 * ((*seed1) & 65535) + ((*seed1) >> 16);
unsigned int ires = ((*seed0) << 16) + (*seed1);
/* use union struct to convert int to float */
union {
float f;
unsigned int ui;
} res;
res.ui = (ires & 0x007fffff) | 0x40000000; /* bitwise AND, bitwise OR */
return (res.f - 2.0f) / 2.0f;
}
thanks

Sphere Collision Algorithm-Overal Velocities Keep increacing (C++, OpenGL)

Im having trouble with my "Box Full of Sphere Particles" project. ive created a spheres[number_of_spheres][position,radius,velocity..] array. A bounding box (sides parallel to orthonormal axis).
Spheres that collide with the box have their velocity reversed on the dimention of the collision. that works ok.
For the sphere-to-sphere collision its pretty simple, i detect the collision by comparing distance to sum of radii. Now im trying to make them simply exchange velocities on the axis of collision. But the simulation runs fine for a while then the speeds keep increasing till its out of control.
bool CollisionDetect(int i, int j)
{
if(i==j)
{
return false;
}
float xx = (spheres[i][0]-spheres[j][0])*(spheres[i][0]-spheres[j][0]);
float yy = (spheres[i][1]-spheres[j][1])*(spheres[i][1]-spheres[j][1]);
float zz = (spheres[i][2]-spheres[j][2])*(spheres[i][2]-spheres[j][2]);
if( (xx + yy + zz) <= (spheres[i][3] + spheres[j][3])*(spheres[i][3] + spheres[j][3]) )
{
return true;
}
else
{
return false;
}
}
void CollisionSolve(int i, int j)
{
float m1,m2,m21;
Vec3 vel1,vel2,pos1,pos2; //spheres info
Vec3 nv1n,nv1b,nv1t;
Vec3 nv2n,nv2b,nv2t;
Vec3 N,T,B,X,Y,Z; //collision plane and world orthonormal basis
X.x=1;
X.y=0;
X.z=0;
Y.x=0;
Y.y=1;
Y.z=0;
Z.x=0;
Z.y=0;
Z.z=1;
pos1.x = spheres[i][0];
pos2.x = spheres[j][0];
pos1.y = spheres[i][1];
pos2.y = spheres[j][1];
pos1.z = spheres[i][2];
pos2.z = spheres[j][2];
vel1.x = spheres[i][4];
vel2.x = spheres[j][4];
vel1.y = spheres[i][5];
vel2.y = spheres[j][5];
vel1.z = spheres[i][6];
vel2.z = spheres[j][6];
m1 = spheres[i][8];
m2 = spheres[j][8]; //mass (for later)
N = minus(pos2,pos1); //get N vector (connecting centers of spheres)
N = Normalize(N);
T=X;
B = crossProduct(N,T); //find first perpendicular axis to N
if (B.x==0 && B.y==0 && B.z==0) //then vector parallel to X axis -
{
T=Y; //try Y axis
B = crossProduct(N,T);
}
T = crossProduct(N,B); //find second perpendicular axis to N
T = Normalize(T);
B = Normalize(B);
if (simplespherecollision)
{
nv1n = projectUonV (vel1 , N);
nv2n = projectUonV (vel2 , N);
vel1 = minus (vel1,nv1n);
vel2 = minus (vel2,nv2n);
vel1 = plus (vel1,nv2n);
vel2 = plus (vel2,nv1n);
//simply switch speed (for test)
}
/*/---THIS IS COMMENTED OUT (FIRST METHOD USED - DIDNT WORK)--------------------------------------------
nv1n = projectUonV (vel1 , N);
nv2n = projectUonV (vel2 , N);
nv1t = projectUonV (vel1 , T);
nv2t = projectUonV (vel2 , T);
nv1b = projectUonV (vel1 , B);
nv2b = projectUonV (vel2 , B); //project velocities on new orthonormal basis
vel1 = plus(nv1t , plus(nv1b , nv2n)); //project velocities back to world basis
vel2 = plus(nv2t , plus( nv2b , nv1n)); //by adding the sub vectors with swiched Xn
/*/----------------------------------------------------------------------
spheres[i][4] = vel1.x;
spheres[i][5] = vel1.y;
spheres[i][6] = vel1.z;
spheres[j][4] = vel2.x;
spheres[j][5] = vel2.y;
spheres[j][6] = vel2.z; //reasign velocities to spheres
}
}
And this is the Vec3 struct (just in case)
struct Vec3
{
float x, y, z;
};
Vec3 crossProduct(const Vec3& v1, const Vec3& v2)
{
Vec3 r;
r.x = (v1.y*v2.z) - (v1.z*v2.y);
r.y = (v1.z*v2.x) - (v1.x*v2.z);
r.z = (v1.x*v2.y) - (v1.y*v2.x);
return r;
}
Vec3 minus(const Vec3& v1, const Vec3& v2)
{
Vec3 r;
r.x = v1.x - v2.x;
r.y = v1.y - v2.y;
r.z = v1.z - v2.z;
return r;
}
Vec3 plus(const Vec3& v1, const Vec3& v2)
{
Vec3 r;
r.x = v1.x + v2.x;
r.y = v1.y + v2.y;
r.z = v1.z + v2.z;
return r;
}
double dotProduct(const Vec3& v1, const Vec3& v2)
{
return v1.x * v2.x + v1.y * v2.y + v1.z * v2.z;
}
Vec3 scale(const Vec3& v, double a)
{
Vec3 r;
r.x = v.x * a;
r.y = v.y * a;
r.z = v.z * a;
return r;
}
Vec3 projectUonV(const Vec3& u, const Vec3& v)
{
Vec3 r;
r = scale(v,dotProduct(u,v));
return r;
}
int distanceSquared(const Vec3& v1, const Vec3& v2)
{
Vec3 delta = minus(v2, v1);
return dotProduct(delta, delta);
}
Vec3 Normalize (const Vec3& v)
{
Vec3 r;
r=v;
int mag = sqrt(dotProduct(v,v));
r.x = v.x/mag;
r.y = v.y/mag;
r.z = v.z/mag;
return r;
}
And this is my Render function witch draws the spheres and the bounding box and calls the other functions
void Render()
{
//CLEARS FRAME BUFFER ie COLOR BUFFER& DEPTH BUFFER (1.0)
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // Clean up the colour of the window
// and the depth buffer
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(-MAX_X/2,-MAX_Y/2,-MAX_Z*4);
glTranslatef (dx,0,dz);
//------------------------------------------------------------------------------------------------
//-----------------------------BOUNDING BOX----------------------------------------------------
glPushMatrix();
glTranslatef(MAX_X/2,MAX_Y/2,MAX_Z/2);
glColor4f(0.9,0.6,0.8,0.3);
glScalef(MAX_X,MAX_Y,MAX_Z);
glutSolidCube(1);
glutWireCube(1);
glPopMatrix();
//-----------------------------WALL COLLISION DETECTION-----ALWAYS ON--------------------------------
for (int i = 0; i<PART_NUM; i++)
{
if(spheres[i][0] > (MAX_X - spheres[i][3]) && spheres[i][4] > 0)
{
spheres[i][4] = spheres[i][4]*-1;
wallcollcount++;
}
if(spheres[i][1] > (MAX_Y - spheres[i][3]) && spheres[i][5] > 0)
{
spheres[i][5] = spheres[i][5]*-1;
wallcollcount++;
}
if(spheres[i][2] > (MAX_Z - spheres[i][3]) && spheres[i][6] > 0)
{
spheres[i][6] = spheres[i][6]*-1;
wallcollcount++;
}
if(spheres[i][0] < spheres[i][3] && spheres[i][4] < 0)
{
spheres[i][4] = spheres[i][4]*-1;
wallcollcount++;
}
if(spheres[i][1] < spheres[i][3] && spheres[i][5] < 0)
{
spheres[i][5] = spheres[i][5]*-1;
wallcollcount++;
}
if(spheres[i][2] < spheres[i][3] && spheres[i][6] < 0)
{
spheres[i][6] = spheres[i][6]*-1;
wallcollcount++;
}
//--------------------------------------Sphere ColDit -------------------------
if (spherecollision || simplespherecollision)
{
for(int j=i+1; j<PART_NUM; j++)
{
if(CollisionDetect(i,j))
{
spherecollcount++;
CollisionSolve(i,j);
}
}
}
//-----------------------------------------------DRAW--------------------------------
glPushMatrix();
glTranslatef(spheres[i][0],spheres[i][1],spheres[i][2]);
glColor3f( (float) (i+1)/(PART_NUM) , 1-(float)(i+1)/(PART_NUM) , 0.5);
glutSolidSphere(spheres[i][3], 18,18);
glPopMatrix();
}
//---------------------------------------------------------------------------------------------------------------/
glutSwapBuffers(); // All drawing commands applied to the
// hidden buffer, so now, bring forward
// the hidden buffer and hide the visible one
}
There are two things conserved in a collision:
momentum
and
energy
So far your calculations just consider the momentum (a simply speed swap assumes a so called central collision and identical masses of the spheres). If computers were infinitely precise this would work out perfectly. But computers have only limited precisions and so roundoff errors will creep in.
The simple solution for that problem is to correct the momentum part (which is mass · velocity) for the deviation in energy (1/2 mass · velocity²). This works because energy depends by the square of the velocity so small deviations on the speeds involved will create large energy deviations which you can use for correcting the calculation.
I.e. you calculate the total energy before the collision and then after the collision. Then you take the ratio sqrt(E_before / E_after) and scale the speeds after the calculation with that.
If you want to be really accurate you could do relativistic momentum and energy transfer ;)

Second iteration crash - order irrelevant

To save on global memory transfers, and because all of the steps of the code work individually, I have tried to combine all of the kernals into a single kernal, with the first 2 (of 3) steps being done as device calls rather than global calls.
This is failing in the second half of the first step.
There is a function that I need to call twice, to calculate the 2 halves of an image. Regardless of the order the image is calculated in, it crashes on the second iteration.
After examining the code as well as I could, and running it multiple times with different return points, I have found what makes it crash.
__device__
void IntersectCone( float* ModDistance,
float* ModIntensity,
float3 ray,
int threadID,
modParam param )
{
bool ignore = false;
float3 normal = make_float3(0.0f,0.0f,0.0f);
float3 result = make_float3(0.0f,0.0f,0.0f);
float normDist = 0.0f;
float intensity = 0.0f;
float check = abs( Dot(param.position, Cross(param.direction,ray) ) );
if(check > param.r1 && check > param.r2)
ignore = true;
float tran = param.length / (param.r2/param.r1 - 1);
float length = tran + param.length;
float Lsq = length * length;
float cosSqr = Lsq / (Lsq + param.r2 * param.r2);
//Changes the centre position?
float3 position = param.position - tran * param.direction;
float aDd = Dot(param.direction, ray);
float3 e = position * -1.0f;
float aDe = Dot(param.direction, e);
float dDe = Dot(ray, e);
float eDe = Dot(e, e);
float c2 = aDd * aDd - cosSqr;
float c1 = aDd * aDe - cosSqr * dDe;
float c0 = aDe * aDe - cosSqr * eDe;
float discr = c1 * c1 - c0 * c2;
if(discr <= 0.0f)
ignore = true;
if(!ignore)
{
float root = sqrt(discr);
float sign;
if(c1 > 0.0f)
sign = 1.0f;
else
sign = -1.0f;
//Try opposite sign....?
float3 result = (-c1 + sign * root) * ray / c2;
e = result - position;
float dot = Dot(e, param.direction);
float3 s1 = Cross(e, param.direction);
float3 normal = Cross(e, s1);
if( (dot > tran) || (dot < length) )
{
if(Dot(normal,ray) <= 0)
{
normal = Norm(normal); //This stuff (1)
normDist = Magnitude(result);
intensity = -IntensAt1m * Dot(ray, normal) / (normDist * normDist);
}
}
}
ModDistance[threadID] = normDist; and this stuff (2)
ModIntensity[threadID] = intensity;
}
There are two things I can do to to make this not crash, both off which negate the point of the function: If I do not try to write to ModDistance[] and ModIntensity[], or if I do not write to normDist and intensity.
First chance exceptions are thrown by the code above, but not if either of the blocks commented out.
Also, The program only crashes the second time this routine is called.
Have been trying to figure this out all day, any help would be fantastic.
The code that calls it is:
int subrow = threadIdx.y + Mod_Height/2;
int threadID = subrow * (Mod_Width+1) + threadIdx.x;
int obsY = windowY + subrow;
float3 ray = CalculateRay(obsX,obsY);
if( !IntersectSphere(ModDistance, ModIntensity, ray, threadID, param) )
{
IntersectCone(ModDistance, ModIntensity, ray, threadID, param);
}
subrow = threadIdx.y;
threadID = subrow * (Mod_Width+1) + threadIdx.x;
obsY = windowY + subrow;
ray = CalculateRay(obsX,obsY);
if( !IntersectSphere(ModDistance, ModIntensity, ray, threadID, param) )
{
IntersectCone(ModDistance, ModIntensity, ray, threadID, param);
}
The kernel is running out of resources. As posted in the comments, it was giving the error CudaErrorLaunchOutOfResources.
To avoid this, you should use a __launch_bounds__ specifier to specify the block dimensions you want for your kernel. This will force the compiler to ensure there are enough resources. See the CUDA programming guide for details on __launch_bounds__.