Bilinear Image Sampling Non-Reproducible Access Violation - c++

I have a templated 2D image buffer class that can be used with many value types. The values are stored as a 1D dynamic array of T, accessed through a Row method that returns a pointer to the correct row.
One of the methods of the class is used to sample a value in the image bilinearly.
The code generally works, but ever so rarely I get an access violation exception in this method in production which I can't seem to recreate, because the crash dump doesn't include the coordinates that were passed to the method.
These are the relevant parts of the code:
T* data;
int width, height;
T* Row(int y) const { return data + width * y; }
T GetValueBilinear(float x, float y) const
{
const float PIXEL_CENTER_OFFSET = 0.5F;
const float cx = clamp(0.0F, width - 1.0F, x - PIXEL_CENTER_OFFSET);
const float cy = clamp(0.0F, height - 1.0F, y - PIXEL_CENTER_OFFSET);
const float tx = fmod(cx, 1.0F);
const float ty = fmod(cy, 1.0F);
const int xInt = (int)cx;
const int yInt = (int)cy;
const T* r0 = Row(yInt);
const T* r1 = ty && yInt < (height - 1) ? Row(yInt + 1) : r0;
//interpolate on Y
const T& c00 = r0[xInt];
const T& c01 = r1[xInt];
T c0 = lerp(c00, c01, ty);
if (tx && xInt < (width - 1))
{
//interpolate on X
const T& c10 = r0[xInt + 1];
const T& c11 = r1[xInt + 1];
T c1 = lerp(c10, c11, ty);
return lerp(c0, c1, tx);
}
else
{
return c0;
}
}
The definitions for clamp and lerp are:
template <typename T>
inline T clamp(T min, T max, T value) { return value < min ? min : value > max ? max : value; }
template <typename T>
inline T lerp(T a, T b, float t) { return a + (b - a) * t; } //i.e. a(1-t)+bt
Do you see any obvious errors which would cause an access violation for any values of x and y which are not NaN?
You can assume that width, height and data are valid and correct (i.e., positive dimensions - in this particular case 1280x720, data is not dangling pointer).
If it matters, then T is a float in this case.
The fact that this is non-reproducible and works 99.9% of the time makes me feel like it could be an accuracy issue, though I can't see where it would come from.
Alternatively, what debugging techniques could I use to analyze the crash dumps more effectively?

I tested your GetValueBilinear with 1073741824 random values for the pair (x, y) on 1280x720 data and got no access violation, so I would say it is working fine 99.999999% [1] of the time :-) I suspect the problem is not in GetValueBilinear but elsewhere.
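#include <cmath>
#include <algorithm>
#include <limits> // std::numeric_limits, used in the constructor below
#include <random> // std::uniform_real_distribution, std::default_random_engine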
template <typename T>
inline T clamp(T min, T max, T value) { return value < min ? min : value > max ? max : value; }
template <typename T>
inline T lerp(T a, T b, float t) { return a + (b - a) * t; } //i.e. a(1-t)+bt
template < typename T >
class C
{
public:
C(int w, int h) : height(h), width(w) {
float lower_bound = T(0);
float upper_bound = std::nextafter(T(255), std::numeric_limits<T>::max());
std::uniform_real_distribution<float> unif(lower_bound, upper_bound);
std::default_random_engine re;
data = new T[width*height];// I know... a leak! But... who cares?!
std::generate(data, data + (width*height), [&]() {return unif(re); });
}
T GetValueBilinear(float x, float y) const
{
const float PIXEL_CENTER_OFFSET = 0.5F;
const float cx = clamp(0.0F, width - 1.0F, x - PIXEL_CENTER_OFFSET);
const float cy = clamp(0.0F, height - 1.0F, y - PIXEL_CENTER_OFFSET);
const float tx = fmod(cx, 1.0F);
const float ty = fmod(cy, 1.0F);
const int xInt = (int)cx;
const int yInt = (int)cy;
const T* r0 = Row(yInt);
const T* r1 = ty && yInt < (height - 1) ? Row(yInt + 1) : r0;
//interpolate on Y
const T& c00 = r0[xInt];
const T& c01 = r1[xInt];
T c0 = lerp(c00, c01, ty);
if (tx && xInt < (width - 1))
{
//interpolate on X
const T& c10 = r0[xInt + 1];
const T& c11 = r1[xInt + 1];
T c1 = lerp(c10, c11, ty);
return lerp(c0, c1, tx);
}
else
{
return c0;
}
}
T* data;
int width, height;
T* Row(int y) const { return data + width * y; }
};
#include <random>
#include <iostream>
#include <Windows.h>
float x;
float y;
LONG WINAPI my_filter(_In_ struct _EXCEPTION_POINTERS *ExceptionInfo)
{
std::cout << x << " " << y << "\n";
return EXCEPTION_EXECUTE_HANDLER;
}
int main()
{
auto a = ::SetUnhandledExceptionFilter(my_filter);
float lower_bound = -(1 << 20);
float upper_bound = -lower_bound;
std::uniform_real_distribution<float> unif(lower_bound, upper_bound);
std::default_random_engine re;
float acc = 0;
C<float> img(1280, 720);
img.GetValueBilinear(1.863726958e-043, 1.5612089e-038);
for (size_t i = 0; i < (1 << 30); i++) {
x = unif(re);
y = unif(re);
acc += img.GetValueBilinear(x, y);
}
return static_cast<int>(acc);
}
[1] Even though no access violation was found, I cannot say that the algorithm works 100% of the time. Using a naïve model and this R code:
prop.test(0,1073741824)
I get a confidence interval for the true proportion of failures of (0.000000e+00, 4.460345e-09), so the success percentage is at least (1-4.460345e-09)*100. But do not trust me, I am not a statistician!
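One defensive option, independent of the random test above, is to make the sampler reject non-finite input and clamp the integer indices themselves, so that no coordinate can index outside the buffer. The following is only a minimal sketch under assumptions: the Image struct and SampleBilinear are hypothetical stand-ins for the asker's class, T is fixed to float, and failure is reported through a bool return value. Note that a NaN passes through the question's clamp unchanged, because both comparisons are false, and converting it to int is undefined behaviour.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's image class (T fixed to float).
struct Image {
    int width;
    int height;
    std::vector<float> data; // width * height values, row-major

    const float* Row(int y) const {
        return data.data() + static_cast<std::size_t>(width) * y;
    }

    // Returns false for NaN or infinite input instead of touching memory.
    bool SampleBilinear(float x, float y, float& out) const {
        if (!std::isfinite(x) || !std::isfinite(y)) return false;

        // Same pixel-centre convention as the question: shift by 0.5, then clamp.
        const float cx = std::min(std::max(x - 0.5f, 0.0f), width - 1.0f);
        const float cy = std::min(std::max(y - 0.5f, 0.0f), height - 1.0f);

        const int x0 = static_cast<int>(cx);
        const int y0 = static_cast<int>(cy);
        // Clamp the neighbour indices too, so the last row/column is never exceeded.
        const int x1 = std::min(x0 + 1, width - 1);
        const int y1 = std::min(y0 + 1, height - 1);

        const float tx = cx - x0; // fractional parts in [0, 1)
        const float ty = cy - y0;

        const float* r0 = Row(y0);
        const float* r1 = Row(y1);
        const float c0 = r0[x0] + (r1[x0] - r0[x0]) * ty; // interpolate on Y at x0
        const float c1 = r0[x1] + (r1[x1] - r0[x1]) * ty; // interpolate on Y at x1
        out = c0 + (c1 - c0) * tx;                        // interpolate on X
        return true;
    }
};
```

If the production crash disappears with such a guard in place, logging the rejected inputs would show whether non-finite coordinates were reaching the method after all.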

Related

2D Poisson-disk sampling in a specific square (not a unit square) with specific minimum distance

Is there any way I can modify the Poisson-disk point generator found here? I need to generate new Poisson points using the coordinates of the points in textfile.txt to improve the distribution. Below is the C++ code for Poisson-disk sampling in a unit square.
poissonGenerator.h:
#include <cmath> // sqrt, cos, sin, ceil
#include <vector>
#include <random>
#include <stdint.h>
#include <time.h>
namespace PoissonGenerator
{
class DefaultPRNG
{
public:
DefaultPRNG()
: m_Gen(std::random_device()())
, m_Dis(0.0f, 1.f)
{
// prepare PRNG
m_Gen.seed(time(nullptr));
}
explicit DefaultPRNG(unsigned short seed)
: m_Gen(seed)
, m_Dis(0.0f, 1.f)
{
}
double RandomDouble()
{
return static_cast <double>(m_Dis(m_Gen));
}
int RandomInt(int Max)
{
std::uniform_int_distribution<> DisInt(0, Max);
return DisInt(m_Gen);
}
private:
std::mt19937 m_Gen;
std::uniform_real_distribution<double> m_Dis;
};
struct sPoint
{
sPoint()
: x(0)
, y(0)
, m_valid(false){}
sPoint(double X, double Y)
: x(X)
, y(Y)
, m_valid(true){}
double x;
double y;
bool m_valid;
//
bool IsInRectangle() const
{
return x >= 0 && y >= 0 && x <= 1 && y <= 1;
}
//
bool IsInCircle() const
{
double fx = x - 0.5f;
double fy = y - 0.5f;
return (fx*fx + fy*fy) <= 0.25f;
}
};
struct sGridPoint
{
sGridPoint(int X, int Y)
: x(X)
, y(Y)
{}
int x;
int y;
};
double GetDistance(const sPoint& P1, const sPoint& P2)
{
return sqrt((P1.x - P2.x)*(P1.x - P2.x) + (P1.y - P2.y)*(P1.y - P2.y));
}
sGridPoint ImageToGrid(const sPoint& P, double CellSize)
{
return sGridPoint((int)(P.x / CellSize), (int)(P.y / CellSize));
}
struct sGrid
{
sGrid(int W, int H, double CellSize)
: m_W(W)
, m_H(H)
, m_CellSize(CellSize)
{
m_Grid.resize((m_H));
for (auto i = m_Grid.begin(); i != m_Grid.end(); i++){ i->resize(m_W); }
}
void Insert(const sPoint& P)
{
sGridPoint G = ImageToGrid(P, m_CellSize);
m_Grid[G.x][G.y] = P;
}
bool IsInNeighbourhood(sPoint Point, double MinDist, double CellSize)
{
sGridPoint G = ImageToGrid(Point, CellSize);
//number of adjacent cell to look for neighbour points
const int D = 5;
// Scan the neighbourhood of the Point in the grid
for (int i = G.x - D; i < G.x + D; i++)
{
for (int j = G.y - D; j < G.y + D; j++)
{
if (i >= 0 && i < m_W && j >= 0 && j < m_H)
{
sPoint P = m_Grid[i][j];
if (P.m_valid && GetDistance(P, Point) < MinDist){ return true; }
}
}
}
return false;
}
private:
int m_H;
int m_W;
double m_CellSize;
std::vector< std::vector< sPoint> > m_Grid;
};
template <typename PRNG>
sPoint PopRandom(std::vector<sPoint>& Points, PRNG& Generator)
{
const int Idx = Generator.RandomInt(Points.size() - 1);
const sPoint P = Points[Idx];
Points.erase(Points.begin() + Idx);
return P;
}
template <typename PRNG>
sPoint GenerateRandomPointAround(const sPoint& P, double MinDist, PRNG& Generator)
{
// Start with non-uniform distribution
double R1 = Generator.RandomDouble();
double R2 = Generator.RandomDouble();
// radius should be between MinDist and 2 * MinDist
double Radius = MinDist * (R1 + 1.0f);
//random angle
double Angle = 2 * 3.141592653589f * R2;
// the new point is generated around the point (x, y)
double X = P.x + Radius * cos(Angle);
double Y = P.y + Radius * sin(Angle);
return sPoint(X, Y);
}
// Return a vector of generated points
// NewPointsCount - refer to bridson-siggraph07-poissondisk.pdf
// for details (the value 'k')
// Circle - 'true' to fill a circle, 'false' to fill a rectangle
// MinDist - minimal distance estimator, use negative value for default
template <typename PRNG = DefaultPRNG>
std::vector<sPoint> GeneratePoissonPoints(rsize_t NumPoints, PRNG& Generator, int NewPointsCount = 30,
bool Circle = true, double MinDist = -1.0f)
{
if (MinDist < 0.0f)
{
MinDist = sqrt(double(NumPoints)) / double(NumPoints);
}
std::vector <sPoint> SamplePoints;
std::vector <sPoint> ProcessList;
// create the grid
double CellSize = MinDist / sqrt(2.0f);
int GridW = (int)(ceil)(1.0f / CellSize);
int GridH = (int)(ceil)(1.0f / CellSize);
sGrid Grid(GridW, GridH, CellSize);
sPoint FirstPoint;
do
{
FirstPoint = sPoint(Generator.RandomDouble(), Generator.RandomDouble());
} while (!(Circle ? FirstPoint.IsInCircle() : FirstPoint.IsInRectangle()));
//Update containers
ProcessList.push_back(FirstPoint);
SamplePoints.push_back(FirstPoint);
Grid.Insert(FirstPoint);
// generate new points for each point in the queue
while (!ProcessList.empty() && SamplePoints.size() < NumPoints)
{
#if POISSON_PROGRESS_INDICATOR
// a progress indicator, kind of
if (SamplePoints.size() % 100 == 0) std::cout << ".";
#endif // POISSON_PROGRESS_INDICATOR
sPoint Point = PopRandom<PRNG>(ProcessList, Generator);
for (int i = 0; i < NewPointsCount; i++)
{
sPoint NewPoint = GenerateRandomPointAround(Point, MinDist, Generator);
bool Fits = Circle ? NewPoint.IsInCircle() : NewPoint.IsInRectangle();
if (Fits && !Grid.IsInNeighbourhood(NewPoint, MinDist, CellSize))
{
ProcessList.push_back(NewPoint);
SamplePoints.push_back(NewPoint);
Grid.Insert(NewPoint);
continue;
}
}
}
#if POISSON_PROGRESS_INDICATOR
std::cout << std::endl << std::endl;
#endif // POISSON_PROGRESS_INDICATOR
return SamplePoints;
}
}
and the main program is:
poisson.cpp
#include "stdafx.h"
#include <vector>
#include <iostream>
#include <fstream>
#include <memory.h>
#define POISSON_PROGRESS_INDICATOR 1
#include "PoissonGenerator.h"
const int NumPoints = 20000; // minimal number of points to generate
int main()
{
PoissonGenerator::DefaultPRNG PRNG;
const auto Points =
PoissonGenerator::GeneratePoissonPoints(NumPoints,PRNG);
std::ofstream File("Poisson.txt", std::ios::out);
File << "NumPoints = " << Points.size() << std::endl;
for (const auto& p : Points)
{
File << " " << p.x << " " << p.y << std::endl;
}
system("PAUSE");
return 0;
}
Suppose you have a point in the space [0,1] x [0,1], in the form of a std::pair<double, double>, but desire points in the space [x,y] x [w,z].
The function object
struct ProjectTo {
double x, y, w, z;
std::pair<double, double> operator()(std::pair<double, double> in) const
{
return std::make_pair(in.first * (y - x) + x, in.second * (z - w) + w);
}
};
will transform such an input point into the desired output point.
Suppose further you have a std::vector<std::pair<double, double>> points, all drawn from the input distribution.
std::transform(points.begin(), points.end(), points.begin(), ProjectTo{ x, y, w, z });
Now you have a vector of points in the output space.
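As a self-contained sketch of the mapping described above, the following applies the function object to a whole vector in place. The Point alias and the ProjectAll helper are names introduced here for illustration only; they are not part of the generator's code.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Point = std::pair<double, double>; // illustrative alias

// Maps a point from [0,1] x [0,1] to [x,y] x [w,z].
struct ProjectTo {
    double x, y, w, z;
    Point operator()(const Point& in) const {
        return Point(in.first * (y - x) + x, in.second * (z - w) + w);
    }
};

// Hypothetical helper: rescales every point in place.
inline void ProjectAll(std::vector<Point>& points, const ProjectTo& project) {
    std::transform(points.begin(), points.end(), points.begin(), project);
}
```

One caveat: the minimum distance between points scales with the rectangle. If y - x and z - w differ, the scaling is anisotropic, so the Poisson-disk spacing is stretched more along one axis than the other.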

Adding unused formal parameters to C++ method results in different behavior

When I add some extra formal parameters double tmin=0.0, double tmax=0.0 to the constructor of the Ray in the code below, I always obtain a wrong image with a white top border. These formal parameters currently contribute in no way (i.e. are unused) to the code. So how is it possible to obtain a different image?
System specifications:
OS: Windows 8.1
Compiler: MSVC 2015
Code:
#include "stdafx.h"
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <random>
std::default_random_engine generator(606418532);
std::uniform_real_distribution<double> distribution = std::uniform_real_distribution<double>(0.0, 1.0);
double erand48(unsigned short *x) {
return distribution(generator);
}
#define M_PI 3.14159265358979323846
struct Vector3 {
double x, y, z;
Vector3(double x_ = 0, double y_ = 0, double z_ = 0) { x = x_; y = y_; z = z_; }
Vector3 operator+(const Vector3 &b) const { return Vector3(x + b.x, y + b.y, z + b.z); }
Vector3 operator-(const Vector3 &b) const { return Vector3(x - b.x, y - b.y, z - b.z); }
Vector3 operator*(double b) const { return Vector3(x*b, y*b, z*b); }
Vector3 mult(const Vector3 &b) const { return Vector3(x*b.x, y*b.y, z*b.z); }
Vector3& norm() { return *this = *this * (1 / sqrt(x*x + y*y + z*z)); }
double Dot(const Vector3 &b) const { return x*b.x + y*b.y + z*b.z; } // cross:
Vector3 operator%(Vector3&b) { return Vector3(y*b.z - z*b.y, z*b.x - x*b.z, x*b.y - y*b.x); }
};
//struct Ray { Vector3 o, d; Ray(const Vector3 &o_, const Vector3 &d_, double tmin=0.0, double tmax=0.0) : o(o_), d(d_) {} };
struct Ray { Vector3 o, d; Ray(const Vector3 &o_, const Vector3 &d_) : o(o_), d(d_) {} };
enum Reflection_t { DIFFUSE, SPECULAR, REFRACTIVE };
struct Sphere {
double rad; // radius
Vector3 p, e, f; // position, emission, color
Reflection_t reflection_t; // reflection type (DIFFuse, SPECular, REFRactive)
Sphere(double rad_, Vector3 p_, Vector3 e_, Vector3 f_, Reflection_t reflection_t) :
rad(rad_), p(p_), e(e_), f(f_), reflection_t(reflection_t) {}
double intersect(const Ray &r) const {
Vector3 op = p - r.o;
double t, eps = 1e-4, b = op.Dot(r.d), det = b*b - op.Dot(op) + rad*rad;
if (det<0) return 0; else det = sqrt(det);
return (t = b - det)>eps ? t : ((t = b + det)>eps ? t : 0);
}
};
Sphere spheres[] = {
Sphere(1e5, Vector3(1e5 + 1,40.8,81.6), Vector3(),Vector3(.75,.25,.25),DIFFUSE),//Left
Sphere(1e5, Vector3(-1e5 + 99,40.8,81.6),Vector3(),Vector3(.25,.25,.75),DIFFUSE),//Rght
Sphere(1e5, Vector3(50,40.8, 1e5), Vector3(),Vector3(.75,.75,.75),DIFFUSE),//Back
Sphere(1e5, Vector3(50,40.8,-1e5 + 170), Vector3(),Vector3(), DIFFUSE),//Frnt
Sphere(1e5, Vector3(50, 1e5, 81.6), Vector3(),Vector3(.75,.75,.75),DIFFUSE),//Botm
Sphere(1e5, Vector3(50,-1e5 + 81.6,81.6),Vector3(),Vector3(.75,.75,.75),DIFFUSE),//Top
Sphere(16.5,Vector3(27,16.5,47), Vector3(),Vector3(1,1,1)*.999, SPECULAR),//Mirr
Sphere(16.5,Vector3(73,16.5,78), Vector3(),Vector3(1,1,1)*.999, REFRACTIVE),//Glas
Sphere(600, Vector3(50,681.6 - .27,81.6),Vector3(12,12,12), Vector3(), DIFFUSE) //Lite
};
inline double clamp(double x) { return x<0 ? 0 : x>1 ? 1 : x; }
inline int toInt(double x) { return int(pow(clamp(x), 1 / 2.2) * 255 + .5); }
inline bool intersect(const Ray &r, double &t, int &id) {
double n = sizeof(spheres) / sizeof(Sphere), d, inf = t = 1e20;
for (int i = int(n); i--;) if ((d = spheres[i].intersect(r)) && d<t) { t = d; id = i; }
return t<inf;
}
Vector3 radiance(const Ray &r_, int depth_, unsigned short *Xi) {
double t; // distance to intersection
int id = 0; // id of intersected object
Ray r = r_;
int depth = depth_;
Vector3 cl(0, 0, 0); // accumulated color
Vector3 cf(1, 1, 1); // accumulated reflectance
while (1) {
if (!intersect(r, t, id)) return cl; // if miss, return black
const Sphere &obj = spheres[id]; // the hit object
Vector3 x = r.o + r.d*t, n = (x - obj.p).norm(), nl = n.Dot(r.d)<0 ? n : n*-1, f = obj.f;
double p = f.x>f.y && f.x>f.z ? f.x : f.y>f.z ? f.y : f.z; // max refl
cl = cl + cf.mult(obj.e);
if (++depth>5) if (erand48(Xi)<p) f = f*(1 / p); else return cl; //R.R.
cf = cf.mult(f);
if (obj.reflection_t == DIFFUSE) { // Ideal DIFFUSE reflection
double r1 = 2 * M_PI*erand48(Xi), r2 = erand48(Xi), r2s = sqrt(r2);
Vector3 w = nl, u = ((fabs(w.x)>.1 ? Vector3(0, 1) : Vector3(1)) % w).norm(), v = w%u;
Vector3 d = (u*cos(r1)*r2s + v*sin(r1)*r2s + w*sqrt(1 - r2)).norm();
r = Ray(x, d);
continue;
}
else if (obj.reflection_t == SPECULAR) {
r = Ray(x, r.d - n * 2 * n.Dot(r.d));
continue;
}
Ray reflRay(x, r.d - n * 2 * n.Dot(r.d));
bool into = n.Dot(nl)>0;
double nc = 1, nt = 1.5, nnt = into ? nc / nt : nt / nc, ddn = r.d.Dot(nl), cos2t;
if ((cos2t = 1 - nnt*nnt*(1 - ddn*ddn))<0) {
r = reflRay;
continue;
}
Vector3 tdir = (r.d*nnt - n*((into ? 1 : -1)*(ddn*nnt + sqrt(cos2t)))).norm();
double a = nt - nc, b = nt + nc, R0 = a*a / (b*b), c = 1 - (into ? -ddn : tdir.Dot(n));
double Re = R0 + (1 - R0)*c*c*c*c*c, Tr = 1 - Re, P = .25 + .5*Re, RP = Re / P, TP = Tr / (1 - P);
if (erand48(Xi)<P) {
cf = cf*RP;
r = reflRay;
}
else {
cf = cf*TP;
r = Ray(x, tdir);
}
continue;
}
}
int main(int argc, char *argv[]) {
int w = 512, h = 384, samps = argc == 2 ? atoi(argv[1]) / 4 : 1; // # samples
Ray cam(Vector3(50, 52, 295.6), Vector3(0, -0.042612, -1).norm()); // cam pos, dir
Vector3 cx = Vector3(w*.5135 / h), cy = (cx%cam.d).norm()*.5135, r, *c = new Vector3[w*h];
#pragma omp parallel for schedule(dynamic, 1) private(r) // OpenMP
for (int y = 0; y<h; y++) { // Loop over image rows
fprintf(stderr, "\rRendering (%d spp) %5.2f%%", samps * 4, 100.*y / (h - 1));
for (unsigned short x = 0, Xi[3] = { 0,0,y*y*y }; x<w; x++) // Loop cols
for (int sy = 0, i = (h - y - 1)*w + x; sy<2; sy++) // 2x2 subpixel rows
for (int sx = 0; sx<2; sx++, r = Vector3()) { // 2x2 subpixel cols
for (int s = 0; s<samps; s++) {
double r1 = 2 * erand48(Xi), dx = r1<1 ? sqrt(r1) - 1 : 1 - sqrt(2 - r1);
double r2 = 2 * erand48(Xi), dy = r2<1 ? sqrt(r2) - 1 : 1 - sqrt(2 - r2);
Vector3 d = cx*(((sx + .5 + dx) / 2 + x) / w - .5) +
cy*(((sy + .5 + dy) / 2 + y) / h - .5) + cam.d;
r = r + radiance(Ray(cam.o + d * 140, d.norm()), 0, Xi)*(1. / samps);
} // Camera rays are pushed ^^^^^ forward to start in interior
c[i] = c[i] + Vector3(clamp(r.x), clamp(r.y), clamp(r.z))*.25;
}
}
FILE *fp;
fopen_s(&fp, "image.ppm", "w"); // Write image to PPM file.
fprintf(fp, "P3\n%d %d\n%d\n", w, h, 255);
for (int i = 0; i<w*h; i++)
fprintf(fp, "%d %d %d ", toInt(c[i].x), toInt(c[i].y), toInt(c[i].z));
}
First Ray structure:
struct Ray { Vector3 o, d; Ray(const Vector3 &o_, const Vector3 &d_) : o(o_), d(d_) {} };
Results in: [image: rendering without the white top border]
Second Ray structure:
struct Ray { Vector3 o, d; Ray(const Vector3 &o_, const Vector3 &d_, double tmin=0.0, double tmax=0.0) : o(o_), d(d_) {} };
Results in: [image: rendering with a white top border]
The last image has a noticeable white top border which is not present in the first image.
Edit:
I used
size_t n = sizeof(spheres) / sizeof(Sphere);
Now I obtain the same images, but I also checked if the original int(n) could differ from 9 which is never the case.
Ok this is from the Debug build, which is different from the Release build.
This sounds like a memory error. Looking quickly at your code, I'm sceptical of this line:
for (int i = int(n); i--;) if ((d = spheres[i].intersect(r)) && d<t)
I suspect that accessing spheres[i] goes out of bounds; perhaps you should try spheres[i-1]. You could also try compiling your code with a compiler that adds extra instrumentation for debugging, sanitising, or checking memory accesses.
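The suspicion about that loop can be checked in isolation. The sketch below (VisitedIndices is a made-up helper, not part of the ray tracer) records which indices the `for (int i = n; i--;)` idiom visits. Because the post-decrement is applied before the body runs, the body sees n-1 down to 0, so this particular loop appears to stay in bounds and the cause is more likely elsewhere.

```cpp
#include <vector>

// Hypothetical helper: returns the values of i seen inside the loop body
// when the loop is written with the countdown idiom from the question.
std::vector<int> VisitedIndices(int n) {
    std::vector<int> visited;
    for (int i = n; i--;) { // condition tests the old i, body sees i - 1
        visited.push_back(i);
    }
    return visited;
}
```

For n = 9, as in the spheres array, the visited indices are 8, 7, ..., 0.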

How to speed up my bilinear interpolation?

I have spent countless hours trying to speed up my bilinear interpolation, to no avail. I even tried an SSE version (both a double version and a float version), but that was even slower than this version.
Does anyone have any ideas?
template <typename T>
__forceinline void interp2_mx(const T& x, const T& y,
const T* z,
const int32_t& n,
const int32_t& mm2,
const int32_t& nm2,
T& val,
const T& extrapval = T(0))
{
int64_t xp = (int64_t)(x) - 1; // adjust for MATLAB indexing
int64_t yp = (int64_t)(y) - 1;
if (xp < 0 || xp > nm2 || yp < 0 || yp > mm2)
{
val = extrapval;
}
else
{
const T* line = z + yp * n + xp;
T xf = x - (int64_t)(x);
T yf = y - (int64_t)(y);
T x1mf = (T)1 - xf;
T y1mf = (T)1 - yf;
T v00 = x1mf * y1mf * (*(line));
T v01 = xf * y1mf * (*(line + 1));
T v10 = x1mf * yf * (*(line + n));
T v11 = xf * yf * (*(line + n + 1));
val = v00 + v01 + v10 + v11;
}
}
template <typename T>
void interp2(const T* z,
const int32_t& mz, const int32_t& nz,
const T* xi, const T* yi,
const int32_t& mi, const int32_t& ni,
T* zi,
const T& extrapval = T(0))
{
const int32_t nzm2 = nz - 2;
const int32_t mzm2 = mz - 2;
#pragma omp parallel for
for (int m = 0; m < mi; ++m)
{
T* line_zi = zi + m * ni;
const T* x = xi + m * ni;
const T* y = yi + m * ni;
for (int n = 0; n < ni; ++n, ++x, ++y, ++line_zi)
{
interp2_mx((*x), (*y), z, nz, mzm2, nzm2, (*line_zi));
}
}
}
Your calculation of xf does a float-to-int64_t-to-float conversion. I assume you know the value is in range, otherwise this would be Undefined Behavior (and mathematically pointless). std::modf() may be the better function as it directly calculates the desired value.
I also think that adjacent pixels have rather related xf & x1mf values, yet you recalculate them. I'm not sure about this as your coordinates seem to be stored indirectly ((*x), (*y)). It may very well be more efficient to calculate those on the fly. Since these pointers may alias the output, they can't be prefetched and the reads will be blocking the memory bus.
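To make the std::modf() suggestion concrete, here is a minimal sketch of a single-sample interpolation that splits each coordinate into integer and fractional parts with one call. The function name InterpModf and its parameter layout are assumptions for illustration; it keeps the 1-based (MATLAB-style) coordinates of the question and does the range check in floating point before any conversion to an integer type. Whether it is actually faster would have to be measured.

```cpp
#include <cmath>

// z: row-major grid with n columns and m rows; x, y: 1-based coordinates.
template <typename T>
inline T InterpModf(const T* z, int n, int m, T x, T y, T extrapval = T(0)) {
    T xi, yi;
    const T xf = std::modf(x, &xi); // xi = integer part, xf = fraction
    const T yf = std::modf(y, &yi);

    // Range check on the floating-point integer parts (also rejects NaN),
    // so the casts below can never overflow.
    if (!(xi >= T(1) && xi <= T(n - 1) && yi >= T(1) && yi <= T(m - 1))) {
        return extrapval;
    }

    const long xp = static_cast<long>(xi) - 1; // adjust for 1-based indexing
    const long yp = static_cast<long>(yi) - 1;
    const T* line = z + yp * n + xp;

    // Two lerps along x, then one along y (3 multiplies instead of 8).
    const T top = line[0] + xf * (line[1] - line[0]);
    const T bot = line[n] + xf * (line[n + 1] - line[n]);
    return top + yf * (bot - top);
}
```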

Newton Fractal generation

I wanted to write my own Newton fractal generator. It uses OpenCL, but that's not the problem. My problem is that, at the moment, only very few pixels converge.
So to explain what I've done so far:
I've selected a function I wanted to use: f(z)=z^4-1 (for testing purposes)
I've calculated the roots of this function: 1+0î, -1+0î, 0+1î, 0-1î
I've written an OpenCL host and kernel:
The kernel uses a struct with 4 doubles: re (real part), im (imaginary part), r (absolute value), and phi (argument, i.e. the polar angle).
From the resolution, zoom, and global work ID it computes the "type" of the pixel and the intensity, where the type is the root that Newton's method converges to, or whether it diverges.
Here's what I get rendered: [image not included]
Here is the whole kernel:
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#define pi 3.14159265359
struct complex {
double im;
double re;
double r;
double phi;
};
struct complex createComplexFromPolar(double _r, double _phi){
struct complex t;
t.r = _r;
t.phi = _phi;
t.re = cos(t.phi)*t.r;
t.im = sin(t.phi)*t.r;
return t;
}
struct complex createComplexFromKarthes(double real, double imag){
struct complex t;
t.re = real;
t.im = imag;
t.phi = atan(imag / real);
t.r = sqrt(t.re*t.re + t.im*t.im);
return t;
}
struct complex recreateComplexFromKarthes(struct complex t){
return t = createComplexFromKarthes(t.re, t.im);
}
struct complex recreateComplexFromPolar(struct complex t){
return t = createComplexFromPolar(t.r, t.phi);
}
struct complex addComplex(const struct complex z, const struct complex c){
struct complex t;
t.re = c.re + z.re;
t.im = c.im + z.im;
return recreateComplexFromKarthes(t);
}
struct complex subComplex(const struct complex z, const struct complex c){
struct complex t;
t.re = z.re - c.re;
t.im = z.im - c.im;
return recreateComplexFromKarthes(t);
}
struct complex addComplexScalar(const struct complex z, const double n){
struct complex t;
t.re = z.re + n;
return recreateComplexFromKarthes(t);
}
struct complex subComplexScalar(const struct complex z, const double n){
struct complex t;
t.re = z.re - n;
return recreateComplexFromKarthes(t);
}
struct complex multComplexScalar(const struct complex z, const double n) {
struct complex t;
t.re = z.re * n;
t.im = z.im * n;
return recreateComplexFromKarthes(t);
}
struct complex multComplex(const struct complex z, const struct complex c) {
return createComplexFromPolar(z.r*c.r, z.phi + c.phi);
}
struct complex powComplex(const struct complex z, int i){
struct complex t = z;
for (int j = 0; j < i; j++){
t = multComplex(t, z);
}
return t;
}
struct complex divComplex(const struct complex z, const struct complex c) {
return createComplexFromPolar(z.r / c.r, z.phi - c.phi);
}
bool compComplex(const struct complex z, const struct complex c, float comp){
struct complex t = subComplex(z, c);
if (fabs(t.re) <= comp && fabs(t.im) <= comp)
return true;
return false;
}
__kernel void newtonFraktal(__global const int* res, __global const int* zoom, __global int* offset, __global const double* param, __global int* result, __global int* resType){
const int x = get_global_id(0) + offset[0];
const int y = get_global_id(1) + offset[1];
const int xRes = res[0];
const int yRes = res[1];
const double a = (x - (xRes / 2)) == 0 ? 0 : (double)(zoom[0] / (x - (double)(xRes / 2)));
const double b = (y - (yRes / 2)) == 0 ? 0 : (double)(zoom[1] / (y - (double)(yRes / 2)));
struct complex z = createComplexFromKarthes(a, b);
struct complex zo = z;
struct complex c = createComplexFromKarthes(param[0], param[1]);
struct complex x1 = createComplexFromKarthes(1,0);
struct complex x2 = createComplexFromKarthes(-1, 0);
struct complex x3 = createComplexFromKarthes(0, 1);
struct complex x4 = createComplexFromKarthes(0, -1);
resType[x + xRes * y] = 3;
int i = 0;
while (i < 30000 && fabs(z.r) < 10000){
z = subComplex(z, divComplex(subComplexScalar(powComplex(z, 4), 1), multComplexScalar(powComplex(z, 3), 4)));
i++;
if (compComplex(z, x1, 0.05)){
resType[x + xRes * y] = 0;
break;
} else if (compComplex(z, x2, 0.05)){
resType[x + xRes * y] = 1;
break;
} else if (compComplex(z, x3, 0.05)){
resType[x + xRes * y] = 2;
break;
}
}
if (fabs(z.r) >= 10000){
resType[x + xRes * y] = 4;
}
result[x + xRes * y] = i;
}
And here is the coloration of the image:
const int xRes = core->getXRes();
const int yRes = core->getYRes();
for (int y = 0; y < fraktal->getHeight(); y++){
for (int x = 0; x < fraktal->getWidth(); x++){
int conDiv = genCL->result[x + y * xRes];
int type = genCL->typeRes[x + y * xRes];
if (type == 0){
//converging to x1
fraktal->setPixel(x, y, 1*conDiv, 1*conDiv, 0, 1);
} else if (type == 1){
//converging to x2
fraktal->setPixel(x, y, 0, 0, 1*conDiv, 1);
} else if (type == 2){
//converging to x3
fraktal->setPixel(x, y, 0, 1*conDiv, 0, 1);
} else if (type == 3){
//diverging and interrupted by loop end
fraktal->setPixel(x, y, 1*conDiv, 0, 0, 1);
} else {
//diverging and interrupted by z.r > 10000
fraktal->setPixel(x, y, 1, 1, 1, 0.1*conDiv);
}
}
}
I had some mistakes in the complex number computations, but I checked everything again and again today, and I think they should be okay now. What else could be the reason that so few start values converge? Did I do something wrong with Newton's method?
Thanks for all your help!!
It really helped to run the code as normal C code, as that makes debugging easier. The problem turned out to be a few coding errors, which I have now fixed. For example, my pow function was broken (it multiplied one time too many), and when adding or subtracting a scalar I forgot to set the imaginary part of the temporary complex number. So here's my final OpenCL kernel:
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#define pi 3.14159265359
struct complex {
double im;
double re;
double r;
double phi;
};
struct complex createComplexFromPolar(double _r, double _phi){
struct complex t;
t.r = _r;
t.phi = _phi;
t.re = _r*cos(_phi);
t.im = _r*sin(_phi);
return t;
}
struct complex createComplexFromKarthes(double real, double imag){
struct complex t;
t.re = real;
t.im = imag;
t.phi = atan2(imag, real);
t.r = sqrt(t.re*t.re + t.im*t.im);
return t;
}
struct complex recreateComplexFromKarthes(struct complex t){
return createComplexFromKarthes(t.re, t.im);
}
struct complex recreateComplexFromPolar(struct complex t){
return createComplexFromPolar(t.r, t.phi);
}
struct complex addComplex(const struct complex z, const struct complex c){
return createComplexFromKarthes(c.re + z.re, c.im + z.im);
}
struct complex subComplex(const struct complex z, const struct complex c){
return createComplexFromKarthes(z.re - c.re, z.im - c.im);
}
struct complex addComplexScalar(const struct complex z, const double n){
return createComplexFromKarthes(z.re + n,z.im);
}
struct complex subComplexScalar(const struct complex z, const double n){
return createComplexFromKarthes(z.re - n, z.im);
}
struct complex multComplexScalar(const struct complex z, const double n){
return createComplexFromKarthes(z.re * n,z.im * n);
}
struct complex multComplex(const struct complex z, const struct complex c) {
return createComplexFromKarthes(z.re*c.re-z.im*c.im, z.re*c.im+z.im*c.re);
//return createComplexFromPolar(z.r*c.r, z.phi + c.phi);
}
struct complex powComplex(const struct complex z, int i){
struct complex t = z;
for (int j = 0; j < i-1; j++){
t = multComplex(t, z);
}
return t;
}
struct complex divComplex(const struct complex z, const struct complex c) {
return createComplexFromPolar(z.r / c.r, z.phi-c.phi);
}
bool compComplex(const struct complex z, const struct complex c, float comp){
if (fabs(z.re - c.re) <= comp && fabs(z.im - c.im) <= comp)
return true;
return false;
}
__kernel void newtonFraktal(__global const int* res, __global const int* zoom, __global int* offset, __global const double* param, __global int* result, __global int* resType){
const int x = get_global_id(0) + offset[0];
const int y = get_global_id(1) + offset[1];
const int xRes = res[0];
const int yRes = res[1];
const double a = (x - (xRes / 2)) == 0 ? 0 : (double)((x - (double)(xRes / 2)) / zoom[0]);
const double b = (y - (yRes / 2)) == 0 ? 0 : (double)((y - (double)(yRes / 2)) / zoom[1]);
struct complex z = createComplexFromKarthes(a, b);
//struct complex c = createComplexFromKarthes(param[0], param[1]);
struct complex x1 = createComplexFromKarthes(0.7071068, 0.7071068);
struct complex x2 = createComplexFromKarthes(0.7071068, -0.7071068);
struct complex x3 = createComplexFromKarthes(-0.7071068, 0.7071068);
struct complex x4 = createComplexFromKarthes(-0.7071068, -0.7071068);
struct complex f, d;
resType[x + xRes * y] = 11;
int i = 0;
while (i < 6000 && fabs(z.r) < 10000){
f = addComplexScalar(powComplex(z, 4), 1);
d = multComplexScalar(powComplex(z, 3), 3);
z = subComplex(z, divComplex(f, d));
i++;
if (compComplex(z, x1, 0.0000001)){
resType[x + xRes * y] = 0;
break;
} else if (compComplex(z, x2, 0.0000001)){
resType[x + xRes * y] = 1;
break;
} else if (compComplex(z, x3, 0.0000001)){
resType[x + xRes * y] = 2;
break;
} else if (compComplex(z, x4, 0.0000001)){
resType[x + xRes * y] = 3;
break;
}
}
if (fabs(z.r) >= 1000){
resType[x + xRes * y] = 10;
}
result[x + xRes * y] = i;
}
hope it might help someone someday.. :)
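Following the advice to debug on the host first, a small CPU reference can be handy for comparing against the kernel's per-pixel result. The sketch below is an assumption-laden illustration, not the kernel above: it is written in C++ with std::complex rather than OpenCL C, it uses the function from the question, f(z) = z^4 - 1 with f'(z) = 4z^3, and the name NewtonRootIndex, the tolerance, and the iteration limit are all made up for the example.

```cpp
#include <complex>

// Returns the index of the root of z^4 - 1 that Newton's method reaches
// from the start value z (0: 1, 1: -1, 2: i, 3: -i), or -1 if it does not
// converge within maxIter steps or hits a zero derivative.
int NewtonRootIndex(std::complex<double> z, int maxIter = 100, double tol = 1e-9) {
    const std::complex<double> roots[4] = {
        {1.0, 0.0}, {-1.0, 0.0}, {0.0, 1.0}, {0.0, -1.0}};

    for (int i = 0; i < maxIter; ++i) {
        const std::complex<double> z3 = z * z * z;
        const std::complex<double> derivative = 4.0 * z3; // f'(z) = 4 z^3
        if (std::abs(derivative) == 0.0) return -1;       // avoid division by zero
        z -= (z3 * z - 1.0) / derivative;                 // z <- z - f(z) / f'(z)

        for (int k = 0; k < 4; ++k) {
            if (std::abs(z - roots[k]) < tol) return k;
        }
    }
    return -1;
}
```

Working in rectangular form throughout sidesteps the polar bookkeeping (r and phi) that the struct in the kernel has to keep in sync after every operation.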

Binary * Operator Not Found

I have a vector class with a properly overloaded Vect*float operator and am trying to create the global/non-member float*Vect operator as follows: (Note this is a heavily edited sample)
class Vect
{
public:
Vect::Vect(const float p_x, const float p_y, const float p_z, const float p_w);
Vect operator*(const float p_sclr) const;
private:
float x;
float y;
float z;
float w;
};
Vect::Vect(const float p_x, const float p_y, const float p_z, const float p_w) {
x = p_x;
y = p_y;
z = p_z;
w = p_w;
}
Vect Vect::operator*(const float p_sclr) const {
return Vect( (x * p_sclr), (y * p_sclr), (z * p_sclr), 1); // reset w to 1
}
//Problem Non-MemberOperator
Vect operator*(const float p_sclr, const Vect& p_vect);
Vect operator*(const float p_sclr, const Vect& p_vect) {
return p_vect * p_sclr;
}
But when I go to test the operator with the call:
Vect A(2.0f, 3.0f, 4.0f, 5.0f);
float s = 5.0f;
Vect C, D;
C = A * s; // Fine
D = s * A; // Error as below
I receive the following compile error:
error C2678: binary '*' : no operator found which takes a left-hand operand of type 'float' (or there is no acceptable conversion)
Can anyone provide insight into why this happens? The MS documentation is available at http://msdn.microsoft.com/en-us/library/ys0bw32s(v=VS.90).aspx and isn't very helpful. I am using Visual Studio 2008. This is the only compile error or warning I receive.
You still haven't posted a complete example. I can compile the following code without any problems:
class vect
{
float coeffs[4];
public:
vect()
{
for (int k=0; k<4; ++k)
coeffs[k] = 0;
}
vect(float x, float y, float z, float w)
{
coeffs[0] = x;
coeffs[1] = y;
coeffs[2] = z;
coeffs[3] = w;
}
vect operator*(float scalar) const
{
return vect(
scalar*coeffs[0],
scalar*coeffs[1],
scalar*coeffs[2],
scalar*coeffs[3] );
}
};
vect operator*(float scalar, vect const& x)
{
return x*scalar;
}
void test()
{
vect a (2,3,4,5);
float s = 5;
vect c, d;
c = a * s;
d = s * a;
}
So, the problem must lie somewhere else.
I also can compile the code without any problems (like sellibitze, who beat me to it!)
Here's the code I used:
//main.cpp
#include <iostream>
using namespace std;
class Vect
{
public:
float x,y,z,w;
Vect(const float p_x, const float p_y, const float p_z, const float p_w) {
x = p_x;
y = p_y;
z = p_z;
w = p_w;
}
Vect()
{
x=y=z=w=0;
}
Vect operator*(const float p_sclr) const {
return Vect( (x * p_sclr), (y * p_sclr), (z * p_sclr), 1); // reset w to 1
}
};
Vect operator*(const float p_sclr, const Vect& p_vect) {
return p_vect * p_sclr;
}
int main()
{
Vect A(2.0f, 3.0f, 4.0f, 5.0f);
float s = 5.0f;
Vect C, D;
C = A * s; // Fine
D = s * A; // Error as below
cout << D.x << endl;
return 0;
}
Edit: Like sellibitze suggests, the problem may lie elsewhere. Is the error you're listing the ONLY error your compiler is giving you? Also, what version of Visual Studio are you running?
Looks like two people beat me to it, but it worked for me too. It built and ran fine (VS2010, Win32 console project):
class Vect
{
public:
float x,y,z,w;
Vect::Vect(){}
Vect::Vect(const float p_x, const float p_y, const float p_z, const float p_w)
{
x = p_x;
y = p_y;
z = p_z;
w = p_w;
}
Vect Vect::operator*(const float p_sclr) const
{
return Vect( (x * p_sclr), (y * p_sclr), (z * p_sclr), 1); // reset w to 1
}
};
Vect operator*(const float p_sclr, const Vect& p_vect) { return p_vect * p_sclr;}
int _tmain(int argc, _TCHAR* argv[])
{
Vect a (2,3,4,5);
float s = 5;
Vect c, d;
c = a * s;
d = s * a;
}
The final solution was to go with a friend function, because the member variables of p_vect are private:
//class definition
friend Vect operator*(const float scale, const Vect& p_vect);
//function
Vect operator*(const float p_sclr, const Vect& p_vect) {
return Vect( (p_vect.x * p_sclr), (p_vect.y * p_sclr), (p_vect.z * p_sclr), 1.0f);
}
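Put together, a complete version of the friend approach might look like the sketch below. The default constructor and the X(), Y(), Z(), W() accessors are assumptions added so the example is self-contained and testable; they are not taken from the original class.

```cpp
class Vect {
public:
    // Assumed default: zero vector with w = 1 (needed for "Vect C, D;").
    Vect() : x(0.0f), y(0.0f), z(0.0f), w(1.0f) {}
    Vect(float p_x, float p_y, float p_z, float p_w)
        : x(p_x), y(p_y), z(p_z), w(p_w) {}

    // Vect * float (member operator).
    Vect operator*(float p_sclr) const {
        return Vect(x * p_sclr, y * p_sclr, z * p_sclr, 1.0f); // reset w to 1
    }

    // float * Vect: declared as a friend so it may read the private members.
    friend Vect operator*(float p_sclr, const Vect& p_vect);

    // Hypothetical read-only accessors, used only to inspect results.
    float X() const { return x; }
    float Y() const { return y; }
    float Z() const { return z; }
    float W() const { return w; }

private:
    float x, y, z, w;
};

Vect operator*(float p_sclr, const Vect& p_vect) {
    return Vect(p_vect.x * p_sclr, p_vect.y * p_sclr, p_vect.z * p_sclr, 1.0f);
}
```

Since the member operator* is public, a plain non-member that returns p_vect * p_sclr also works without friendship; the friend declaration is only required when the function touches x, y, z directly.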