My barycentric triangle rasterizer draws every third pixel - c++

It also draws multiple triangles, and they are all displaced and at the wrong scale.
I am trying to write my own implementation of the triangle rasterizer found at:
I have no idea what is wrong with my code.
#include <fstream>

class Vertice
{
public:
    float x, y;

    Vertice(float x, float y)
    {
        this->x = x;
        this->y = y;
    }

    void fitToImage(int imageWidth, int imageHeight)
    {
        x = (x * (imageWidth / 2)) + (imageWidth / 2);
        y = (-y * (imageHeight / 2)) + (imageHeight / 2);
    }
};

class Image
{
public:
    int imageWidth, imageHeight;
    unsigned char* pixels;

    Image(int imageWidth, int imageHeight)
    {
        this->imageWidth = imageWidth;
        this->imageHeight = imageHeight;
        pixels = new unsigned char[imageWidth * imageHeight * 3];
    }

    ~Image()
    {
        delete[] pixels;
    }

    void setPixel(int x, int y, int red, int green, int blue)
    {
        int help_var = ((y * imageHeight) + x) * 3;
        pixels[help_var + 0] = (char)red;
        pixels[help_var + 1] = (char)green;
        pixels[help_var + 2] = (char)blue;
    }

    void fillPixels(int red, int green, int blue)
    {
        int help_var = imageWidth * imageHeight * 3;
        for (int i = 0; i < help_var; i += 3) {
            pixels[i + 0] = (char)red;
            pixels[i + 1] = (char)green;
            pixels[i + 2] = (char)blue;
        }
    }

    //-------------------BARYCENTRIC TRIANGLE RASTERISATION------------------------

    float edgeFunction(const Vertice& A, const Vertice& B, const Vertice& P)
    {
        return ((P.x - A.x)*(B.y - A.y) + (P.y - A.y)*(B.x - A.x));
    }

    void fillTriangleBarycentric(const Vertice& v0, const Vertice& v1, const Vertice& v2)
    {
        Vertice p(0.0f, 0.0f);
        for (int x = 0; x < imageWidth; x++) {
            for (int y = 0; y < imageHeight; y++) {
                p.x = x + 0.5f; p.y = y + 0.5f;
                float w0 = edgeFunction(v1, v2, p);
                float w1 = edgeFunction(v2, v0, p);
                float w2 = edgeFunction(v0, v1, p);
                if (w0 >= 0 && w1 >= 0 && w2 >= 0) {
                    setPixel(x, y, 0, 0, 255);
                }
            }
        }
    }
};

int main()
{
    Image image(800, 600);
    image.fillPixels(255, 255, 255);

    Vertice a(0.2f, 0.5f);
    Vertice b(-0.5f, 0.0f);
    Vertice c(0.5f, -0.5f);

    a.fitToImage(image.imageWidth, image.imageHeight);
    b.fitToImage(image.imageWidth, image.imageHeight);
    c.fitToImage(image.imageWidth, image.imageHeight);

    image.fillTriangleBarycentric(a, b, c);

    // P6 is a binary format, so open the file in binary mode
    std::ofstream imageFile("./drawing_triangle_test_image.ppm", std::ios::binary);
    imageFile << "P6\n" << image.imageWidth << " " << image.imageHeight << "\n255\n";
    imageFile.write((char*)image.pixels, image.imageWidth * image.imageHeight * 3);

    return 0;
}
Here is the image I get after running my program.
Thanks for any help!
Here is the improved result, with setPixel using imageWidth instead of imageHeight:

y * imageHeight
is definitely the type of error your code has (there might be multiple instances). You need to multiply the y position by the width; otherwise, you'll end up interlacing the triangle at random x positions.
The fact that you get four triangles relates to 800/600 simplifying to 4/3. Had you rendered at 797 by 603, you would probably have gotten a random mess of horizontal lines.
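To make the stride point concrete, here is a small check that computes byte offsets into a tightly packed, row-major RGB buffer like the question's pixels array and reports whether any two pixels share the same bytes. It is a sketch assuming 3 bytes per pixel and no row padding; rgbOffset and offsetsAreUnique are hypothetical helpers, not code from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Byte offset of pixel (x, y) in a tightly packed, row-major RGB buffer
// whose rows are `stride` pixels long.
std::size_t rgbOffset(int x, int y, int stride) {
    return (static_cast<std::size_t>(y) * stride + x) * 3;
}

// Returns true if every pixel of a width x height image maps to its own offset.
// Passing useHeightAsStride = true reproduces the bug from the question.
bool offsetsAreUnique(int width, int height, bool useHeightAsStride) {
    assert(width > 0 && height > 0);
    const int stride = useHeightAsStride ? height : width;
    std::set<std::size_t> seen;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (!seen.insert(rgbOffset(x, y, stride)).second)
                return false; // two different pixels share the same bytes
    return true;
}
```

With the width as the stride, all 800 x 600 offsets are distinct; with the height as the stride, pixel (600, 0) and pixel (0, 1) already land on the same bytes.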

In addition to @Jeffrey's correction, your edge function is also not quite right. It should be
float edgeFunction(const Vertice& A, const Vertice& B, const Vertice& P)
{
    return ((P.x - A.x)*(B.y - A.y) - (P.y - A.y)*(B.x - A.x));
}
i.e. there should be a minus sign between the two terms, because the edge function is the 2D cross product of the vectors AP and AB.
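Putting both fixes together, a minimal sketch of the rasterizer could look like this. It assumes the vertices are already in pixel coordinates (as after fitToImage), uses the minus-sign edge function, indexes rows by the image width, and scans only the triangle's bounding box instead of the whole image. Accepting weights that are all non-negative or all non-positive is an addition not taken from either answer; it lets both clockwise and counter-clockwise vertex orders fill. Vec2, RgbImage and fillTriangle are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// 2D cross product of (P - A) and (B - A): zero on the line AB, and its sign
// tells which side of AB the point P lies on.
float edgeFunction(const Vec2& A, const Vec2& B, const Vec2& P) {
    return (P.x - A.x) * (B.y - A.y) - (P.y - A.y) * (B.x - A.x);
}

// Tightly packed RGB image, row-major, initialised to white.
struct RgbImage {
    int width, height;
    std::vector<unsigned char> pixels;
    RgbImage(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h * 3, 255) {}
    void setPixel(int x, int y, unsigned char r, unsigned char g, unsigned char b) {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        std::size_t i = (static_cast<std::size_t>(y) * width + x) * 3; // stride = width
        pixels[i] = r;
        pixels[i + 1] = g;
        pixels[i + 2] = b;
    }
};

// Fills the triangle (v0, v1, v2), given in pixel coordinates, in blue.
// Returns the number of pixels written.
int fillTriangle(RgbImage& img, Vec2 v0, Vec2 v1, Vec2 v2) {
    // Scan only the bounding box, clamped to the image.
    int minX = std::max(0, static_cast<int>(std::floor(std::min({v0.x, v1.x, v2.x}))));
    int minY = std::max(0, static_cast<int>(std::floor(std::min({v0.y, v1.y, v2.y}))));
    int maxX = std::min(img.width - 1, static_cast<int>(std::ceil(std::max({v0.x, v1.x, v2.x}))));
    int maxY = std::min(img.height - 1, static_cast<int>(std::ceil(std::max({v0.y, v1.y, v2.y}))));
    int count = 0;
    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            Vec2 p{x + 0.5f, y + 0.5f}; // sample at the pixel centre
            float w0 = edgeFunction(v1, v2, p);
            float w1 = edgeFunction(v2, v0, p);
            float w2 = edgeFunction(v0, v1, p);
            // Inside if all three weights share a sign (covers both windings).
            bool inside = (w0 >= 0 && w1 >= 0 && w2 >= 0) || (w0 <= 0 && w1 <= 0 && w2 <= 0);
            if (inside) {
                img.setPixel(x, y, 0, 0, 255);
                ++count;
            }
        }
    }
    return count;
}
```

On a 4 x 4 image, the triangle (0, 0), (4, 0), (0, 4) covers the 10 pixels whose centres satisfy x + y <= 4, whichever order the vertices are passed in.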


Rotate RGBA image clockwise

I have a 1D array (size = 4 * width * height + 1) holding the pixels of an RGBA PNG image. I want to rotate the image by X degrees clockwise. I already know how to do it for 90 degrees, but I guess I have some problem with the trigonometry.
Here's the code:
std::pair<int, int> move(int x, int y, double rad) {
    return {x * cos(rad) - y * sin(rad), x * cos(rad) + y * sin(rad)};
}

void turn(int deg) {
    if (deg < 0) {
        deg = 360 + deg;
    }
    double rad = deg * (M_PI / (double)180);
    unsigned int oldWidth = width;
    width = lround(sqrt(height * height + width * width));
    height = lround(sqrt(height * height + oldWidth * oldWidth));
    std::vector<unsigned char> output(rawPixels.size());
    for (int X = 0; X < width; ++X) {
        for (int Y = 0; Y < height; ++Y) {
            for (int chan = 0; chan < CHANNELS_COUNT; ++chan) {
                std::pair<int, int> xy = move(X, Y, rad);
                output[(X * height + Y) * CHANNELS_COUNT + chan] = rawPixels[
                    ((height - 1 - xy.second) * width + xy.first) * CHANNELS_COUNT + chan];
            }
        }
    }
    rawPixels = output;
}
It's OK to use additional arrays, but I don't want to use OpenCV or any other libraries.
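For a rotation by an angle θ, the standard formulas are x' = x cos θ - y sin θ and y' = x sin θ + y cos θ; note that the posted move() uses x cos θ + y sin θ for the second coordinate. A common way to rotate by an arbitrary angle without holes is inverse mapping: loop over the destination pixels, rotate each one back by -θ around the image centre, and sample the source there. The sketch below is not the asker's code. RgbaImage and rotateClockwise are hypothetical names, the buffer is assumed to be tightly packed with 4 bytes per pixel (without the extra trailing byte mentioned above), and nearest-neighbour sampling is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Tightly packed RGBA image, row-major, 4 bytes per pixel (no trailing byte).
struct RgbaImage {
    int width = 0, height = 0;
    std::vector<unsigned char> px;
};

// Rotates src clockwise (as seen on screen, y pointing down) by `degrees` about
// its centre. The output is enlarged to the rotated bounding box; pixels that
// map outside the source stay transparent (0, 0, 0, 0).
RgbaImage rotateClockwise(const RgbaImage& src, double degrees) {
    assert(src.px.size() == static_cast<std::size_t>(src.width) * src.height * 4);
    const double kPi = 3.14159265358979323846;
    const double rad = degrees * kPi / 180.0;
    const double c = std::cos(rad), s = std::sin(rad);

    RgbaImage dst;
    // Bounding box of the rotated rectangle; the epsilon absorbs rounding in cos/sin.
    dst.width = static_cast<int>(std::ceil(std::abs(src.width * c) + std::abs(src.height * s) - 1e-9));
    dst.height = static_cast<int>(std::ceil(std::abs(src.width * s) + std::abs(src.height * c) - 1e-9));
    dst.px.assign(static_cast<std::size_t>(dst.width) * dst.height * 4, 0);

    const double srcCx = src.width / 2.0, srcCy = src.height / 2.0;
    const double dstCx = dst.width / 2.0, dstCy = dst.height / 2.0;
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            // Destination pixel centre relative to the destination centre.
            const double dx = x + 0.5 - dstCx, dy = y + 0.5 - dstCy;
            // Inverse mapping: rotate back by -angle to find the source sample.
            const double sx = dx * c + dy * s + srcCx;
            const double sy = -dx * s + dy * c + srcCy;
            const int ix = static_cast<int>(std::floor(sx));
            const int iy = static_cast<int>(std::floor(sy));
            if (ix < 0 || iy < 0 || ix >= src.width || iy >= src.height) continue;
            for (int ch = 0; ch < 4; ++ch) // nearest-neighbour copy of all 4 channels
                dst.px[(static_cast<std::size_t>(y) * dst.width + x) * 4 + ch] =
                    src.px[(static_cast<std::size_t>(iy) * src.width + ix) * 4 + ch];
        }
    }
    return dst;
}
```

Because every destination pixel is looked up exactly once, the result has no gaps, which a forward mapping (rotating source pixels into the destination) cannot guarantee for non-right angles.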

How does the CUDA box filter work?

I have this code sample that I am trying to understand:
__global__ void
d_boxfilter_rgba_x(unsigned int *od, int w, int h, int r)
{
    float scale = 1.0f / (float)((r << 1) + 1);
    unsigned int y = blockIdx.x*blockDim.x + threadIdx.x;

    if (y < h)
    {
        float4 t = make_float4(0.0f);

        for (int x = -r; x <= r; x++)
        {
            t += tex2D(rgbaTex, x, y);
        }

        od[y * w] = rgbaFloatToInt(t * scale);

        for (int x = 1; x < w; x++)
        {
            t += tex2D(rgbaTex, x + r, y);
            t -= tex2D(rgbaTex, x - r - 1, y);
            od[y * w + x] = rgbaFloatToInt(t * scale);
        }
    }
}

__global__ void
d_boxfilter_rgba_y(unsigned int *id, unsigned int *od, int w, int h, int r)
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    id = &id[x];
    od = &od[x];

    float scale = 1.0f / (float)((r << 1) + 1);

    float4 t;
    // left side
    t = rgbaIntToFloat(id[0]) * r;

    for (int y = 0; y < (r + 1); y++)
    {
        t += rgbaIntToFloat(id[y*w]);
    }

    od[0] = rgbaFloatToInt(t * scale);

    for (int y = 1; y < (r + 1); y++)
    {
        t += rgbaIntToFloat(id[(y + r) * w]);
        t -= rgbaIntToFloat(id[0]);
        od[y * w] = rgbaFloatToInt(t * scale);
    }

    // main loop
    for (int y = (r + 1); y < (h - r); y++)
    {
        t += rgbaIntToFloat(id[(y + r) * w]);
        t -= rgbaIntToFloat(id[((y - r) * w) - w]);
        od[y * w] = rgbaFloatToInt(t * scale);
    }

    // right side
    for (int y = h - r; y < h; y++)
    {
        t += rgbaIntToFloat(id[(h - 1) * w]);
        t -= rgbaIntToFloat(id[((y - r) * w) - w]);
        od[y * w] = rgbaFloatToInt(t * scale);
    }
}
This should be a box filter implemented in CUDA.
From what I have read, it should compute an average over a given radius.
But d_boxfilter_rgba_y does something like this:
od[0] = rgbaFloatToInt(t * scale);
I don't understand why this scale is used, or why all those loops are needed when there should be just one: sum the values from -r to +r and divide by the number of pixels.
Can somebody help me?
To calculate the average of a box with radius 1 (3 values), you do:
(box[0] + box[1] + box[2]) / 3 // which is equal to
(box[0] + box[1] + box[2]) * 1/3 // which is equal to your scale factor
The calculation of scale is:
1.0f / (float)((r << 1) + 1); // equal to
1 / ((r * 2) + 1) // equal to
1 / (2r + 1) // 2r because you go to the left and right and +1 for the middle
Two for loops are used because of the "sliding window" optimisation. First, the sum for the first box is calculated:
for (int x = -r; x <= r; x++)
t += tex2D(rgbaTex, x, y);
Then, for each step to the right, the value just right of the box is added and the leftmost value of the box is removed. That way you can update the sum of the box with just 2 operations instead of 2*r + 1 operations.
for (int x = 1; x < w; x++)
{
    t += tex2D(rgbaTex, x + r, y);
    t -= tex2D(rgbaTex, x - r - 1, y);
    od[y * w + x] = rgbaFloatToInt(t * scale);
}
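The sliding-window idea can be checked on the CPU. The sketch below is plain C++ rather than CUDA and filters a single row; as an assumption standing in for the texture's address mode (and for the edge replication in d_boxfilter_rgba_y), out-of-range indices are clamped to the first or last sample. boxNaive recomputes all 2r + 1 samples per output, while boxSliding follows the loop structure of d_boxfilter_rgba_x; both use the same scale factor 1 / (2r + 1). All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sample i of the row, clamped to the first/last element (stand-in for a
// texture fetch with clamp addressing).
static float at(const std::vector<float>& row, int i) {
    const int last = static_cast<int>(row.size()) - 1;
    return row[std::min(std::max(i, 0), last)];
}

// Naive box filter: sums all 2r + 1 samples for every output.
std::vector<float> boxNaive(const std::vector<float>& row, int r) {
    assert(r >= 0);
    const float scale = 1.0f / static_cast<float>(2 * r + 1);
    std::vector<float> out(row.size());
    for (int x = 0; x < static_cast<int>(row.size()); ++x) {
        float t = 0.0f;
        for (int k = -r; k <= r; ++k) t += at(row, x + k);
        out[x] = t * scale;
    }
    return out;
}

// Sliding window: one full sum for x = 0, then per step add the sample entering
// on the right and subtract the one leaving on the left.
std::vector<float> boxSliding(const std::vector<float>& row, int r) {
    assert(r >= 0);
    const float scale = 1.0f / static_cast<float>(2 * r + 1);
    std::vector<float> out(row.size());
    if (row.empty()) return out;
    float t = 0.0f;
    for (int k = -r; k <= r; ++k) t += at(row, k);
    out[0] = t * scale;
    for (int x = 1; x < static_cast<int>(row.size()); ++x) {
        t += at(row, x + r);
        t -= at(row, x - r - 1);
        out[x] = t * scale;
    }
    return out;
}
```

For the row {1, 2, 3, 4, 5} with r = 1, both versions give 4/3, 2, 3, 4 and 14/3; the sliding version does two additions per output regardless of r.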

Performant Threaded C++ Pixel Rendering: Fastest Way?

My goal is simple: I want to create a rendering system in C++ that can draw thousands of bitmaps on screen. I have been trying to use threads to speed up the process, but to no avail. In most cases, I have actually slowed down performance by using multiple threads. I am doing this project as an educational exercise, so I am deliberately not using hardware acceleration. That said, my question is this:
What is the best way to use several threads to accept a massive list of images to be drawn onto the screen and render them at break-neck speeds? I know that I won’t be able to create a system that can rival hardware accelerated graphics, but I believe that my idea is still feasible because the operation is so simple: copying pixels from one memory location to another.
My renderer design uses three core blitting operations: position, rotation, and scale of a bitmap image. I have it set up to only rotate an image when needed, and only scale an image when needed.
I have gone through several designs for this system. All of them are too slow to get the job done (300 64x64 bitmaps at barely 60 fps).
Here are the designs I have tried:
Immediately drawing a source bitmap on a destination bitmap for every image on screen (moderate speed).
Creating workers that accept a draw instruction and immediately begin working on it while other workers receive their instructions also (slowest).
Workers that receive packages of several instructions at a time (slower).
Saving all drawing instructions up and then splitting them among several workers in one sweep while other tasks are (in theory) being done (slowest).
Here is the bitmap class I am using to blit bitmaps onto each other:
#include <algorithm>
#include <cmath>
#include <cstring>

class Bitmap {
public:
    Bitmap(int w, int h) {
        width = w;
        height = h;
        size = w * h;
        pixels = new unsigned int[size];
    }

    virtual ~Bitmap() {
        if (pixels != 0) {
            delete[] pixels;
            pixels = 0;
        }
    }

    void blit(Bitmap *bmp, float x, float y, float rot, float sclx,
              float scly) {
        // Position only
        if (rot == 0 && sclx == 1 && scly == 1) {
            blitPos(bmp, x, y);
        }
        // Rotate only
        else if (rot != 0 && sclx == 1 && scly == 1) {
            blitRot(bmp, x, y, rot);
        }
        // Scale only
        else if (rot == 0 && (sclx != 1 || scly != 1)) {
            blitScl(bmp, x, y, sclx, scly);
        }
        // If it is not one of those, you have to do all three... :D
        else {
            // Create a bitmap that is scaled to the new size.
            Bitmap tmp((int)(bmp->width * sclx), (int)(bmp->height * scly));

            // Find how much each pixel steps:
            float step_x = (float)bmp->width / (float)tmp.width;
            float step_y = (float)bmp->height / (float)tmp.height;

            // Fill the scaled image with pixels!
            float inx = 0;
            int xOut = 0;
            while (xOut < tmp.width) {
                float iny = 0;
                int yOut = 0;
                while (yOut < tmp.height) {
                    unsigned int sample = bmp->pixels[
                        (int)(std::floor(inx) + std::floor(iny) * bmp->width)
                    ];
                    tmp.drawPixel(xOut, yOut, sample);
                    iny += step_y;
                    yOut++;
                }
                inx += step_x;
                xOut++;
            }

            blitRot(&tmp, x, y, rot);
        }
    }

    void drawPixel(int x, int y, unsigned int color) {
        // Valid coordinates are 0..width-1 and 0..height-1
        if (x >= width || y >= height || x < 0 || y < 0)
            return;
        // Skip fully transparent (all-zero) pixels
        if (color == 0x00000000)
            return;
        int index = x + y * width;
        if (index >= 0 && index < size)
            pixels[index] = color;
    }

    unsigned int getPixel(int x, int y) {
        return pixels[x + y * width];
    }

    void clear(unsigned int color) {
        std::fill(pixels, pixels + size, color);
    }

    void blitPos(Bitmap *bmp, float x, float y) {
        // Don't draw if coordinates are already past edges
        if (x > width || y > height || y + bmp->height < 0 || x + bmp->width < 0)
            return;

        int from;
        int to;
        int destfrom;
        int destto;

        for (int i = 0; i < bmp->height; i++) {
            from = i * bmp->width;
            to = from + bmp->width;

            //////// Caps
            // Bitmap is being drawn past the right edge
            if (x + bmp->width > width) {
                int cap = bmp->width - ((x + bmp->width) - width);
                to = from + cap;
            }
            // Bitmap is being drawn past the left edge
            else if (x + bmp->width < bmp->width) {
                int cap = bmp->width + x;
                from += (bmp->width - cap);
                to = from + cap;
            }

            //////// Destination Maths
            if (x < 0) {
                destfrom = (y + i) * width;
                destto = destfrom + (bmp->width + x);
            } else {
                destfrom = x + (y + i) * width;
                destto = destfrom + bmp->width;
            }

            // Bitmap is being drawn past either top or bottom edges
            if (y + i > height - 1)
                continue;
            if (destfrom > size || destfrom < 0)
                continue;

            memcpy(&pixels[destfrom], &bmp->pixels[from], sizeof(unsigned int) * (to - from));
        }
    }

    void blitRot(Bitmap *bmp, float x, float y, float rot) {
        float sine = std::sin(-rot);
        float cosine = std::cos(-rot);

        int x1 = (int)(-bmp->height * sine);
        int y1 = (int)(bmp->height * cosine);
        int x2 = (int)(bmp->width * cosine - bmp->height * sine);
        int y2 = (int)(bmp->height * cosine + bmp->width * sine);
        int x3 = (int)(bmp->width * cosine);
        int y3 = (int)(bmp->width * sine);

        int minx = (int)std::min(0, std::min(x1, std::min(x2, x3)));
        int miny = (int)std::min(0, std::min(y1, std::min(y2, y3)));
        int maxx = (int)std::max(0, std::max(x1, std::max(x2, x3)));
        int maxy = (int)std::max(0, std::max(y1, std::max(y2, y3)));

        int w = maxx - minx;
        int h = maxy - miny;

        int srcx;
        int srcy;
        int dest_x;
        int dest_y;
        unsigned int color;

        for (int sy = miny; sy < maxy; sy++) {
            for (int sx = minx; sx < maxx; sx++) {
                srcx = sx * cosine + sy * sine;
                srcy = sy * cosine - sx * sine;

                dest_x = x + sx;
                dest_y = y + sy;

                if (dest_x <= width - 1 && dest_y <= height - 1
                    && dest_x >= 0 && dest_y >= 0) {
                    color = 0;

                    // Only grab a pixel if it is inside of the src image
                    if (srcx < bmp->width && srcy < bmp->height && srcx >= 0 &&
                        srcy >= 0)
                        color = bmp->getPixel(srcx, srcy);

                    // Only draw this pixel if it is not completely transparent:
                    if (color & 0xFF000000)
                        // Only if the pixel is somewhere between 0 and the bmp size
                        // (a chained "0 < srcx < bmp->width" is always true in C++)
                        if (srcx >= 0 && srcx < bmp->width && srcy >= 0 && srcy < bmp->height)
                            drawPixel(x + sx, y + sy, color);
                }
            }
        }
    }

    void blitScl(Bitmap *bmp, float x, float y, float sclx, float scly) {
        // Compute the scaled size.
        int finalwidth = (int)(bmp->width * sclx);
        int finalheight = (int)(bmp->height * scly);

        // Find how much each pixel steps:
        float step_x = (float)bmp->width / (float)finalwidth;
        float step_y = (float)bmp->height / (float)finalheight;

        // Fill the scaled image with pixels!
        float inx = 0;
        int xOut = 0;
        float iny;
        int yOut;
        while (xOut < finalwidth) {
            iny = 0;
            yOut = 0;
            while (yOut < finalheight) {
                unsigned int sample = bmp->pixels[
                    (int)(std::floor(inx) + std::floor(iny) * bmp->width)
                ];
                drawPixel(xOut + x, yOut + y, sample);
                iny += step_y;
                yOut++;
            }
            inx += step_x;
            xOut++;
        }
    }

    int width;
    int height;
    int size;
    unsigned int *pixels;
};
Here is some code showing the latest method I have tried: saving up all instructions and then giving them to workers once they have all been received:
class Instruction {
public:
    Instruction() {}
    Instruction(Bitmap* out, Bitmap* in, float x, float y, float rot,
                float sclx, float scly)
        : outbuffer(out), inbmp(in), x(x), y(y), rot(rot),
          sclx(sclx), scly(scly)
    { }

    ~Instruction() {
        outbuffer = nullptr;
        inbmp = nullptr;
    }

    Bitmap* outbuffer;
    Bitmap* inbmp;
    float x, y, rot, sclx, scly;
};
Layer Class:
class Layer {
public:
    // true when the layer has nothing to draw
    bool empty() {
        return instructions.empty();
    }

    std::vector<Instruction> instructions;
    int pixel_count;
};
Worker Thread Class:
class Worker {
public:
    void start() {
        done = false;
        work_thread = std::thread(&Worker::processData, this);
    }

    void processData() {
        while (true) {
            if (done)
                break;
            if (!layers.empty()) {
                for (int i = 0; i < layers.size(); i++) {
                    for (int j = 0; j < layers[i].instructions.size(); j++) {
                        Instruction* inst = &layers[i].instructions[j];
                        inst->outbuffer->blit(inst->inbmp, inst->x, inst->y, inst->rot, inst->sclx, inst->scly);
                    }
                }
            }
        }
    }

    void finish() {
        done = true;
    }

    bool done;
    std::thread work_thread;
    std::mutex controller;
    std::vector<Layer> layers;
};
Finally, the Render Manager Class:
class RenderManager {
public:
    RenderManager() {
        for (int i = 0; i < 1; i++) {
            // ...
        }
    }

    void layer() {
        current_layer = Layer();
    }

    void blit(Bitmap* out, Bitmap* in, float x, float y, float rot, float sclx, float scly) {
        current_layer.instructions.emplace_back(out, in, x, y, rot, sclx, scly);
    }

    void processInstructions() {
        if (layers.empty())
            return;
        int index = 0;
        for (int i = 0; i < layers.size(); i++) {
            // Evenly distribute the layers in a round-robin fashion
            Layer l = layers[i];
            if (index >= workers.size()) index = 0;
            // ...
        }
    }

    void lockall() {
        for (int i = 0; i < workers.size(); i++) {
            // ...
        }
    }

    void unlockall() {
        for (int i = 0; i < workers.size(); i++) {
            // ...
        }
    }

    void finish() {
        // Wait until every worker is done rendering
        // ...
        // At this point, we know they have nothing more to draw
    }

    void endRendering() {
        for (int i = 0; i < workers.size(); i++) {
            // Send each one an exit code
            // ...
        }
        // Let the workers finish and then return
        for (int i = 0; i < workers.size(); i++) {
            // ...
        }
    }

    std::vector<Worker> workers;
    std::vector<Layer> layers;
    Layer current_layer;
};
Here is a screenshot of the third method I tried and its results:
Sending packages of draw instructions
What would really help is if someone could simply point me in the right direction regarding which method I should try. I have tried these four methods and failed, so I stand before those who have done greater things than I for help. The least intelligent person in the room is the one who does not ask questions because his pride does not permit it. Please keep in mind, though, that this is my first question ever on Stack Overflow.
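One alternative, which is an assumption and not one of the designs tried in the post, is to partition the destination instead of the instruction list: each thread owns a horizontal band of the output and replays every draw command clipped to that band. Because no two threads ever write the same pixel, no locks are needed, and draw order inside each band matches a serial run. The sketch below only handles opaque, position-only blits; Surface, DrawCmd, blitClipped and renderParallel are hypothetical names. It spawns threads per call for brevity, whereas a real renderer would keep a persistent pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Row-major 32-bit pixel buffer (hypothetical stand-in for the question's Bitmap).
struct Surface {
    int width, height;
    std::vector<std::uint32_t> pixels;
    Surface(int w, int h) : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
};

// One opaque, position-only draw command.
struct DrawCmd {
    const Surface* src;
    int x, y;
};

// Copies the part of src placed at (x, y) that lies inside dst and inside rows
// [rowBegin, rowEnd), one std::copy per visible row.
void blitClipped(Surface& dst, const Surface& src, int x, int y, int rowBegin, int rowEnd) {
    int x0 = std::max(x, 0);
    int x1 = std::min(x + src.width, dst.width);
    int y0 = std::max({y, 0, rowBegin});
    int y1 = std::min({y + src.height, dst.height, rowEnd});
    if (x0 >= x1 || y0 >= y1) return; // nothing visible in this band
    for (int dy = y0; dy < y1; ++dy) {
        const std::uint32_t* s = &src.pixels[static_cast<std::size_t>(dy - y) * src.width + (x0 - x)];
        std::uint32_t* d = &dst.pixels[static_cast<std::size_t>(dy) * dst.width + x0];
        std::copy(s, s + (x1 - x0), d);
    }
}

// Splits dst into threadCount horizontal bands; each thread replays the whole
// command list clipped to its own band, so no pixel is written by two threads.
void renderParallel(Surface& dst, const std::vector<DrawCmd>& cmds, int threadCount) {
    assert(threadCount > 0);
    const int band = (dst.height + threadCount - 1) / threadCount;
    std::vector<std::thread> pool;
    for (int t = 0; t < threadCount; ++t) {
        const int begin = t * band;
        const int end = std::min(dst.height, begin + band);
        if (begin >= end) break;
        pool.emplace_back([&dst, &cmds, begin, end] {
            for (const DrawCmd& c : cmds) // submission order kept, so overdraw matches serial
                blitClipped(dst, *c.src, c.x, c.y, begin, end);
        });
    }
    for (std::thread& th : pool) th.join();
}
```

Serial and parallel runs produce identical buffers because each band sees the same commands in the same order. Whether this beats a single thread for 300 small sprites depends on the machine, so it is worth measuring.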

Adding unused formal parameters to C++ method results in different behavior

When I add some extra formal parameters, double tmin=0.0 and double tmax=0.0, to the constructor of Ray in the code below, I always obtain a wrong image with a white top border. These formal parameters currently contribute nothing to the code (i.e. they are unused). So how is it possible to obtain a different image?
System specifications:
OS: Windows 8.1
Compiler: MSVC 2015
#include "stdafx.h"
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <random>
std::default_random_engine generator(606418532);
std::uniform_real_distribution<double> distribution = std::uniform_real_distribution<double>(0.0, 1.0);
double erand48(unsigned short *x) {
    return distribution(generator);
}
#define M_PI 3.14159265358979323846
struct Vector3 {
    double x, y, z;
    Vector3(double x_ = 0, double y_ = 0, double z_ = 0) { x = x_; y = y_; z = z_; }
    Vector3 operator+(const Vector3 &b) const { return Vector3(x + b.x, y + b.y, z + b.z); }
    Vector3 operator-(const Vector3 &b) const { return Vector3(x - b.x, y - b.y, z - b.z); }
    Vector3 operator*(double b) const { return Vector3(x*b, y*b, z*b); }
    Vector3 mult(const Vector3 &b) const { return Vector3(x*b.x, y*b.y, z*b.z); }
    Vector3& norm() { return *this = *this * (1 / sqrt(x*x + y*y + z*z)); }
    double Dot(const Vector3 &b) const { return x*b.x + y*b.y + z*b.z; } // cross:
    Vector3 operator%(Vector3&b) { return Vector3(y*b.z - z*b.y, z*b.x - x*b.z, x*b.y - y*b.x); }
};
//struct Ray { Vector3 o, d; Ray(const Vector3 &o_, const Vector3 &d_, double tmin=0.0, double tmax=0.0) : o(o_), d(d_) {} };
struct Ray { Vector3 o, d; Ray(const Vector3 &o_, const Vector3 &d_) : o(o_), d(d_) {} };
enum Reflection_t { DIFFUSE, SPECULAR, REFRACTIVE };
struct Sphere {
    double rad; // radius
    Vector3 p, e, f; // position, emission, color
    Reflection_t reflection_t; // reflection type (DIFFuse, SPECular, REFRactive)
    Sphere(double rad_, Vector3 p_, Vector3 e_, Vector3 f_, Reflection_t reflection_t) :
        rad(rad_), p(p_), e(e_), f(f_), reflection_t(reflection_t) {}
    double intersect(const Ray &r) const { // returns distance, 0 if no hit
        Vector3 op = p - r.o;
        double t, eps = 1e-4, b = op.Dot(r.d), det = b*b - op.Dot(op) + rad*rad;
        if (det<0) return 0; else det = sqrt(det);
        return (t = b - det)>eps ? t : ((t = b + det)>eps ? t : 0);
    }
};
Sphere spheres[] = {
Sphere(1e5, Vector3(1e5 + 1,40.8,81.6), Vector3(),Vector3(.75,.25,.25),DIFFUSE),//Left
Sphere(1e5, Vector3(-1e5 + 99,40.8,81.6),Vector3(),Vector3(.25,.25,.75),DIFFUSE),//Rght
Sphere(1e5, Vector3(50,40.8, 1e5), Vector3(),Vector3(.75,.75,.75),DIFFUSE),//Back
Sphere(1e5, Vector3(50,40.8,-1e5 + 170), Vector3(),Vector3(), DIFFUSE),//Frnt
Sphere(1e5, Vector3(50, 1e5, 81.6), Vector3(),Vector3(.75,.75,.75),DIFFUSE),//Botm
Sphere(1e5, Vector3(50,-1e5 + 81.6,81.6),Vector3(),Vector3(.75,.75,.75),DIFFUSE),//Top
Sphere(16.5,Vector3(27,16.5,47), Vector3(),Vector3(1,1,1)*.999, SPECULAR),//Mirr
Sphere(16.5,Vector3(73,16.5,78), Vector3(),Vector3(1,1,1)*.999, REFRACTIVE),//Glas
Sphere(600, Vector3(50,681.6 - .27,81.6),Vector3(12,12,12), Vector3(), DIFFUSE) //Lite
};
inline double clamp(double x) { return x<0 ? 0 : x>1 ? 1 : x; }
inline int toInt(double x) { return int(pow(clamp(x), 1 / 2.2) * 255 + .5); }
inline bool intersect(const Ray &r, double &t, int &id) {
    double n = sizeof(spheres) / sizeof(Sphere), d, inf = t = 1e20;
    for (int i = int(n); i--;) if ((d = spheres[i].intersect(r)) && d<t) { t = d; id = i; }
    return t<inf;
}
Vector3 radiance(const Ray &r_, int depth_, unsigned short *Xi) {
    double t; // distance to intersection
    int id = 0; // id of intersected object
    Ray r = r_;
    int depth = depth_;
    Vector3 cl(0, 0, 0); // accumulated color
    Vector3 cf(1, 1, 1); // accumulated reflectance
    while (1) {
        if (!intersect(r, t, id)) return cl; // if miss, return black
        const Sphere &obj = spheres[id]; // the hit object
        Vector3 x = r.o + r.d*t, n = (x - obj.p).norm(), nl = n.Dot(r.d)<0 ? n : n*-1, f = obj.f;
        double p = f.x>f.y && f.x>f.z ? f.x : f.y>f.z ? f.y : f.z; // max refl
        cl = cl + cf.mult(obj.e);
        if (++depth>5) if (erand48(Xi)<p) f = f*(1 / p); else return cl; //R.R.
        cf = cf.mult(f);
        if (obj.reflection_t == DIFFUSE) { // Ideal DIFFUSE reflection
            double r1 = 2 * M_PI*erand48(Xi), r2 = erand48(Xi), r2s = sqrt(r2);
            Vector3 w = nl, u = ((fabs(w.x)>.1 ? Vector3(0, 1) : Vector3(1)) % w).norm(), v = w%u;
            Vector3 d = (u*cos(r1)*r2s + v*sin(r1)*r2s + w*sqrt(1 - r2)).norm();
            r = Ray(x, d);
            continue;
        }
        else if (obj.reflection_t == SPECULAR) { // Ideal SPECULAR reflection
            r = Ray(x, r.d - n * 2 * n.Dot(r.d));
            continue;
        }
        Ray reflRay(x, r.d - n * 2 * n.Dot(r.d)); // Ideal dielectric REFRACTION
        bool into = n.Dot(nl)>0; // Ray from outside going in?
        double nc = 1, nt = 1.5, nnt = into ? nc / nt : nt / nc, ddn = r.d.Dot(nl), cos2t;
        if ((cos2t = 1 - nnt*nnt*(1 - ddn*ddn))<0) { // Total internal reflection
            r = reflRay;
            continue;
        }
        Vector3 tdir = (r.d*nnt - n*((into ? 1 : -1)*(ddn*nnt + sqrt(cos2t)))).norm();
        double a = nt - nc, b = nt + nc, R0 = a*a / (b*b), c = 1 - (into ? -ddn : tdir.Dot(n));
        double Re = R0 + (1 - R0)*c*c*c*c*c, Tr = 1 - Re, P = .25 + .5*Re, RP = Re / P, TP = Tr / (1 - P);
        if (erand48(Xi)<P) {
            cf = cf*RP;
            r = reflRay;
        }
        else {
            cf = cf*TP;
            r = Ray(x, tdir);
        }
    }
}
int main(int argc, char *argv[]) {
    int w = 512, h = 384, samps = argc == 2 ? atoi(argv[1]) / 4 : 1; // # samples
    Ray cam(Vector3(50, 52, 295.6), Vector3(0, -0.042612, -1).norm()); // cam pos, dir
    Vector3 cx = Vector3(w*.5135 / h), cy = (cx%cam.d).norm()*.5135, r, *c = new Vector3[w*h];
#pragma omp parallel for schedule(dynamic, 1) private(r) // OpenMP
    for (int y = 0; y<h; y++) { // Loop over image rows
        fprintf(stderr, "\rRendering (%d spp) %5.2f%%", samps * 4, 100.*y / (h - 1));
        for (unsigned short x = 0, Xi[3] = { 0,0,y*y*y }; x<w; x++) // Loop cols
            for (int sy = 0, i = (h - y - 1)*w + x; sy<2; sy++) // 2x2 subpixel rows
                for (int sx = 0; sx<2; sx++, r = Vector3()) { // 2x2 subpixel cols
                    for (int s = 0; s<samps; s++) {
                        double r1 = 2 * erand48(Xi), dx = r1<1 ? sqrt(r1) - 1 : 1 - sqrt(2 - r1);
                        double r2 = 2 * erand48(Xi), dy = r2<1 ? sqrt(r2) - 1 : 1 - sqrt(2 - r2);
                        Vector3 d = cx*(((sx + .5 + dx) / 2 + x) / w - .5) +
                            cy*(((sy + .5 + dy) / 2 + y) / h - .5) + cam.d;
                        r = r + radiance(Ray(cam.o + d * 140, d.norm()), 0, Xi)*(1. / samps);
                    } // Camera rays are pushed ^^^^^ forward to start in interior
                    c[i] = c[i] + Vector3(clamp(r.x), clamp(r.y), clamp(r.z))*.25;
                }
    }
    FILE *fp;
    fopen_s(&fp, "image.ppm", "w"); // Write image to PPM file.
    fprintf(fp, "P3\n%d %d\n%d\n", w, h, 255);
    for (int i = 0; i<w*h; i++)
        fprintf(fp, "%d %d %d ", toInt(c[i].x), toInt(c[i].y), toInt(c[i].z));
}
First Ray structure:
struct Ray { Vector3 o, d; Ray(const Vector3 &o_, const Vector3 &d_) : o(o_), d(d_) {} };
Results in:
Second Ray structure:
struct Ray { Vector3 o, d; Ray(const Vector3 &o_, const Vector3 &d_, double tmin=0.0, double tmax=0.0) : o(o_), d(d_) {} };
Results in:
The last image has a noticeable white top border which is not present in the first image.
I used
size_t n = sizeof(spheres) / sizeof(Sphere);
Now I obtain the same images, but I also checked whether the original int(n) could differ from 9, which is never the case.
Ok this is from the Debug build, which is different from the Release build.
Sounds like a memory error, looking quickly at your code I'm sceptical of this line:
for (int i = int(n); i--;) if ((d = spheres[i].intersect(r)) && d<t)
I suspect accessing spheres[i] is out of bounds; perhaps you should try spheres[i-1]. You could also try compiling your code with a compiler that adds extra code for debugging, sanitising, or checking memory addresses.
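Whether spheres[i] can run out of bounds in intersect() can be checked in isolation. The helper reverseIndices below is hypothetical; it reproduces the for (int i = int(n); i--;) loop header and records the indices the body sees.

```cpp
#include <cassert>
#include <vector>

// Records the indices visited by the loop header used in intersect():
//     for (int i = int(n); i--;)
// The condition `i--` tests the current value and then decrements it, so the
// body runs with i = n-1, n-2, ..., 0, and the loop ends once the tested value is 0.
std::vector<int> reverseIndices(int n) {
    assert(n >= 0);
    std::vector<int> visited;
    for (int i = n; i--;)
        visited.push_back(i);
    return visited;
}
```

For n = 9 it visits 8 down to 0, so the indices stay within the nine-element spheres array.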

Bug in image blitting algorithm - 'italic' output

I've got a function which blits some region of a source image to a destination image. The problem is that there is a bug in it that I can't find. It's probably very trivial, but I have spent many hours on it :). My algorithm 'tilts' objects in the image. While debugging, I saw that it copies slightly more pixels than it should (e.g. 36836 instead of 36481).
#include <vector>
using std::vector;

struct Point
{
    unsigned x, y;
    Point(unsigned x, unsigned y) : x(x), y(y) { }
    Point() : x(0), y(0) { }
};

struct Rect
{
    Point lt, rd; // left-top and right-down vertices
};

struct Img
{
    vector<unsigned char> px; // pixel data in linear form (RGBARGBARGBA...)
    unsigned w, h; // width and height of image
};

inline bool isInRect(const Point& p, const Rect& r)
{
    return (p.x >= r.lt.x && p.y >= r.lt.y && p.x <= r.rd.x && p.y <= r.rd.y);
}

unsigned blit(const Img& src, Img& dest, const Rect& reg) // <--- THIS FUNCTION
{
    dest.w = reg.rd.x - reg.lt.x;
    dest.h = reg.rd.y - reg.lt.y;
    unsigned n = 0;
    for (int i = (reg.lt.y * src.w + reg.lt.x) * 4; i < src.px.size(); i += 4)
    {
        unsigned y = (i / 4) / src.w;
        unsigned x = (i / 4) % src.w;
        if (isInRect(Point(x, y), reg))
        {
            dest.px.push_back(src.px[i + 1]);
            dest.px.push_back(src.px[i + 2]);
            dest.px.push_back(src.px[i + 3]);
            n += 4;
        }
        if (y > reg.rd.y)
            break;
    }
    return n / 4;
}
Image fragment to blit:
Algorithm output:
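For comparison, here is a minimal sketch of a region copy (the hypothetical blitRegion, not the asker's function) that treats the right and bottom edges as exclusive and copies all four bytes of each pixel, one whole row at a time. In the code as posted, isInRect accepts p.x == r.rd.x, so each row copies one pixel more than dest.w = reg.rd.x - reg.lt.x allows, and only bytes i + 1 to i + 3 of each pixel are pushed; either mismatch shifts every following row when dest is read back with width dest.w, which would produce a slanted result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same layout as the question's Img: RGBA bytes, row-major, w x h pixels.
struct Img {
    std::vector<unsigned char> px;
    unsigned w = 0, h = 0;
};

// Copies the pixels with x0 <= x < x1 and y0 <= y < y1 (right and bottom edges
// exclusive) from src into dest, all 4 bytes per pixel, one whole row at a time.
// Returns the number of pixels copied.
unsigned blitRegion(const Img& src, Img& dest, unsigned x0, unsigned y0, unsigned x1, unsigned y1) {
    assert(x0 <= x1 && y0 <= y1 && x1 <= src.w && y1 <= src.h);
    assert(src.px.size() >= static_cast<std::size_t>(src.w) * src.h * 4);
    dest.w = x1 - x0;
    dest.h = y1 - y0;
    dest.px.clear();
    dest.px.reserve(static_cast<std::size_t>(dest.w) * dest.h * 4);
    const std::size_t rowBytes = static_cast<std::size_t>(dest.w) * 4;
    for (unsigned y = y0; y < y1; ++y) {
        const std::size_t rowStart = (static_cast<std::size_t>(y) * src.w + x0) * 4;
        dest.px.insert(dest.px.end(), src.px.begin() + rowStart,
                       src.px.begin() + rowStart + rowBytes);
    }
    return dest.w * dest.h;
}
```

Copying whole rows also avoids testing every source pixel against the rectangle.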