I am trying to render a fractal calculated using MPI. I used the answer to the following question as a reference: "sending blocks of 2D array in C using MPI".
My problem is that merging the data calculated by all the processes via MPI_Gatherv does not seem to work: my main process always renders a black screen.
I have the following struct defined:
typedef struct Point {
    float r,g,b,x,y;
} Point;
In my main I try to create an MPI_Datatype for the struct:
MPI_Datatype struct_type;
MPI_Datatype struct_members[1] = {MPI_FLOAT};
MPI_Aint offsets[1] = {0};
int struct_blengths[1] = {5};
int struct_items = 1;
MPI_Type_create_struct(struct_items, struct_blengths, offsets, struct_members, &struct_type);
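The single-block struct type above (five MPI_FLOAT starting at offset 0) describes Point correctly only if the compiler puts no padding between or after the five floats. That holds on common ABIs but is not guaranteed by the language. A minimal sketch of checking the assumption with `offsetof`, in plain C++ without MPI; `pointIsPackedFloats` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>  // offsetof

typedef struct Point {
    float r, g, b, x, y;
} Point;

// Hypothetical helper: true when Point is laid out as five consecutive floats
// with no padding, which is what a struct type of {5 x MPI_FLOAT at offset 0}
// assumes.
bool pointIsPackedFloats() {
    return offsetof(Point, r) == 0 &&
           offsetof(Point, g) == 1 * sizeof(float) &&
           offsetof(Point, b) == 2 * sizeof(float) &&
           offsetof(Point, x) == 3 * sizeof(float) &&
           offsetof(Point, y) == 4 * sizeof(float) &&
           sizeof(Point) == 5 * sizeof(float);
}
```

If the check ever failed on some platform, the offsets passed to MPI_Type_create_struct would have to be taken from `offsetof` member by member instead of assuming one block of five floats.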
I have a global variable for the calculation result:
Point **mandelbrot;
The variable is allocated as follows before each frame is recalculated:
if (proc_id == root) {
    //Just a check if this is the first frame that is being rendered
    if (s > 0) {
        s = W;
        Point *p = (Point *) malloc(W * H * sizeof(Point));
        mandelbrot = (Point **) malloc(W*sizeof(Point *));
        for (int i = 0; i < W; i++) {
            mandelbrot[i] = &(p[i*H]);
        }
    }
}
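The allocation above follows the pattern from the referenced answer: one contiguous block of W*H points plus an array of row pointers into it, so that `&mandelbrot[0][0]` addresses the whole buffer as one piece of memory, which is what an MPI receive buffer described by a datatype requires. A minimal sketch of that pattern; `alloc2d` and `free2d` are hypothetical helper names and error checks are omitted:

```cpp
#include <cassert>
#include <cstdlib>

typedef struct Point { float r, g, b, x, y; } Point;

// Hypothetical helper: allocate rows x cols Points as ONE contiguous block
// plus a table of row pointers, so a[i][j] works and &a[0][0] spans all
// rows*cols elements. (No NULL checks in this sketch.)
Point **alloc2d(int rows, int cols) {
    Point *block = (Point *) malloc((size_t) rows * cols * sizeof(Point));
    Point **a = (Point **) malloc((size_t) rows * sizeof(Point *));
    for (int i = 0; i < rows; i++)
        a[i] = &block[(size_t) i * cols];
    return a;
}

// Hypothetical helper: free the block (reachable through a[0]) and the row table.
void free2d(Point **a) {
    free(a[0]);
    free(a);
}
```

Because the rows are adjacent, element (i, j) is also reachable as `(&a[0][0])[i * cols + j]`.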
Here I try to create an array subtype using the Point struct (following the referenced answer as best I can):
//Width of the fractal to render
W = width;
//Height of the fractal
H = height;
//Chunk of width each process is responsible for [width / number of processes]
int segmentSize = (int) W / ntasks;
MPI_Datatype type, resizedtype;
int sizes[2] = {W,H}; /* size of global array */
int subsizes[2] = {segmentSize, H}; /* size of sub-region */
int starts[2] = {0,0};
MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_C, struct_type, &type);
MPI_Type_create_resized(type, 0, H*sizeof(Point), &resizedtype);
I then calculate the displacements and counts of the blocks to send, and allocate memory for the process's subarray:
int sendcounts[segmentSize*H];
int displs[segmentSize*H];
if (proc_id == root) {
    for (int i=0; i<segmentSize*H; i++) sendcounts[i] = 1;
    int disp = 0;
    for (int i=0; i<segmentSize; i++) {
        for (int j=0; j<H; j++) {
            displs[i*H+j] = disp;
            disp += 1;
        }
        disp += ((W/segmentSize)-1)*H;
    }
}
Point *p = (Point *) malloc(segmentSize * H * sizeof(Point));
Point **segment;
segment = (Point **) malloc(segmentSize * sizeof(Point*));
for (int i = 0; i < segmentSize; i++) {
    segment[i] = &(p[i*H]);
}
Following that I calculate the color of the Mandelbrot set for each point in the chunk:
int i;
float c[3], dX, dY;
for ( x = 0; x < segmentSize; x++) {
    for ( y = 0; y < H; y++) {
        //Iterate over the point
        i = iterateMandelbrot(rM + x * dR, iM - y * dI);
        // Get decimal coordinates for rendering <0,1>
        dX = (x + segmentSize * proc_id) / W;
        dY = y / H;
        //Calculate color using Bernoulli Polynomials
        makeColor(i, maxIterations, c);
        segment[x][y].x = (float) dX;
        segment[x][y].y = (float) dY;
        segment[x][y].r = (float) c[0];
        segment[x][y].g = (float) c[1];
        segment[x][y].b = (float) c[2];
    }
}
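In the loop above, `dX = (x + segmentSize * proc_id) / W;` and `dY = y / H;` divide ints if x, y, W and H are ints (their use as array indices and in `int sizes[2] = {W,H}` suggests they are). Integer division truncates toward zero, so both results are 0 for every point before they are stored in the float variables. A minimal sketch of the same computation with floating-point division; `normalizedCoords` is a hypothetical helper:

```cpp
#include <cassert>

// Hypothetical helper: normalized render coordinates in [0,1).
// Assumes x, y, W, H are ints; converting an operand to float before the
// division avoids integer truncation.
void normalizedCoords(int x, int y, int segmentSize, int procId, int W, int H,
                      float *dX, float *dY) {
    *dX = (float) (x + segmentSize * procId) / (float) W;
    *dY = (float) y / (float) H;
}
```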
Lastly, I try to gather the chunks into the mandelbrot variable for the root process to render:
int buffsize = (int) segmentSize * H;
MPI_Gatherv(&(segment[0][0]), W*H/(buffsize), struct_type,
            &(mandelbrot[0][0]), sendcounts, displs, resizedtype,
            root, MPI_COMM_WORLD);
OK, so the problem is that no data seems to be written into the mandelbrot variable, as my main process renders a black screen. Without MPI the code works, so the problem lies somewhere in the MPI_Gatherv call or in the way I am allocating the arrays. I realize there may be memory leaks associated with the mandelbrot array or the local segment arrays, but that is not my main concern at the moment. Can you see what I am doing wrong here? Any help is appreciated!
I wrote a program in C++ to draw the pixels of a BMP image into the console using the Windows SetPixel function, but after loading the pixel data into the array, the image is printed on the console with gaps between the pixels. Thanks in advance for your help!
This is the output of the printed image on the console.
This is the original image I provided to it.
As you can see, the image width also changes when it is printed on the console.
// bmp bitmap
#include <stdlib.h>
#include <stdio.h>
#include <windows.h>
using namespace std;
#pragma pack(1)
struct BitmapFileHeader {
    unsigned short type;
    unsigned int size;
    unsigned short reserved1;
    unsigned short reserved2;
    unsigned int offset;
};
#pragma pack(0)
unsigned char grayScale(unsigned char r, unsigned char g, unsigned char b) {
    return ((r + g + b) / 3);
}
int main() {
    char *data;
    FILE *filePointer;
    int **ImageArray;
    BitmapFileHeader *bmp = (struct BitmapFileHeader*)malloc(sizeof(struct BitmapFileHeader));
    BITMAPINFOHEADER *BitmapInfoHeader = (BITMAPINFOHEADER*)malloc(sizeof(BITMAPINFOHEADER));
    HWND console = GetConsoleWindow();
    HDC context = ::GetDC(console) ;
    filePointer = fopen("tom.bmp", "rb");
    if(!filePointer) {
        // could not open the file
        printf("Could not open tom.bmp\n");
        return 1;
    }
    fread(reinterpret_cast<BitmapFileHeader*>(bmp), sizeof(BitmapFileHeader), 1, filePointer);
    fread(reinterpret_cast<BITMAPINFOHEADER*>(BitmapInfoHeader), sizeof(BITMAPINFOHEADER), 1, filePointer);
    if(BitmapInfoHeader->biSize == 40 && BitmapInfoHeader->biCompression == BI_BITFIELDS) {
        printf("This types of image uses Extra bit masks\n");
    }
    // row padding
    int RowSize = ((BitmapInfoHeader->biBitCount * BitmapInfoHeader->biWidth + 31) / 32) * 4;
    int PixelArraySize = RowSize * BitmapInfoHeader->biHeight;
    int height = BitmapInfoHeader->biHeight * 5;
    int width = BitmapInfoHeader->biWidth * 5;
    printf("RowSize: %d PixelArraySize: %d\n", RowSize, PixelArraySize);
    ImageArray = (int**)malloc(sizeof(int*)*height);
    // memory allocation
    for(int i = 0; i < height; i++)
        ImageArray[i] = (int*)malloc(sizeof(int)*width);
    data = (char*)malloc(PixelArraySize);
    fseek(filePointer, bmp->offset, SEEK_SET);
    // set image into array
    for(int ii = 0; ii < height; ii+=3) {
        fread(data, RowSize, 3, filePointer);
        for(int jj = 0; jj < width; jj+=3) {
            ImageArray[ii][jj] = grayScale(data[jj+2], data[jj+1], data[jj]);
            SetPixel(context, -jj+1000, -ii+500, RGB(data[jj+2], data[jj+1], data[jj]));
        }
    }
    fclose(filePointer);
    return 0;
}
Here is the code I wrote.
A pixel is described by three bytes, one for each RGB channel. You are dealing with two indices here: the index into the row data and the position of the pixel in the width direction. You place the pixel and access the row data with the same index.
for (int jj = 0; jj < width; jj++) { // jj: position
    int kk = 3 * jj; // kk: data index
    ImageArray[ii][jj] = grayScale(data[kk + 2], data[kk + 1], data[kk]);
    SetPixel(context, -jj + 1000, -ii + 500, RGB(data[kk + 2], data[kk + 1], data[kk]));
}
The vertical gaps, i.e. the blank lines, come from incrementing ii by 3, where you should just increment by 1. (You have no "data index" here, because you read your data row-wise for the current row ii.)
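A minimal sketch of keeping the pixel position and the byte index apart, assuming a 24-bit BMP whose rows store each pixel as blue, green, red bytes and are padded to a multiple of 4 bytes (the same rounding as `RowSize` in the question). `bmpRowSize` and `grayRow` are hypothetical names; the buffer is `unsigned char` because with plain `char`, which is signed by default on MSVC, bytes above 127 turn negative before the averaging.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: bytes per BMP row, i.e. bits per row rounded up to a
// multiple of 32 bits (4 bytes), same formula as RowSize in the question.
int bmpRowSize(int bitsPerPixel, int width) {
    return ((bitsPerPixel * width + 31) / 32) * 4;
}

// Hypothetical helper: gray values of one 24-bit row stored as B, G, R.
// jj is the pixel position, kk = 3 * jj is the byte index into the row.
std::vector<unsigned char> grayRow(const unsigned char *row, int width) {
    std::vector<unsigned char> gray(width);
    for (int jj = 0; jj < width; jj++) {
        int kk = 3 * jj;
        gray[jj] = (unsigned char) ((row[kk + 2] + row[kk + 1] + row[kk]) / 3);
    }
    return gray;
}
```

For a 5-pixel-wide 24-bit image, each row holds 15 bytes of pixel data plus 1 byte of padding, so `bmpRowSize(24, 5)` is 16.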
If you want to enlarge your image, as the multiplication of width and height by 5 suggests, you must add a third index: you now have two positions, the source and target positions. This will be easier if you separate your loops: create ImageArray of the source image in a first nested loop, then draw your scaled target image to the console with a loop over the target coordinates:
int w = BitmapInfoHeader->biWidth;   // source width
int h = BitmapInfoHeader->biHeight;  // source height
int scale = 5;
int ww = scale * w;
int hh = scale * h;
// read ImageArray: one padded row of source data per y
for (int y = 0; y < h; y++) {
    fread(data, RowSize, 1, filePointer);
    for (int x = 0; x < w; x++) {
        ImageArray[y][x] = ...;
    }
}
// draw the scaled image; no more reading here
for (int yy = 0; yy < hh; yy++) {
    for (int xx = 0; xx < ww; xx++) {
        int x = xx / scale;
        int y = yy / scale;
        int gray = ImageArray[y][x];
        // SetPixel takes (x, y); BMP rows are stored bottom-up, so use hh - 1 - yy to draw upright
        SetPixel(context, xx, yy, RGB(gray, gray, gray));
    }
}
(Here, single letters are source values, double letters are target values.)
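The source-to-target mapping in the second loop is nearest-neighbour scaling. A minimal sketch of just that mapping on an in-memory array, independent of SetPixel; `scaleNearest` is a hypothetical helper, and in the console version each `dst[yy][xx]` would instead be drawn with `SetPixel(context, xx, yy, ...)`:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: nearest-neighbour upscaling. Every target pixel
// (xx, yy) takes the value of source pixel (xx / scale, yy / scale).
// The source is h rows of w values.
std::vector<std::vector<int>> scaleNearest(const std::vector<std::vector<int>> &src,
                                           int scale) {
    int h = (int) src.size();
    int w = h > 0 ? (int) src[0].size() : 0;
    std::vector<std::vector<int>> dst(h * scale, std::vector<int>(w * scale));
    for (int yy = 0; yy < h * scale; yy++)
        for (int xx = 0; xx < w * scale; xx++)
            dst[yy][xx] = src[yy / scale][xx / scale];
    return dst;
}
```

Each source pixel becomes a scale-by-scale square, so there are no gaps between target pixels.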
I'm starting out with C++ threads and don't understand some basic things. This is a Mandelbrot example; it generates a fractal image.
It's not my code; I just made some changes (here's the original: https://rosettacode.org/wiki/Mandelbrot_set#PPM_non_interactive).
I have this function, which generates a matrix of colors to save to a file:
vector<unsigned char *> drawMandelbrot()
{
    /* screen ( integer) coordinate */
    int iX, iY;
    double Cx, Cy;
    const double CxMin = -2.5;
    const double CxMax = 1.5;
    const double CyMin = -2.0;
    const double CyMax = 2.0;
    double PixelWidth = (CxMax - CxMin) / iXmax;
    double PixelHeight = (CyMax - CyMin) / iYmax;
    int Index = 0;
    const int IterationMax = 200;
    unsigned char color[3];
    vector<unsigned char *> rows(MaxIndex);
    double Zx, Zy;
    double Zx2, Zy2;
    int Iteration;
    const double EscapeRadius = 2;
    double ER2 = EscapeRadius * EscapeRadius;
    for (iY = 0; iY < iYmax; iY++)
    {
        Cy = CyMin + iY * PixelHeight;
        if (fabs(Cy) < PixelHeight / 2)
            Cy = 0.0; /* Main antenna */
        for (iX = 0; iX < iXmax; iX++)
        {
            Cx = CxMin + iX * PixelWidth;
            /* initial value of orbit = critical point Z= 0 */
            Zx = 0.0;
            Zy = 0.0;
            Zx2 = Zx * Zx;
            Zy2 = Zy * Zy;
            /* */
            for (Iteration = 0; Iteration < IterationMax && ((Zx2 + Zy2) < ER2); Iteration++)
            {
                Zy = 2 * Zx * Zy + Cy;
                Zx = Zx2 - Zy2 + Cx;
                Zx2 = Zx * Zx;
                Zy2 = Zy * Zy;
            }
            /* compute pixel color (24 bit = 3 bytes) */
            if (Iteration == IterationMax)
            { /* interior of Mandelbrot set = black */
                color[0] = 0;
                color[1] = 0;
                color[2] = 0;
            }
            else
            { /* exterior of Mandelbrot set = white */
                color[0] = 255; /* Red*/
                color[1] = 255; /* Green */
                color[2] = 255; /* Blue */
            }
            rows[Index] = color;
            Index++;
        }
    }
    return rows;
}
Here is the function to save it to a file:
void saveToFile(vector<unsigned char *> matrix, char *filename)
{
    char *comment = (char *)"# "; /* comment should start with # */
    FILE *file;
    file = fopen(filename, "wb"); /* b - binary mode */
    fprintf(file, "P6\n %s\n %d\n %d\n %d\n", comment, iXmax, iYmax, MaxColorComponentValue);
    for (int Index = 0; Index < MaxIndex; Index++)
    {
        fwrite(matrix[Index], 1, 3, file);
    }
    fclose(file);
}
Some global values and the main function:
const int iXmax = 1000;
const int iYmax = 1000;
const int MaxColorComponentValue = 255;
int const MaxIndex = (iXmax * iYmax) - 1;
int main()
{
    clock_t start = clock();
    vector<unsigned char *> image = drawMandelbrot();
    clock_t stop = clock();
    cout << (double(stop - start) / CLOCKS_PER_SEC) << " seconds\n";
    char *filename = (char *)"new2.ppm";
    saveToFile(image, filename);
    return 0;
}
The problem is that drawMandelbrot() returns a matrix like this:
image matrix
but it should be a vector of elements that look like this, i.e. actual color values:
color char
I know the problem is with the types of the color and image values, but I have no idea how it should look.
rows[Index] = color;
is setting every unsigned char * in your vector to point at the same array!
In other words, it's as if I sold you ten cars and delivered ten keys, but they were all identical keys to the same car. Wouldn't you be upset?
Change your variables to use std::array:
#include <array>
using Color = std::array<unsigned char, 3>;
Color color;
vector<Color> rows(MaxIndex);
Now you have a vector of triples (Colors), instead of a vector of pointers that all point at the same triple.
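A small self-contained demonstration of the difference, with hypothetical function names. The first function mirrors the question's pattern; here the buffer is passed in, whereas in the question `color` is a local array, so the returned pointers also dangle once `drawMandelbrot()` returns.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Color = std::array<unsigned char, 3>;

// Hypothetical: the question's pattern. Every element stores the address of
// the same buffer, so all entries show whatever was written last.
std::vector<unsigned char *> fillWithPointers(int n, unsigned char *color) {
    std::vector<unsigned char *> rows(n);
    for (int i = 0; i < n; i++) {
        color[0] = color[1] = color[2] = (unsigned char) (i % 2 ? 255 : 0);
        rows[i] = color;  // copies the pointer, not the three bytes
    }
    return rows;
}

// Hypothetical: the std::array version. Each element is copied by value,
// so every pixel keeps its own color.
std::vector<Color> fillWithColors(int n) {
    std::vector<Color> rows(n);
    for (int i = 0; i < n; i++) {
        unsigned char v = (unsigned char) (i % 2 ? 255 : 0);
        rows[i] = Color{v, v, v};
    }
    return rows;
}
```

With `vector<Color>`, saveToFile would take a `const vector<Color> &` and write each element with `fwrite(matrix[Index].data(), 1, 3, file);`.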
I am trying to compute optical flow (Lucas-Kanade based) on an ESP32-CAM.
I tried to save memory by operating on only two small array buffers, but I still get a corrupt heap error:
bfore allocate out conv
after allocate out conv
bfore allocate out conv
after allocate out conv
bfore allocate out conv
after allocate out conv
bfore allocate out conv
CORRUPT HEAP: multi_heap.c:432 detected at 0x3fff7114
abort() was called at PC 0x40090a7f on core 0
Here is my code; it uses 1D convolutions and transposes to perform the equivalent separable 2D convolution:
#include <algorithm>
#include <vector>

template<typename T>
void conv(uint8_t *in, const std::vector<T> &g, const int nf) {
    //int const nf = f.size();
    int const ng = g.size();
    int const n = nf + ng - 1;
    uint8_t *f = in;
    Serial.println("bfore allocate out conv");
    std::vector<T> out(n, T()); // memory leak CORRUPT HEAP
    Serial.println("after allocate out conv");
    for(auto i(0); i < n; ++i) {
        int const jmn = (i >= ng - 1)? i - (ng - 1) : 0;
        int const jmx = (i < nf - 1)? i : nf - 1;
        for(auto j(jmn); j <= jmx; ++j) {
            out[i] += (f[j] * g[i - j]);
        }
    }
    out.erase(out.begin(), out.begin() + ng / 2 + 1);
    // Rescale to 0..255
    auto max = *std::max_element(out.begin(), out.end());
    auto min = *std::min_element(out.begin(), out.end());
    float x;
    for(auto v : out) {
        x = (v - min) * 255.0 / max;
        *(f++) = (uint8_t)x;
    }
}

void transpose(uint8_t *f, int w, int h) {
    for(auto i(0); i < h; ++i)
        for(auto j(0); j < w; ++j)
            std::swap(f[w * i + j], f[w * j + i]);
}

void LK_optical_flow(uint8_t *src1, uint8_t *src2, uint8_t *output, int w, int h)
{
    std::vector<float> Kernel_Dy = {1, 2, 1};
    std::vector<float> Kernel_Dx = {-1, 0, 1};
    std::vector<float> Kernel_Dt = {1/3.0, 1/3.0, 1/3.0};
    uint8_t *fx = src1;
    uint8_t *fy = new uint8_t[w * h];
    uint8_t *ft = src2;
    memcpy(fy, fx, w * h * sizeof(uint8_t));
    // Sobel Dx
    conv(fx, Kernel_Dx, w*h);
    transpose(fx, w, h);
    conv(fx, Kernel_Dy, w*h);
    transpose(fx, w, h);
    // Sobel Dy
    conv(fy, Kernel_Dy, w*h);
    transpose(fy, w, h);
    conv(fy, Kernel_Dx, w*h); // memory leak
    transpose(fy, w, h);
    // Dt
    //conv(src2, Kernel_Dt, w*h);
}
Apparently the leaks come from the second buffer I allocated (pointed to by fy), during the second call of conv(fy, ...) when it allocates out as a vector.
What am I doing wrong?
With w and h not being the same, transpose will access and write to out-of-bounds memory.
From your comment, you have w at 96 and h at about 48. The second argument to swap in transpose will access up to f[w * (w - 1) + (h - 1)] (9167 for w = 96, h = 48), which is past the w * h elements you've allocated. This changes memory that hasn't been allocated, and in your case it corrupts the data your library uses to keep track of allocated memory (which is only detected during an allocation or free, and may not be detected right away).
The solution involves rewriting transpose to properly transpose a rectangular matrix. (This involves swapping w and h for the returned matrix.)
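A minimal sketch of such a transpose, assuming a row-major buffer of h rows by w columns as in `transpose(f, w, h)`. It is out-of-place (hypothetical name `transposeRect`), which costs a second w * h buffer on the ESP32; in-place transposition of a non-square matrix is possible but needs a cycle-following algorithm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: out-of-place transpose of a row-major image with
// h rows and w columns. Element (i, j) at in[i * w + j] moves to
// out[j * h + i]; the result has w rows of h columns, and every index
// stays below w * h.
void transposeRect(const uint8_t *in, uint8_t *out, int w, int h) {
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < w; ++j)
            out[j * h + i] = in[i * w + j];
}
```

After the call the image has w rows of h columns, so a following pass that works row by row now runs along what were the columns, and the next transpose must be called with w and h swapped.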
I have a 3D array containing a sphere: the data points inside the spherical boundary are 1 and those outside it are 0. I want to take the FFT of this array with FFTW and then transform it back with the inverse FFT. I should end up with the sphere again.
Here is my code:
int num = 100;
int cube = pow(num, 3);
int i, j, k;
fftw_complex *out;
double *in, *fin;
/* Allocate memory*/
out = (fftw_complex *) fftw_malloc(num * num * (num/2 +1) * sizeof(fftw_complex));
in = (double *) fftw_malloc(cube * sizeof(double));
fin = (double *) fftw_malloc(cube * sizeof(double));
/* Initialize fft & ifft plans */
fftw_plan plan;
fftw_plan inv_plan1 ;
plan = fftw_plan_dft_r2c_3d(num,num,num, in, out,FFTW_MEASURE);
inv_plan1 = fftw_plan_dft_c2r_3d(num, num, (num/2 +1), out, fin, FFTW_MEASURE);
int q = 0;
for (i = 0; i < num; i++) {
    for (j = 0; j < num; j++) {
        for (k = 0; k < num; k++) {
            in[q] = vals[i][j][k];
            q++;
        }
    }
}
fftw_execute(plan);
fftw_execute(inv_plan1);
for (k = 0; k < cube; k++) {
    fin[k] =fin[k]/(cube);
}
When I execute this and then plot a slice through the resulting data set, I get an image that contains many streaks (it looks nothing like a sphere). However, if I change the dimensions of out from num * num * (num/2 +1) to num * num * num and the dimensions of inv_plan1 from num, num, (num/2 +1) to num, num, num, then I get back the sphere. I am confused because, from reading the FFTW3 documentation, for an r2c transform with input dimensions n0 x n1 x n2 the complex output should be n0 x n1 x (n2/2 + 1). Why is this not the case for the sphere?
(Also, I am very new to C++; this is the first script I have written!)
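A 1-D sketch of the real-to-complex convention the FFTW manual describes (in the 3-D case only the last dimension is halved): the forward transform of a length-n real signal stores n/2 + 1 coefficients, because the rest follow from Hermitian symmetry, but the inverse is still described by the logical real length n, since it has to know how many real samples to rebuild. In FFTW's interface this corresponds to passing the same logical sizes (n0, n1, n2) to both the r2c and the c2r planner. The naive O(n^2) DFT below is for illustration only; `r2cHalf` and `c2rHalf` are hypothetical names, not FFTW functions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Hypothetical: forward real-to-complex DFT keeping only n/2 + 1 outputs.
std::vector<std::complex<double>> r2cHalf(const std::vector<double> &x) {
    const int n = (int) x.size();
    const double PI = std::acos(-1.0);
    std::vector<std::complex<double>> X(n / 2 + 1);
    for (int k = 0; k <= n / 2; ++k)
        for (int t = 0; t < n; ++t)
            X[k] += x[t] * std::polar(1.0, -2.0 * PI * k * t / n);
    return X;
}

// Hypothetical: unnormalized inverse from the half spectrum. It needs the
// logical length n; like FFTW's c2r, the result is n times the input.
std::vector<double> c2rHalf(const std::vector<std::complex<double>> &X, int n) {
    const double PI = std::acos(-1.0);
    std::vector<double> x(n, 0.0);
    for (int t = 0; t < n; ++t)
        for (int k = 0; k < n; ++k) {
            // The upper half of the spectrum comes from X[n-k] = conj(X[k]).
            std::complex<double> Xk = k <= n / 2 ? X[k] : std::conj(X[n - k]);
            x[t] += (Xk * std::polar(1.0, 2.0 * PI * k * t / n)).real();
        }
    return x;
}
```

Calling `c2rHalf(X, 5)` on the three coefficients of a length-5 signal and dividing by 5 recovers the signal; the length argument is 5, not the stored 3.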
I have written an MPI program in C++ for my Raspberry Pi cluster that generates an image of the Mandelbrot set. On each node (excluding the master, processor 0), part of the Mandelbrot set is calculated, so each node ends up with a 2D array of ints indicating whether each (x, y) point is in the set.
It appears to work well on each node individually, but when all the arrays are gathered to the master using this command:
MPI_Gather(&inside, 1, MPI_INT, insideFull, 1, MPI_INT, 0, MPI_COMM_WORLD);
it corrupts the data, and the result is an array full of garbage.
(inside is each node's 2D array for its part of the set; insideFull is also a 2D array, but it holds the whole set.)
Why would it be doing this?
(This led me to wonder whether the corruption happens because the master isn't sending its array to itself (or at least I don't want it to). So part of my question is also: is there an MPI_Gather variant that doesn't send anything from the root process and just collects from everything else?)
EDIT: here's the whole code. If anyone can suggest a better way to transfer the arrays, please say.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define ImageHeight 128
#define ImageWidth 128
double MinRe = -1.9;
double MaxRe = 0.5;
double MinIm = -1.2;
double MaxIm = MinIm + (MaxRe - MinRe)*ImageHeight / ImageWidth;
double Re_factor = (MaxRe - MinRe) / (ImageWidth - 1);
double Im_factor = (MaxIm - MinIm) / (ImageHeight - 1);
unsigned n;
unsigned MaxIterations = 50;
int red;
int green;
int blue;
// MPI variables ****
int processorNumber;
int processorRank;
int main(int argc, char** argv) {
    // Initialise MPI
    MPI_Init(&argc, &argv);
    // Get the number of processors
    MPI_Comm_size(MPI_COMM_WORLD, &processorNumber);
    // Get the rank of this processor
    MPI_Comm_rank(MPI_COMM_WORLD, &processorRank);
    // Get the name of this processor
    char processorName[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processorName, &name_len);
    // A barrier just to sync all the processors, make timing more accurate
    MPI_Barrier(MPI_COMM_WORLD);
    // Make an array that stores whether each point is in the Mandelbrot Set
    int inside[ImageWidth / processorNumber][ImageHeight / processorNumber];
    if(processorRank == 0) {
        printf("Generating Mandelbrot Set\n");
    }
    // We don't want the master to process the Mandelbrot Set, only the slaves
    if(processorRank != 0) {
        // Determine which coordinates to test on each processor
        int xMin = (ImageWidth / (processorNumber - 1)) * (processorRank - 1);
        int xMax = ((ImageWidth / (processorNumber - 1)) * (processorRank - 1)) - 1;
        int yMin = (ImageHeight / (processorNumber - 1)) * (processorRank - 1);
        int yMax = ((ImageHeight / (processorNumber - 1)) * (processorRank - 1)) - 1;
        // Check each value to see if it's in the Mandelbrot Set
        for (int y = yMin; y <= yMax; y++) {
            double c_im = MaxIm - y *Im_factor;
            for (int x = xMin; x <= xMax; x++) {
                double c_re = MinRe + x*Re_factor;
                double Z_re = c_re, Z_im = c_im;
                int isInside = 1;
                for (n = 0; n <= MaxIterations; ++n) {
                    double Z_re2 = Z_re * Z_re, Z_im2 = Z_im * Z_im;
                    if (Z_re2 + Z_im2 > 10) {
                        isInside = 0;
                        break;
                    }
                    Z_im = 2 * Z_re * Z_im + c_im;
                    Z_re = Z_re2 - Z_im2 + c_re;
                }
                if (isInside == 1) {
                    inside[x][y] = 1;
                } else {
                    inside[x][y] = 0;
                }
            }
        }
    }
    // Wait for all processors to finish computing
    MPI_Barrier(MPI_COMM_WORLD);
    int insideFull[ImageWidth][ImageHeight];
    if(processorRank == 0) {
        printf("Sending parts of set to master\n");
    }
    // Send all the arrays to the master
    MPI_Gather(&inside[0][0], 1, MPI_INT, &insideFull[0][0], 1, MPI_INT, 0, MPI_COMM_WORLD);
    // Output the data to an image
    if(processorRank == 0) {
        printf("Generating image\n");
        FILE * image = fopen("mandelbrot_set.ppm", "wb");
        fprintf(image, "P6 %d %d 255\n", ImageHeight, ImageWidth);
        for(int y = 0; y < ImageHeight; y++) {
            for(int x = 0; x < ImageWidth; x++) {
                if(insideFull[x][y]) {
                    putc(0, image);
                    putc(0, image);
                    putc(255, image);
                }
                else {
                    putc(0, image);
                    putc(0, image);
                    putc(0, image);
                }
                // Just to see what values return, no actual purpose
                printf("%d, %d, %d\n", x, y, insideFull[x][y]);
            }
        }
        fclose(image);
    }
    // Finalise MPI
    MPI_Finalize();
    return 0;
}
You call MPI_Gather with the following parameters:
const void* sendbuf : &inside[0][0] Starting address of send buffer
int sendcount : 1 Number of elements in send buffer
MPI_Datatype sendtype : MPI_INT Datatype of send buffer elements
void* recvbuf : &insideFull[0][0] Address of receive buffer (significant only at root)
int recvcount : 1 Number of elements for any single receive
MPI_Datatype recvtype : MPI_INT Datatype of receive buffer elements
int root : 0 Rank of receiving process
MPI_Comm comm : MPI_COMM_WORLD Communicator (handle).
Sending/receiving only one element is not sufficient. Instead of 1, use the following for both sendcount and recvcount:
(ImageWidth / processorNumber)*(ImageHeight / processorNumber)
Then think about the different memory layout of your source and target 2D arrays:
int inside[ImageWidth / processorNumber][ImageHeight / processorNumber];
int insideFull[ImageWidth][ImageHeight];
As the copy is a memory block copy and not an intelligent 2D array copy, all your source integers will be transferred contiguously to the target address, regardless of the different row lengths.
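A small illustration of that effect with plain memory copies, independent of MPI. `flatCopy` behaves like the contiguous transfer, while `rowwiseCopy` (a hypothetical alternative to the element-wise helper that follows) places each source row on its own destination row; both names are illustrative:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical: a flat block copy. The sli*sco source ints are written one
// after another, so source row 1 continues on destination row 0 whenever
// the destination rows are longer than the source rows.
void flatCopy(int *darr, const int *sarr, int sli, int sco) {
    std::memcpy(darr, sarr, sizeof(int) * sli * sco);
}

// Hypothetical: a row-aware copy. Each source row of sco ints goes to its
// own destination row (dco ints apart), starting at (rowOff, colOff).
void rowwiseCopy(int *darr, int dco, const int *sarr, int sli, int sco,
                 int rowOff, int colOff) {
    for (int i = 0; i < sli; i++)
        std::memcpy(darr + (rowOff + i) * dco + colOff, sarr + i * sco,
                    sizeof(int) * sco);
}
```

With a 2x2 block copied into a 2x4 array, `flatCopy` puts the block's second row at positions [0][2] and [0][3], while `rowwiseCopy` keeps it at [1][0] and [1][1].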
// assemble2D():
// copies a source int sarr[sli][sco] to a destination int darr[dli][dco]
// using an offset so that it starts at darr[doffli][doffco].
// The elements that are out of bounds are ignored. Negative offsets are possible.
void assemble2D(int*darr, int dli, int dco, int*sarr, int sli, int sco, int doffli=0, int doffco=0)
{
    for (int i = 0; i < sli; i++)
        for (int j = 0; j < sco; j++)
            if ((i + doffli >= 0) && (j + doffco>=0) && (i + doffli<dli) && (j + doffco<dco))
                darr[(i+doffli)*dco + j+doffco] = sarr[i*sco+j];
}