lock of openmp seems not to work when doing summation

I am new to OpenMP and multi-threading. I need to do some summation work, and I know that when writing to a shared variable I need to use a lock such as omp_lock_t. But when I do so, the result is still wrong.
The code is:
#include <omp.h>
#include <cstdio>
struct simu
{
public:
simu() : data{ nullptr }
{
omp_init_lock(&lock);
}
~simu()
{
omp_destroy_lock(&lock);
}
void calcluate()
{
omp_set_lock(&lock);
(*data) += 1;
omp_unset_lock(&lock);
}
public:
omp_lock_t lock;
int *data;
};
int main()
{
printf("thread_num = %d\n", omp_get_num_procs());
const int size = 2000;
int a = 1;
int b = 2;
simu s[size];
simu *ps[size];
for (int i = 0; i < size; ++i)
{
s[i].data = (0 == i % 2) ? &a : &b;
ps[i] = &s[i];
}
for (int k = 0; k < size; ++k)
{
ps[k]->calcluate();
}
printf("a = %d, b = %d\n", a, b);
a = 1;
b = 2;
#pragma omp parallel for default(shared) num_threads(4)
for (int k = 0; k < size; ++k)
{
ps[k]->calcluate();
}
printf("a = %d, b = %d\n", a, b);
return 0;
}
And the result is
thread_num = 8
a = 1001, b = 1002
a = 676, b = 679
I ran this code on Windows 10. Can anyone explain why the result is wrong?

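A lock only protects a data item from simultaneous writes if every thread that writes to that item acquires the same lock. Your lock is in the object that points at the item, not in the item itself: s[0] and s[2] both point at a but each has its own lock, so two threads can update a at the same time while each holds a different lock. This is pointless. You need to let your data point to an object that contains a lock.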
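A minimal sketch of that fix, assuming you want to keep the simu/pointer design: the lock lives next to the int it protects, so every simu that points at a shares one lock and every simu that points at b shares another. LockedInt and run_experiment are hypothetical names, and the #ifdef _OPENMP branch exists only so the sketch still builds (and runs serially) when OpenMP is not enabled.

```cpp
#include <utility>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// A value bundled with the single lock that guards it. Every writer goes
// through add(), so all updates of the same value contend on the same lock.
struct LockedInt {
    int value;
#ifdef _OPENMP
    omp_lock_t lock;
    explicit LockedInt(int v) : value(v) { omp_init_lock(&lock); }
    ~LockedInt() { omp_destroy_lock(&lock); }
    void add(int d) {
        omp_set_lock(&lock);
        value += d;
        omp_unset_lock(&lock);
    }
#else
    // Built without OpenMP: the loop below runs serially, so no lock is needed.
    explicit LockedInt(int v) : value(v) {}
    void add(int d) { value += d; }
#endif
    LockedInt(const LockedInt&) = delete;             // locks must not be copied
    LockedInt& operator=(const LockedInt&) = delete;
};

// The question's simu, now pointing at the locked value instead of owning a lock.
struct simu {
    LockedInt* data = nullptr;
    void calculate() { data->add(1); }
};

// Repeats the question's experiment: even indices update a, odd indices update b.
std::pair<int, int> run_experiment() {
    const int size = 2000;
    LockedInt a(1), b(2);
    std::vector<simu> s(size);
    for (int i = 0; i < size; ++i)
        s[i].data = (i % 2 == 0) ? &a : &b;
    #pragma omp parallel for num_threads(4)
    for (int k = 0; k < size; ++k)
        s[k].calculate();
    return {a.value, b.value};
}
```

With the lock attached to the value, run_experiment() should give a = 1001 and b = 1002 regardless of thread count. For a plain counter like this, `#pragma omp atomic` on the increment, or a `reduction(+:...)` clause on the loop, would avoid explicit locks altogether.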

Related

How to nest for loops in CUDA?

I would like to ask for a complete example of CUDA code, one that includes everything someone might need, so that people like me who are trying to write such code can use it as a reference.
My main concern is whether it is possible to process multiple for loops at the same time on different threads in the same block. For example, in case 3 of the example code, this is the difference between running a total of 2016 threads divided into blocks of 32 and running 1024 threads on each for loop. Theoretically, with the code we have, we could use even fewer blocks, saving another 2 blocks, by running the for loops of the other cases in the same block. Otherwise, separate cases would mainly be used for processing separate tasks such as a for loop. Currently it appears that the CUDA code simply knows when to run in parallel.
// note: rarely referenced, you can process if statements in parallel seemingly by block, I'd say that is the primary purpose of using more blocks instead of increasing thread count per block during call, other than the need of multiple SMs (Streaming Multiprocessors), capped at 2048 threads (also the cap for a block)//
Given the following code with for loops and conditional statements, what would code that optimizes parallelization look like?
public void main(string[] args) {
doMath(3); // we want to process each statement in parallel. For this we use different blocks.
}
void doMath(int question) {
int[] x = new int{0,1,2,3,4,5,6,7,8,9};
int[] y = new int{0,1,2,3,4,5,6,7,8,10};
int[] z = new int{0,1,2,3,4,5,6,7,8,11};
int[] w = new int{0,1,2,3,4,5,6,7,8,12};
int[] q = new int[1000];
int[] r = new int[1000];
int[] v = new int[1000];
int[] t = new int[1000];
switch(question) {
case 1:
for (int a = 0; a < x.length; a++) {
for (int b = 0; b < y.length; b++) {
for (int c = 0; c < z.length; c++) {
q[(a*100)+(b*10)+(c)] = x[a] + y[b] + z[c];
}
}
}
break;
case 2:
for (int a = 0; a < x.length; a++) {
for (int b = 0; b < y.length; b++) {
for (int c = 0; c < w.length; c++) {
r[(a*100)+(b*10)+(c)] = x[a] + y[b] + w[c];
}
}
}
break;
case 3:
for (int a = 0; a < x.length; a++) {
for (int b = 0; b < z.length; b++) {
for (int c = 0; c < w.length; c++) {
v[(a*100)+(b*10)+(c)] = x[a] + z[b] + w[c];
}
}
}
for (int a = 0; a < x.length; a++) {
for (int b = 0; b < y.length; b++) {
for (int c = 0; c < w.length; c++) {
t[(a*100)+(b*10)+(c)] = x[a] + y[b] + w[c];
}
}
}
break;
}
}
From the samples I have seen the CUDA code would be as follows:
// 3 blocks for 3 switch cases the third case requires 2000 threads to be done in perfect parallel while the first two only require 1000. blocks operate by multiples of 32 (threads). the trick is to take the greatest common denominator of all cases, or if/else statements as the... case... may be, and appropriate the number of blocks required to each case. (in this example we would need 127 blocks of 32 threads (1024 * 2 + 2048 - 32)//
//side note: each Streaming Multiprocessor or SM can only support 2048 threads and 2048 / (# of blocks * # of threads/block)//
public void main(string[] args) {
int *x, *y *z, *w, *q, *r, *t;
int[] x = new int{0,1,2,3,4,5,6,7,8,9};
int[] y = new int{0,1,2,3,4,5,6,7,8,10};
int[] z = new int{0,1,2,3,4,5,6,7,8,11};
int[] w = new int{0,1,2,3,4,5,6,7,8,12};
int[] q = new int[1000];
int[] r = new int[1000];
int[] t = new int[1000];
cudaMallocManaged(&x, x.length*sizeof(int));
cudaMallocManaged(&y, y.length*sizeof(int));
cudaMallocManaged(&z, z.length*sizeof(int));
cudaMallocManaged(&w, w.length*sizeof(int));
cudaMallocManaged(&q, q.length*sizeof(int));
cudaMallocManaged(&r, r.length*sizeof(int));
cudaMallocManaged(&t, t.length*sizeof(int));
doMath<<<127,32>>>(x, y, z, w, q, r, t);
cudaDeviceSynchronize();
cudaFree(x);
cudaFree(y);
cudaFree(z);
cudaFree(w);
cudaFree(q);
cudaFree(r);
cudaFree(t);
}
__global__
void doMath(int *x, int *y, int *z, int *w, int *q, int *r, int *t) {
int index = blockIdx.x * blockDim.x + threadIdx.x;
int stride = blockDim.x * gridDim.x;
switch(question) {
case 1:
for (int a = index; a < x.length; a+=stride ) {
for (int b = index; b < y.length; b+=stride) {
for (int c = index; c < z.length; c+=stride) {
q[(a*100)+(b*10)+(c)] = x[a] + y[b] + w[c];
}
}
}
break;
case 2:
for (int a = index; a < x.length; a+=stride) {
for (int b = index; b < y.length; b+=stride) {
for (int c = index; c < w.length; c+=stride) {
r[(a*100)+(b*10)+(c)] = x[a] + y[b] + w[c];
}
}
}
break;
case 3:
for (int a = index; a < x.length; a+=stride) {
for (int b = index; b < y.length; b+=stride) {
for (int c = index; c < z.length; c+=stride) {
q[(a*100)+(b*10)+(c)] = x[a] + y[b] + w[c];
}
}
}
for (int a = index; a < x.length; a+=stride) {
for (int b = index; b < y.length; b+=stride) {
for (int c = index; c < w.length; c+=stride) {
t[(a*100)+(b*10)+(c)] = x[a] + y[b] + w[c];
}
}
}
break;
}
}

In CUDA, every thread runs your kernel. If you want the threads to do different things, you have to branch (in some way) on threadIdx and/or blockIdx.
You did this by calculating index. Every thread in your kernel has a different index. Now you have to map your indices to the work the kernel should do, so you have to map every index to one or more (a, b, c) triplets.
Your current mapping is something like:
index -> (index+i*stride,index+j*stride,index+k*stride)
I do not believe this was your intent.
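To illustrate one such mapping: treat index as a position in the flattened 10 x 10 x 10 iteration space of case 1 and recover (a, b, c) from it, stepping by stride so that any grid size covers all 1000 triplets exactly once. The sketch below is plain C++ that emulates the threads of a launch one after another so the mapping can be checked on a CPU; in a real __global__ kernel, index and stride would be computed from blockIdx, blockDim, threadIdx and gridDim exactly as in the question. fill_case1_for_thread and run_case1 are hypothetical names, and only case 1 is shown.

```cpp
#include <vector>

// Work done by one (emulated) CUDA thread for case 1 of the question.
// index/stride are passed in; in a kernel they would be
// blockIdx.x * blockDim.x + threadIdx.x and blockDim.x * gridDim.x.
void fill_case1_for_thread(int index, int stride, int n,
                           const int* x, const int* y, const int* z, int* q) {
    const int total = n * n * n;                // number of (a, b, c) triplets
    for (int flat = index; flat < total; flat += stride) {
        const int a = flat / (n * n);           // "hundreds" digit
        const int b = (flat / n) % n;           // "tens" digit
        const int c = flat % n;                 // "ones" digit
        q[(a * n * n) + (b * n) + c] = x[a] + y[b] + z[c];
    }
}

// Runs all blocks * threadsPerBlock "threads" one after another on the CPU.
std::vector<int> run_case1(int blocks, int threadsPerBlock) {
    const int n = 10;
    const int x[n] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    const int y[n] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 10};
    const int z[n] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 11};
    std::vector<int> q(n * n * n, -1);          // -1 marks "never written"
    const int stride = blocks * threadsPerBlock;
    for (int blk = 0; blk < blocks; ++blk)
        for (int t = 0; t < threadsPerBlock; ++t)
            fill_case1_for_thread(blk * threadsPerBlock + t, stride, n,
                                  x, y, z, q.data());
    return q;
}
```

Each flat index is handled by exactly one thread, so the writes never overlap and no synchronization is needed; the other cases differ only in which arrays they read and write.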

How can I parallelize this loop with OpenMP?

I don't know how to parallelize these loops because there are a lot of dependent variables, and I am very confused.
Can you help and guide me?
The first one is:
for (int a = 0; a < sigmaLen; ++a) {
int f = freq[a];
if (f >= sumFreqLB)
if (updateRemainingDistances(s, a, pos))
if (prunePassed(pos + 1)) {
lmer[pos] = a;
enumerateStrings(pos + 1, sumFreqLB - f);
}
}
The second one is:
void preprocessLowerBounds() {
int i = stackSz - 1;
int pairOffset = (i * (i - 1)) >> 1;
for (int k = L; k; --k) {
int *dsn = dist[k] + pairOffset;
int *ds = dist[k - 1] + pairOffset;
int *s = colS[k - 1];
char ci = s[i];
for (int j = 0; j < i; ++j) {
char cj = s[j];
*ds++ = (*dsn++) + (ci != cj);
}
}
}
And another one is:
void enumerateSubStrings(int rowNumber, int remainQTolerance) {
int nItems = rowSize[rowNumber][stackSz];
if (shouldGenerateNeighborhood(rowNumber, nItems)) {
bruteForceIt(rowNumber, nItems);
} else {
indexType *row = rowItem[rowNumber];
for (int j = 0; j < nItems; ++j) {
indexType ind = row[j];
addString(lmers + ind);
preprocessLowerBounds();
uint threshold = maxLB[stackSz] - addMaxFreq();
if (hasSolution(0, threshold)) {
if (getValid<hasPreprocessedPairs, useQ>(rowNumber + 1,
(stackSz <= 2 ? n : smallN), threshold + LminusD,
ind, remainQTolerance)) {
enumerateSubStrings<hasPreprocessedPairs, useQ>(
rowNumber + 1, remainQTolerance);
}
}
removeLastString();
}
}
void addString(const char *t) {
int *mf = colMf[stackSz + 1];
for (int j = 0; j < L; ++j) {
int c = t[j];
colS[j][stackSz] = c;
mf[j] = colMaxFreq[j] + (colMaxFreq[j] == colFreq[j][c]++);
}
colMaxFreq = mf;
++stackSz;
}
void preprocessLowerBounds() {
int i = stackSz - 1;
int pairOffset = (i * (i - 1)) >> 1;
for (int k = L; k; --k) {
int *dsn = dist[k] + pairOffset;
int *ds = dist[k - 1] + pairOffset;
int *s = colS[k - 1];
char ci = s[i];
for (int j = 0; j < i; ++j) {
char cj = s[j];
*ds++ = (*dsn++) + (ci != cj);
}
}
}
void removeLastString() {
--stackSz;
for (int j = 0; j < L; ++j)
--colFreq[j][colS[j][stackSz]];
colMaxFreq = colMf[stackSz];
}

OK. For OpenMP to parallelize a loop, you basically need to follow two rules: first, never write to the same memory location from different threads, and second, never depend on reading a memory area that another thread may modify. In the first loop you only change the lmer variable, and the other operations only read variables that I assume are not being changed at the same time by another part of your code, so the first loop would be as follows:
#pragma omp parallel for firstprivate(s, pos) // I assume s and pos are globals or class members, so each thread gets its own copy; firstprivate (unlike private) initializes each copy with the value from before the loop. The loop variable a is private automatically, and sumFreqLB and freq are not listed because they are only read.
for (int a = 0; a < sigmaLen; ++a) {
int f = freq[a];
if (f >= sumFreqLB)
if (updateRemainingDistances(s, a, pos))
if (prunePassed(pos + 1)) {
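#pragma omp critical // only one thread at a time can enter; note that if enumerateStrings reads lmer, another thread may still overwrite lmer[pos] before it does, so each thread may need its own copy of lmer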
{
lmer[pos] = a;
}
enumerateStrings(pos + 1, sumFreqLB - f);
}
}
In the second loop I could not understand how you are using the for, but you have no problems because you only read shared data and only modify thread-local variables.
You must make sure that the functions updateRemainingDistances, prunePassed and enumerateStrings do not use static or global variables within.
In the following function, the outer k loop cannot be parallelized: iteration k reads dist[k], which iteration k + 1 has just written, so the iterations must run in order (this is the second rule above). The inner j loop is different: each iteration only reads shared data and writes a different element ds[j]. So that is the loop to give to OpenMP, after changing the shape of the FOR to use an index instead of the moving pointers *ds++ and *dsn++, so that OpenMP can recognize it:
void preprocessLowerBounds() {
int i = stackSz - 1;
int pairOffset = (i * (i - 1)) >> 1;
for (int k = L; k; --k) { // stays serial: step k reads dist[k], which step k + 1 wrote
int *dsn = dist[k] + pairOffset;
int *ds = dist[k - 1] + pairOffset;
int *s = colS[k - 1];
char ci = s[i];
#pragma omp parallel for // safe: each j writes a different element ds[j]
for (int j = 0; j < i; ++j) {
char cj = s[j];
ds[j] = dsn[j] + (ci != cj);
}
}
}
In the last function you call many functions whose source code I do not know, so I cannot tell whether they can be parallelized. Here are some examples of what to look for; the first three functions are wrong:
#include <vector>
std::vector<int> myVector;
void notParalelizable_1(int i){
myVector.push_back(i);
}
void notParalelizable_2(int i){
static int A=0;
A=A+i;
}
int varGlobal=0;
void notParalelizable_3(int i){
varGlobal=varGlobal+i;
}
void oneFunctionParalelizable(int i)
{
int B=i; // B is local to each call, so every thread has its own
}
int main()
{
#pragma omp parallel for
for(int i=0;i<10;i++)
{
notParalelizable_1(i);//Error: myVector is modified simultaneously by multiple threads. push_back is not thread-safe, so this can corrupt the vector or crash at run time, and even when it appears to work the values are not stored in ascending order.
}
#pragma omp parallel for
for(int i=0;i<10;i++)
{
notParalelizable_2(i);//Error because A is modified simultaneously by multiple threads
}
#pragma omp parallel for
for(int i=0;i<10;i++)
{
notParalelizable_3(i);//Error because varGlobal is modified simultaneously by multiple threads
}
#pragma omp parallel for
for(int i=0;i<10;i++)
{
oneFunctionParalelizable(i);//no problem
}
//The following code is correct
int *vector=new int[10];
#pragma omp parallel for
for(int i=0;i<10;i++)
{
vector[i]=i;//No problem because each thread writes to a different memory position
}
delete[] vector;
//The following code is wrong
int k=2;
#pragma omp parallel for
for(int i=0;i<10;i++)
{
k=k+i; //The final value of k will be wrong because it is modified from different threads
}
return 0;
}
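For the last example, one common fix is OpenMP's reduction clause: each thread accumulates into its own private copy of k, and the copies are added to the original value of k when the loop ends. A minimal sketch; sum_with_reduction is a hypothetical name:

```cpp
// Adds 0 + 1 + ... + (n - 1) to `start`. Each thread sums into its own
// private k; OpenMP combines the partial sums with the original value of k.
int sum_with_reduction(int start, int n) {
    int k = start;
    #pragma omp parallel for reduction(+:k)
    for (int i = 0; i < n; i++) {
        k = k + i;
    }
    return k;
}
```

With start = 2 and n = 10 this returns 2 + 45 = 47 whether the loop runs on one thread or many.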

Boost returning values from multithread vector

I am trying to write code that creates N threads in a loop. Each thread generates 40 random numbers and picks the highest of them. Afterwards, I have to choose the highest number of all. However, when I read the highest value of each thread (b), it is empty. This is the code I am using:
class rdm_thr
{
public:
rdm_thr()
{
}
void rdmgen()
{
default_random_engine generator;
double rdm;
b=0;
normal_distribution<double> normal(0, 1);
for(int i=0; i<40; i++)
{
rdm = normal(generator);
if(rdm>b)
b = rdm;
}
}
};
void main()
{
vector<boost::thread *> z;
vector<rdm_thr> o;
boost::function<void()> th_func;
for (int i = 0; i < 2; i++)
o.push_back(rdm_thr());
for (int i = 0; i < 2; i++)
{
th_func = boost::bind(&rdm_thr::rdmgen, &o[i]);
boost::thread thr(th_func);
z.push_back(&thr);
}
for (int i = 0; i < 2; i++)
{
z[i]->join();
}
}
Is there another way to do it?

You could change your class logic as such:
class rdm_thr
{
public:
rdm_thr() {}
void rdmgen()
{
...
}
void join() { t.join(); }
void start()
{
t = boost::thread(boost::bind(&rdm_thr::rdmgen, this));
}
private:
boost::thread t;
// could also be pointer type and 'new/delete' would have to be used in that event
};
#define TSZ 2
int main()
{
std::vector<rdm_thr*> o;
int i = 0;
for (; i < TSZ; i++) {
o.push_back(new rdm_thr());
o.back()->start();
}
for (i = 0; i < TSZ; i++) {
o[i]->join();
delete o[i]; //clean up
}
}
And if you didn't want to change your class logic, you could do the following in your main function:
#define TSZ 2
int main()
{
std::vector<boost::thread *> z;
std::vector<rdm_thr *> o;
int i = 0;
for (; i < TSZ; i++) {
o.push_back(new rdm_thr());
z.push_back(new boost::thread(boost::bind(&rdm_thr::rdmgen, o.back())));
}
for (i = 0; i < TSZ; i++) {
z[i]->join();
delete z[i];
delete o[i];
}
}
I don't have access to a compiler right now, so I can't verify this 100%, but as you're asking more about theory, the code above is meant to illustrate alternative ways of achieving similar results.
I hope that helps.
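Another way, sketched here with C++11 std::thread instead of Boost (an assumption about what is available to you): give each thread its own output slot and read the slots only after join(). Two details in the question are worth noting: z.push_back(&thr) stores the address of a boost::thread that is destroyed at the end of each loop iteration, and a default-constructed default_random_engine starts from the same seed every time, so every thread would draw the same 40 numbers. max_of_draws and per_thread_maxima are hypothetical names, and seeding thread i with i + 1 is an arbitrary choice.

```cpp
#include <algorithm>
#include <limits>
#include <random>
#include <thread>
#include <vector>

// Largest of `count` draws from N(0, 1), using an engine seeded with `seed`.
double max_of_draws(unsigned seed, int count) {
    std::default_random_engine generator(seed);   // distinct seed per thread
    std::normal_distribution<double> normal(0.0, 1.0);
    double best = -std::numeric_limits<double>::infinity();
    for (int i = 0; i < count; ++i)
        best = std::max(best, normal(generator));
    return best;
}

// Starts n threads; thread i writes only results[i], so no locking is needed.
std::vector<double> per_thread_maxima(int n, int count) {
    std::vector<double> results(n);
    std::vector<std::thread> threads;
    threads.reserve(n);
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&results, i, count] {
            results[i] = max_of_draws(static_cast<unsigned>(i + 1), count);
        });
    for (std::thread& t : threads)
        t.join();                                   // wait before reading results
    return results;
}
```

The overall maximum is then `*std::max_element(results.begin(), results.end())`.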

Unhandled exception with C++ class function

I am writing a program which will perform texture synthesis. I have been away from C++ for a while and am having trouble figuring out what I am doing wrong in my class. When I run the program, I get an unhandled exception in the copyToSample function when it tries to access the arrays. It is being called from the bestSampleSearch function when the unhandled exception occurs. The function has been called before and works just fine, but later on in the program it is called a second time and fails. Any ideas? Let me know if anyone needs to see more code. Thanks!
Edit1: Added the bestSampleSearch function and the compareMetaPic function
Edit2: Added a copy constructor
Edit3: Added main()
Edit4: I have gotten the program to work. However, there is now a memory leak of some kind, or I am running out of memory when I run the program. The problem seems to be in the double for loop in main that starts with "// while the output picture is unfilled". If I comment this portion out, the program finishes in a timely manner, but only one small square is output. Something must be wrong with my bestSampleSearch function.
MetaPic.h
#pragma once
#include <pic.h>
#include <stdlib.h>
#include <cmath>
class MetaPic
{
public:
Pic* source;
Pixel1*** meta;
int x;
int y;
int z;
MetaPic();
MetaPic(Pic*);
MetaPic(const MetaPic&);
MetaPic& operator=(const MetaPic&);
~MetaPic();
void allocateMetaPic();
void copyPixelData();
void copyToOutput(Pic*&);
void copyToMetaOutput(MetaPic&, int, int);
void copyToSample(MetaPic&, int, int);
void freeMetaPic();
};
MetaPic.cpp
#include "MetaPic.h"
MetaPic::MetaPic()
{
source = NULL;
meta = NULL;
x = 0;
y = 0;
z = 0;
}
MetaPic::MetaPic(Pic* pic)
{
source = pic;
x = pic->nx;
y = pic->ny;
z = pic->bpp;
allocateMetaPic();
copyPixelData();
}
MetaPic::MetaPic(const MetaPic& mp)
{
source = mp.source;
x = mp.x;
y = mp.y;
z = mp.z;
allocateMetaPic();
copyPixelData();
}
MetaPic::~MetaPic()
{
freeMetaPic();
}
// create a 3 dimensional array from the original one dimensional array
void MetaPic::allocateMetaPic()
{
meta = (Pixel1***)calloc(x, sizeof(Pixel1**));
for(int i = 0; i < x; i++)
{
meta[i] = (Pixel1**)calloc(y, sizeof(Pixel1*));
for(int j = 0; j < y; j++)
{
meta[i][j] = (Pixel1*)calloc(z, sizeof(Pixel1));
}
}
}
void MetaPic::copyPixelData()
{
for(int j = 0; j < y; j++)
{
for(int i = 0; i < x; i++)
{
for(int k = 0; k < z; k++)
meta[i][j][k] = source->pix[(j*z*x)+(i*z)+k];
}
}
}
void MetaPic::copyToOutput(Pic* &output)
{
for(int j = 0; j < y; j++)
{
for(int i = 0; i < x; i++)
{
for(int k = 0; k < z; k++)
output->pix[(j*z*x)+(i*z)+k] = meta[i][j][k];
}
}
}
// copy the meta data to the final pic output starting at the top left of the picture and mapped to 'a' and 'b' coordinates in the output
void MetaPic::copyToMetaOutput(MetaPic &output, int a, int b)
{
for(int j = 0; (j < y) && ((j+b) < output.y); j++)
{
for(int i = 0; (i < x) && ((i+a) < output.x); i++)
{
for(int k = 0; k < z; k++)
output.meta[i+a][j+b][k] = meta[i][j][k];
}
}
}
// copies from a source image to a smaller sample image
// *** Must make sure that the x and y coordinates have enough buffer space ***
void MetaPic::copyToSample(MetaPic &sample, int a, int b)
{
for(int j = 0; (j < sample.y) && ((b+j) < y); j++)
{
for(int i = 0; i < (sample.x) && ((a+i) < x); i++)
{
for(int k = 0; k < sample.z; k++)
{
sample.meta[i][j][k] = meta[i+a][j+b][k]; // the unhandled exception occurs here
}
}
}
}
// free the meta pic data (MetaPic.meta)
// *** Not to be used outside of class declaration ***
void MetaPic::freeMetaPic()
{
for(int j = 0; j < y; j++)
{
for(int i = 0; i < z; i++)
free(meta[i][j]);
}
for(int i = 0; i < x; i++)
free(meta[i]);
free(meta);
}
MetaPic MetaPic::operator=(MetaPic mp)
{
MetaPic newMP(mp.source);
return newMP;
}
main.cpp
#ifdef WIN32
// For VC++ you need to include this file as glut.h and gl.h refer to it
#include <windows.h>
// disable the warning for the use of strdup and friends
#pragma warning(disable:4996)
#endif
#include <stdio.h> // Standard Header For Most Programs
#include <stdlib.h> // Additional standard Functions (exit() for example)
#include <iostream>
// Interface to libpicio, provides functions to load/save jpeg files
#include <pic.h>
#include <string.h>
#include <time.h>
#include <cmath>
#include "MetaPic.h"
using namespace std;
MetaPic bestSampleSearch(MetaPic, MetaPic);
double compareMetaPics(MetaPic, MetaPic);
#define SAMPLE_SIZE 23
#define OVERLAP 9
// Texture source image (pic.h uses the Pic* data structure)
Pic *sourceImage;
Pic *outputImage;
int main(int argc, char* argv[])
{
char* pictureName = "reg1.jpg";
int outputWidth = 0;
int outputHeight = 0;
// attempt to read in the file name
sourceImage = pic_read(pictureName, NULL);
if(sourceImage == NULL)
{
cout << "Couldn't read the file" << endl;
system("pause");
exit(EXIT_FAILURE);
}
// *** For now set the output image to 3 times the original height and width ***
outputWidth = sourceImage->nx*3;
outputHeight = sourceImage->ny*3;
// allocate the output image
outputImage = pic_alloc(outputWidth, outputHeight, sourceImage->bpp, NULL);
Pic* currentImage = pic_alloc(SAMPLE_SIZE, SAMPLE_SIZE, sourceImage->bpp, NULL);
MetaPic metaSource(sourceImage);
MetaPic metaOutput(outputImage);
MetaPic metaCurrent(currentImage);
// seed the output image
int x = 0;
int y = 0;
int xupperbound = metaSource.x - SAMPLE_SIZE;
int yupperbound = metaSource.y - SAMPLE_SIZE;
int xlowerbound = 0;
int ylowerbound = 0;
// find random coordinates
srand(time(NULL));
while((x >= xupperbound) || (x <= xlowerbound))
x = rand() % metaSource.x;
while((y >= yupperbound) || (y <= ylowerbound))
y = rand() % metaSource.y;
// copy a random sample from the source to the metasample
metaSource.copyToSample(metaCurrent, x, y);
// copy the seed to the metaoutput
metaCurrent.copyToMetaOutput(metaOutput, 0, 0);
int currentOutputX = 0;
int currentOutputY = 0;
// while the output picture is unfilled...
for(int j = 0; j < yupperbound; j+=(SAMPLE_SIZE-OVERLAP))
{
for(int i = 0; i < xupperbound; i+=(SAMPLE_SIZE-OVERLAP))
{
// move the sample to correct overlap
metaSource.copyToSample(metaCurrent, i, j);
// find the best match for the sample
metaCurrent = bestSampleSearch(metaSource, metaCurrent);
// write the best match to the metaoutput
metaCurrent.copyToMetaOutput(metaOutput, i, j);
// update the values
}
}
// copy the metaOutput to the output
metaOutput.copyToOutput(outputImage);
// output the image
pic_write("reg1_output.jpg", outputImage, PIC_JPEG_FILE);
// clean up
pic_free(sourceImage);
pic_free(outputImage);
pic_free(currentImage);
// return success
cout << "Done!" << endl;
system("pause");
// return success
return 0;
}
// finds the best sample to insert into the image
// *** best must be the sample which consists of the overlap ***
MetaPic bestSampleSearch(MetaPic source, MetaPic best)
{
MetaPic metaSample(best);
double bestScore = 999999.0;
double currentScore = 0.0;
for(int j = 0; j < source.y; j++)
{
for(int i = 0; i < source.x; i++)
{
// copy the image starting at the top left of the source image
source.copyToSample(metaSample, i, j);
// compare the sample with the overlap
currentScore = compareMetaPics(best, metaSample);
// if best score is greater than current score then copy the better sample to best and continue searching
if( bestScore > currentScore)
{
metaSample.copyToSample(best, 0, 0);
bestScore = currentScore;
}
// otherwise, the score is less than current score then do nothing (a better sample has not been found)
}
}
return best;
}
// find the comparison score for the two MetaPics based on their rgb values
// *** Both of the meta pics should be the same size ***
double compareMetaPics(MetaPic pic1, MetaPic pic2)
{
float r1 = 0.0;
float g1 = 0.0;
float b1 = 0.0;
float r2 = 0.0;
float g2 = 0.0;
float b2 = 0.0;
float r = 0.0;
float g = 0.0;
float b = 0.0;
float sum = 0.0;
// take the sum of the (sqrt((r1-r2)^2 + ((g1-g2)^2 + ((b1-b2)^2))
for(int j = 0; (j < pic1.y) && (j < pic2.y); j++)
{
for(int i = 0; (i < pic1.x) && (i < pic2.x); i++)
{
r1 = PIC_PIXEL(pic1.source, i, j, 0);
r2 = PIC_PIXEL(pic2.source, i, j, 0);
g1 = PIC_PIXEL(pic1.source, i, j, 1);
g2 = PIC_PIXEL(pic2.source, i, j, 1);
b1 = PIC_PIXEL(pic1.source, i, j, 2);
b2 = PIC_PIXEL(pic2.source, i, j, 2);
r = r1 - r2;
g = g1 - g2;
b = b1 - b2;
sum += sqrt((r*r) + (g*g) + (b*b));
}
}
return sum;
}

I'm not sure if this is the root cause of the problem, but your assignment operator does not actually assign anything:
MetaPic MetaPic::operator=(MetaPic mp)
{
MetaPic newMP(mp.source);
return newMP;
}
This should probably look something like the following (based off of the code in your copy constructor):
edit: with credit to Alf P. Steinbach
MetaPic& MetaPic::operator=(MetaPic mp)
{
mp.swap(*this); // assumes a swap() member that exchanges source, meta, x, y and z
return *this;
}

It turns out that the deallocate function is incorrect. It should be freeing in the same manner that it was allocating.
void MetaPic::freeMetaPic()
{
for(int i = 0; i < x; i++)
{
for(int j = 0; j < y; j++)
free(meta[i][j]);
free(meta[i]);
}
free(meta);
}
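Putting both answers together, a trimmed-down, hypothetical MetaPicSketch class shows a swap() member for the copy-and-swap assignment operator and a release() that frees in the same shape as allocate(). Pixel1 is replaced by unsigned char as a stand-in for the type from pic.h, the source pointer is left out, and the copy constructor copies the meta array directly. This is a sketch under those assumptions, not a drop-in replacement for the question's class.

```cpp
#include <cstdlib>
#include <utility>

typedef unsigned char Pixel1;  // assumption: stand-in for pic.h's Pixel1

// Hypothetical trimmed-down MetaPic: owns an x * y * z array, supports copy-and-swap.
class MetaPicSketch {
public:
    Pixel1*** meta = nullptr;
    int x = 0, y = 0, z = 0;

    MetaPicSketch(int nx, int ny, int nz) : x(nx), y(ny), z(nz) { allocate(); }
    MetaPicSketch(const MetaPicSketch& mp) : x(mp.x), y(mp.y), z(mp.z) {
        allocate();
        for (int i = 0; i < x; i++)
            for (int j = 0; j < y; j++)
                for (int k = 0; k < z; k++)
                    meta[i][j][k] = mp.meta[i][j][k];  // copy the pixel array itself
    }
    ~MetaPicSketch() { release(); }

    void swap(MetaPicSketch& other) noexcept {
        std::swap(meta, other.meta);
        std::swap(x, other.x);
        std::swap(y, other.y);
        std::swap(z, other.z);
    }
    // Copy-and-swap: mp is a by-value copy; swapping hands our old buffers
    // to mp, whose destructor frees them.
    MetaPicSketch& operator=(MetaPicSketch mp) {
        mp.swap(*this);
        return *this;
    }

private:
    void allocate() {
        meta = (Pixel1***)calloc(x, sizeof(Pixel1**));
        for (int i = 0; i < x; i++) {
            meta[i] = (Pixel1**)calloc(y, sizeof(Pixel1*));
            for (int j = 0; j < y; j++)
                meta[i][j] = (Pixel1*)calloc(z, sizeof(Pixel1));
        }
    }
    // Frees in the same shape as allocate(): meta[i][j], then meta[i], then meta.
    void release() {
        if (!meta) return;
        for (int i = 0; i < x; i++) {
            for (int j = 0; j < y; j++)
                free(meta[i][j]);
            free(meta[i]);
        }
        free(meta);
        meta = nullptr;
    }
};
```

After `b = a;`, b owns an independent copy of a's pixels, and b's previous buffers are released by the temporary's destructor.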

Segmentation fault In the Cascaded Struct Pointers Test Code

The following dummy test code gives a segmentation fault at the end of execution (more specifically, in main at return 0). I wonder what causes this behavior. Could it be because it couldn't free the dummy variable? I'm using g++ 4.4 with no optimization flags for the tests.
#include <vector>
#include <boost/multi_array.hpp>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
using std::vector;
typedef boost::multi_array<float, 1> DVec;
class Point{
public:
int x, y;
double *dist;
DVec dir;
};
struct another_struct {
vector <Point *>c;
};
struct in_foo{
vector <another_struct *>aVec;
char *aname;
float b;
};
struct foo {
DVec b;
vector<in_foo *> mVec;
};
int main(){
DVec c(boost::extents[4]);
foo **dummy = (foo **) calloc(4, sizeof(*dummy));
vector <in_foo *>test_var(5);
for(int i =0; i < 6; i++){
test_var[i] = (in_foo *) malloc(sizeof(in_foo));
memset(test_var[i], 0, sizeof(*test_var[i]));
test_var[i]->aname = "42!\n";
test_var[i]->b = (float) i;
}
for (int i = 0 ; i < 4; i++) {
dummy[i] = (foo *) malloc(sizeof(*dummy[i]));
(dummy[i]->b).resize(boost::extents[2]);
(dummy[i]->mVec) = test_var;
}
for (int i = 0 ; i < 4; i++) {
for(int j = 0; j < 5; j++){
(dummy[i]->mVec[j]->aVec).resize(5);
for (int n = 0; n < 6; n++) {
dummy[i]->mVec[j]->aVec[n] = new another_struct();
(dummy[i]->mVec[j]->aVec[n])->c.resize(3);
for (int m = 0; m < 4; m++) {
(dummy[i]->mVec[j]->aVec[n]->c[m]) = new Point();
(dummy[i]->mVec[j]->aVec[n]->c[m])->x = 100 * n;
(dummy[i]->mVec[j]->aVec[n]->c[m])->y = 11000 * m;
(dummy[i]->mVec[j]->aVec[n]->c[m])->dist = new double[2];
(dummy[i]->mVec[j]->aVec[n]->c[m])->dist[0] = 11200.123;
(dummy[i]->mVec[j]->aVec[n]->c[m])->dist[1] = 66503.131;
printf("x: %d, y: %d, dist 0: %f, dist 1: %f \n", (dummy[i]->mVec[j]->aVec[n]->c[m])->x, (dummy[i]->mVec[j]->aVec[n]->c[m])->y, (dummy[i]->mVec[j]->aVec[n]->c[m])->dist[0], (dummy[i]->mVec[j]->aVec[n]->c[m])->dist[1]);
}
}
printf("b: %f aname: %s \n", dummy[i]->mVec[j]->b, dummy[i]->mVec[j]->aname);
}
}
if (NULL != dummy) {
for(int i = 0; i < 4; i++)
{
free(dummy[i]);
}
free(dummy);
}
return 0;
}

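You can't use malloc or calloc to allocate memory for a class or struct that is non-POD, for example vector, foo or in_foo: malloc only returns raw bytes and never runs the constructors of members such as std::vector or boost::multi_array, and free never runs their destructors. Once you do that, all bets are off and any behavior your program displays is within reason.
Use new with smart pointers, or better yet, use composition instead of pointers with new where possible.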
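A sketch of the composition approach, assuming the boost::multi_array members can be left out for the illustration: every member is held by value, so constructors and destructors run automatically and nothing has to be freed by hand. The loops take their bounds from size(). Note that the question's loops also run one past the sizes they set up (i < 6 on a 5-element test_var, n < 6 after resize(5), m < 4 after resize(3)), which is an out-of-bounds write on its own. Unlike the question, where every dummy[i] shares the same in_foo objects through test_var, each foo here owns its own copies. build is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Value-based versions of the question's types (boost::multi_array members omitted).
struct Point {
    int x = 0, y = 0;
    std::vector<double> dist;        // replaces double* dist = new double[2]
};
struct another_struct {
    std::vector<Point> c;
};
struct in_foo {
    std::vector<another_struct> aVec;
    std::string aname;
    float b = 0.0f;
};
struct foo {
    std::vector<in_foo> mVec;
};

// Builds the same nesting as the question; bounds come from size(),
// so no index can run past the end of a vector.
std::vector<foo> build() {
    std::vector<foo> dummy(4);
    for (foo& f : dummy) {
        f.mVec.resize(5);
        for (std::size_t j = 0; j < f.mVec.size(); ++j) {
            in_foo& item = f.mVec[j];
            item.aname = "42!";
            item.b = static_cast<float>(j);
            item.aVec.resize(6);
            for (std::size_t n = 0; n < item.aVec.size(); ++n) {
                item.aVec[n].c.resize(4);
                for (std::size_t m = 0; m < item.aVec[n].c.size(); ++m) {
                    Point& p = item.aVec[n].c[m];
                    p.x = static_cast<int>(100 * n);
                    p.y = static_cast<int>(11000 * m);
                    p.dist = {11200.123, 66503.131};
                }
            }
        }
    }
    return dummy;  // everything is released when the returned vector goes out of scope
}
```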
