I wrote code to calculate the moving L2 norm of two arrays.
void func_lstl2(const int &nx, const float x[], const int &ny, const float y[], int &shift, double &lstl2)
{
int maxshift = 200;
int len_z = maxshift * 2;
int len_work = len_z + ny;
//initialize array work and array z
double *z = new double[len_z]; float *work = new float[len_work];
for (int i = 0; i < len_z; i++)
z[i] = 0;
for (int i = 0; i < len_work; i++)
work[i] = 0;
for (int i = 0; i < ny; i++)
work[i + maxshift] = y[i];
// do moving least square residue calculation
float temp;
for (int i = 0; i < len_z; i++)
{
for (int j = 0; j < nx; j++)
{
temp = x[j] - work[i + j];
z[i] += temp * temp;
}
}
// find the best fit value
lstl2 = 1E30;
shift = 0;
for (int i = 0; i < len_z; i++)
{
if (z[i] < lstl2)
{
lstl2 = z[i];
shift = i - maxshift;
}
}
//end of program
delete[] z;
delete[] work;
}
I tested two pairs of arrays with exactly the same length and the same scale.
int shift; double lstl2;
func_lstl2(2000,z1,2000,z2,shift,lstl2) ;
func_lstl2(2000,x1,2000,x2,shift,lstl2) ;
For the z arrays it took 0.0032346 seconds; for the x arrays it took 0.0140903 seconds. I cannot figure out why the x arrays take more than four times as long. Could you help me figure it out? Thank you very much!
Here is the link to the z and x arrays.
https://drive.google.com/file/d/1aONKTjE_7NI1bp8YkDL2CMfg9C5h67Fe/view?usp=sharing
I strongly suspect you're dealing with denormalized (subnormal) floating-point calculation effects. Using your existing function, loading the values into vectors as appropriate, and running it seven times on each pair of provided inputs (compiled with -O3 optimization):
for (int i = 0; i < 7; ++i)
{
int shift = 0;
double lstl2 = 0;
auto tp0 = steady_clock::now();
func_lstl2(2000, v1.data(), 2000, v2.data(), shift, lstl2);
auto tp1 = steady_clock::now();
std::cout << pr[0] << ',' << pr[1] << ':';
std::cout << duration_cast<milliseconds>(tp1 - tp0).count() << "ms\n";
}
I receive the following output, confirming your conundrum:
x1.txt,x2.txt:23ms
x1.txt,x2.txt:19ms
x1.txt,x2.txt:21ms
x1.txt,x2.txt:21ms
x1.txt,x2.txt:19ms
x1.txt,x2.txt:22ms
x1.txt,x2.txt:21ms
z1.txt,z2.txt:8ms
z1.txt,z2.txt:9ms
z1.txt,z2.txt:5ms
z1.txt,z2.txt:5ms
z1.txt,z2.txt:6ms
z1.txt,z2.txt:5ms
z1.txt,z2.txt:5ms
However, enabling denormals-are-zero (DAZ) and flush-to-zero (FTZ) for floating-point calculations (the mechanism for doing so is toolchain-dependent; below is clang 13.01 on macOS, using the macros from <pmmintrin.h> and <xmmintrin.h>):
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
delivers the following:
x1.txt,x2.txt:4ms
x1.txt,x2.txt:4ms
x1.txt,x2.txt:3ms
x1.txt,x2.txt:5ms
x1.txt,x2.txt:3ms
x1.txt,x2.txt:3ms
x1.txt,x2.txt:5ms
z1.txt,z2.txt:7ms
z1.txt,z2.txt:6ms
z1.txt,z2.txt:4ms
z1.txt,z2.txt:3ms
z1.txt,z2.txt:3ms
z1.txt,z2.txt:4ms
z1.txt,z2.txt:3ms
Your x-data set is sensitive to this; z does not appear to be. See this question for a better explanation.
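To check whether a given data set is likely to hit this, one option is to count how many intermediate values end up subnormal. The sketch below is a hypothetical diagnostic (the name count_subnormal_squares is mine, not from the question), and it assumes the slow path comes from the float product temp * temp in func_lstl2. It only looks at element-wise differences with no shift applied, so treat it as a rough indicator rather than proof.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical diagnostic, not part of the original code: counts how many
// squared differences (x[i] - y[i])^2 are subnormal when held in a float,
// which is the type of `temp * temp` in func_lstl2.
std::size_t count_subnormal_squares(const std::vector<float>& x,
                                    const std::vector<float>& y)
{
    const std::size_t n = std::min(x.size(), y.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = x[i] - y[i];
        const float sq = d * d;  // result is rounded to float
        if (std::fpclassify(sq) == FP_SUBNORMAL)
            ++count;
    }
    return count;
}
```

A large count for the x arrays and a count near zero for the z arrays would be consistent with the timings above. Note that this check has to run without DAZ/FTZ enabled, otherwise the subnormal values are flushed to zero before they can be classified.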
I tried to generate two random integers j and k using the OMNeT++ function intrand(), but the first variable j takes a value outside the range. I had the same problem on OMNeT++ 6.0pre11 (installed on Ubuntu 18.04) and thought it was caused by the software misbehaving. I then upgraded my OS to Ubuntu 20.04 and installed OMNeT++ 6.0pre15, but I'm still experiencing the same issue.
Below are my code and the output:
In my .h
public:
struct clusteringStruct {
int CH_Index;
map<int,double> Members;
double FitnessValue;
int trial = 0;
double proba;
double X[];
};
void generateNewSol(int n,clusteringStruct Cluster[],int z);
In my .cc
int D = 2;
clusteringStruct Cluster[5];
// Initialization...
for(t = 0; t < 5; ++t) {
Cluster[t].X[0] = 0.5; // CH Residual Energy
Cluster[t].X[1] = 500*sqrt(2); // Max distance
}
// Execution...
for(int i = 0; i < 5; i++) {
generateNewSol(i,Cluster,D);
}
void generateNewSol(int i,clusteringStruct Cluster[],int D) {
// Randomly select the variable j that is to be changed
int j = intrand(D);
// Randomly select the neighbour k and ensure that he is different from i
int k = intrand(5);
while(k == i) {
k = intrand(5);
}
clusteringStruct sol = Cluster[i];
for(int q = 0; q < D; q++) {
sol.X[q] = Cluster[i].X[q];
}
EV_DEBUG <<" i = "<<i<<" || j = "<<j<<" || k = "<<k<<endl;
}
The output of the j and k variables is shown in the image below:
Thanks in advance for your help.
So recently I ran into a problem that I thought was interesting and I couldn't fully explain. I've highlighted the nature of the problem in the following code:
#include <cstring>
#include <chrono>
#include <iostream>
#define NLOOPS 10
void doWorkFast(int total, int *write, int *read)
{
for (int j = 0; j < NLOOPS; j++) {
for (int i = 0; i < total; i++) {
write[i] = read[i] + i;
}
}
}
void doWorkSlow(int total, int *write, int *read, int innerLoopSize)
{
for (int i = 0; i < NLOOPS; i++) {
for (int j = 0; j < total/innerLoopSize; j++) {
for (int k = 0; k < innerLoopSize; k++) {
write[j*k + k] = read[j*k + k] + j*k + k;
}
}
}
}
int main(int argc, char *argv[])
{
int n = 1000000000;
int *heapMemoryWrite = new int[n];
int *heapMemoryRead = new int[n];
for (int i = 0; i < n; i++)
{
heapMemoryRead[i] = 1;
}
std::memset(heapMemoryWrite, 0, n * sizeof(int));
auto start1 = std::chrono::high_resolution_clock::now();
doWorkFast(n,heapMemoryWrite, heapMemoryRead);
auto finish1 = std::chrono::high_resolution_clock::now();
auto duration1 = std::chrono::duration_cast<std::chrono::microseconds>(finish1 - start1);
for (int i = 0; i < n; i++)
{
heapMemoryRead[i] = 1;
}
std::memset(heapMemoryWrite, 0, n * sizeof(int));
auto start2 = std::chrono::high_resolution_clock::now();
doWorkSlow(n,heapMemoryWrite, heapMemoryRead, 10);
auto finish2 = std::chrono::high_resolution_clock::now();
auto duration2 = std::chrono::duration_cast<std::chrono::microseconds>(finish2 - start2);
std::cout << "Small inner loop:" << duration1.count() << " microseconds.\n" <<
"Large inner loop:" << duration2.count() << " microseconds." << std::endl;
delete[] heapMemoryWrite;
delete[] heapMemoryRead;
}
Looking at the two doWork* functions, for every iteration, we are reading the same addresses adding the same value and writing to the same addresses. I understand that in the doWorkSlow implementation, we are doing one or two more operations to resolve j*k + k, however, I think it's reasonably safe to assume that relative to the time it takes to do the load/stores for memory read and write, the time contribution of these operations is negligible.
Nevertheless, doWorkSlow takes about twice as long (46.8s) as doWorkFast (25.5s) on my i7-3700 using g++ 7.5.0. While things like cache prefetching and branch prediction come to mind, I don't have a great explanation as to why doWorkFast is so much faster than doWorkSlow. Does anyone have insight?
Thanks
Looking at the two doWork* functions, for every iteration, we are reading the same addresses adding the same value and writing to the same addresses.
This is not true!
In doWorkFast, you index each integer incrementally, as array[i].
array[0]
array[1]
array[2]
array[3]
In doWorkSlow, you index each integer as array[j*k + k], which jumps around and repeats.
When j is 10, for example, and you iterate k from 0 onwards, you are accessing
array[0] // 10*0+0
array[11] // 10*1+1
array[22] // 10*2+2
array[33] // 10*3+3
This will prevent your optimizer from using instructions that can operate on many adjacent integers at once.
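To see the difference concretely, here is a small sketch that lists the indices one pass of the two inner loops touches. The helper name touched_indices is made up for illustration, and the "block stride" variant j*innerLoopSize + k is only my assumption about what the index was meant to be; the question does not say so.

```cpp
#include <vector>

// Indices touched by one pass of the inner two loops of doWorkSlow.
// use_block_stride == false reproduces the original index j*k + k.
// use_block_stride == true uses j*innerLoopSize + k, which is an
// assumption about the intended index, not something the question states.
std::vector<long long> touched_indices(int total, int innerLoopSize,
                                       bool use_block_stride)
{
    std::vector<long long> out;
    for (int j = 0; j < total / innerLoopSize; ++j) {
        for (int k = 0; k < innerLoopSize; ++k) {
            out.push_back(use_block_stride
                              ? static_cast<long long>(j) * innerLoopSize + k
                              : static_cast<long long>(j) * k + k);
        }
    }
    return out;
}
```

With total = 20 and innerLoopSize = 10, the block-stride variant visits 0, 1, ..., 19 in order, exactly like doWorkFast. The original index visits 0..9 for j = 0 and then 0, 2, 4, ..., 18 for j = 1, so the stride changes with j and some elements are written repeatedly while others are never touched.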
My implementation of the Durand-Kerner method (https://en.wikipedia.org/wiki/Durand%E2%80%93Kerner_method) does not seem to work. I believe (see the following code) that I am not calculating the new approximation correctly in the algorithm part itself, but I have not been able to fix the problem. I would be very grateful for any advice.
#include <complex>
#include <cmath>
#include <vector>
#include <iostream>
#include "DurandKernerWeierstrass.h"
using namespace std;
using Complex = complex<double>;
using vec = vector<Complex>;
using Matrix = vector<vector<Complex>>;
//PRE: Receives input value of polynomial, degree and coefficients
//POST: Outputs y(x) value
Complex Polynomial(vec Z, int n, Complex x) {
Complex y = pow(x, n);
for (int i = 0; i < n; i++){
y += Z[i] * pow(x, (n - i - 1));
}
return y;
}
/*PRE: Takes a test value, degree of polynomial, vector of coefficients and the desired
precision of polynomial roots to calculate the roots*/
//POST: Outputs the roots of Polynomial
Matrix roots(vec Z, int n, int iterations, const double precision) {
Complex z = Complex(0.4, 0.9);
Matrix P(iterations, vec(n, 0));
Complex w;
//Creating Matrix with initial starting values
for (int i = 0; i < n; i++) {
P[0][i] = pow(z, i);
}
//Durand Kerner Algorithm
for (int col = 0; col < iterations; col++) {
//I believe this is the point where everything is going wrong
for (int row = 0; row < n; row++) {
Complex g = Polynomial(Z, n, P[col][row]);
for (int k = 0; k < n; k++) {
if (k != row) {
g = g / (P[col][row] - P[col][k]);
}
}
P[col][row] -= g;
}
return P;
}
}
The following code is what I am using to test the function:
int main() {
//Initializing section
vec A = {1, -3, 3,-5 };
int n = 3;
int iterations = 10;
const double precision = 1.0e-10;
Matrix p = roots(A, n, iterations,precision);
for (int i = 0; i < iterations; i++) {
for (int j = 0; j < n; j++) {
cout << "p[" << i << "][" << j << "] = " << p[i][j] << " ";
}
cout << endl;
}
return 0;
}
Note that the Durand-Kerner code relies on a header file (DurandKernerWeierstrass.h) which is not included here.
Your problem is that you do not carry the new values over into the next row of P, the one with index col+1. Thus in the next iteration you start again from a row of zero entries. Change to
P[col+1][row] = P[col][row] - g;
If you want to use the new improved approximation immediately for all following approximations, then use
P[col+1][row] = (P[col][row] -= g);
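In this second variant each row is overwritten with the next approximation; in particular, the first row will no longer contain the initially set powers. In either case the outer loop has to stop at col < iterations - 1 so that P[col+1] stays in bounds, and return P has to come after the outer loop rather than inside it.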
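For reference, here is a minimal self-contained sketch of the iteration that keeps only the current approximations instead of the full history matrix. It is not a drop-in replacement for your roots() function: the names eval_poly and durand_kerner are mine, it assumes the coefficients are given highest degree first with a leading coefficient of 1, and it uses a fixed iteration count instead of the precision parameter. New values are used as soon as they are computed, as in the second variant above.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Horner evaluation; coefficients are ordered from highest degree to constant.
Complex eval_poly(const std::vector<Complex>& c, Complex x)
{
    Complex y(0.0, 0.0);
    for (const Complex& a : c)
        y = y * x + a;
    return y;
}

// Sketch of Durand-Kerner, assuming c[0] == 1 (monic polynomial).
// Starts from the powers of 0.4 + 0.9i, like the code in the question.
std::vector<Complex> durand_kerner(const std::vector<Complex>& c, int iterations)
{
    const std::size_t n = c.size() - 1;  // degree of the polynomial
    std::vector<Complex> r(n);
    const Complex z(0.4, 0.9);
    Complex p(1.0, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        r[i] = p;
        p *= z;
    }
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < n; ++i) {
            Complex g = eval_poly(c, r[i]);
            for (std::size_t k = 0; k < n; ++k) {
                if (k != i)
                    g /= (r[i] - r[k]);
            }
            r[i] -= g;  // updated value is used by the remaining roots
        }
    }
    return r;
}
```

For the polynomial from the Wikipedia article, x^3 - 3x^2 + 3x - 5, this would be called with the coefficient vector {1, -3, 3, -5}. Note that this differs from how Polynomial() in the question reads its coefficients, which adds x^n on top of Z[0..n-1] and never uses the last entry.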
I wanted to sort nuggets by unit price, so I made a class Nugget with the variables quant and price, plus a double unit, which is price/quant. The nuggets come in 4-, 6- and 9-packs.
When I input 10, 10, and 10 for the price of each, I should get a sorted array of the 9-pack, the 6-pack and then the 4-pack, because 10/9 is less than 10/6 and 10/4. But that's not the case.
#include <iostream>
using namespace std;
class Nugget {
public:
int price;
int quant;
double unit;
Nugget(int price, int quant) {
this->price = price;
this->quant = quant;
this->unit = price/quant;
}
};
int main(){
int n4,n6, n9;
cin >> n4 >> n6 >> n9;
Nugget* nuggetArr[3] = {new Nugget(n4,4), new Nugget(n6,6), new Nugget(n9,9)};
for (int i = 0; i < 3; ++i) {
for (int j = 0; j < 3; ++j) {
if (nuggetArr[j]->unit > nuggetArr[i]->unit) {
Nugget* temp = nuggetArr[i];
nuggetArr[i] = nuggetArr[j];
nuggetArr[j] = temp;
}
}
for (int j = 0; j < 3; j++)
cout << nuggetArr[j]->quant << ' ';
cout << endl << endl;
}
for (int i = 0; i<3; ++i)
cout << nuggetArr[i]->quant << ' ';
return 0;
};
Which sorting algorithm do you want to use?
If you swap nuggetArr[i] with nuggetArr[j], you should ensure that i < j for ascending or i > j for descending order.
For example:
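for (int i = 0; i < 3; ++i) {
for (int j = i + 1; j < 3; ++j) {
// j is always after i; swap when the later element is cheaper per unit,
// so the smallest unit price ends up first (ascending order)
if (nuggetArr[j]->unit < nuggetArr[i]->unit) {
Nugget* temp = nuggetArr[i];
nuggetArr[i] = nuggetArr[j];
nuggetArr[j] = temp;
}
}
}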
Your code swaps objects (nuggets) without respecting any order between the two indices.
For example, you swap nuggetArr[1] with nuggetArr[2] and later nuggetArr[2] with nuggetArr[1].
FYI this is similar to Selection sort.
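If you do not need to write the loops yourself, std::sort with a comparator on the unit price does the same job. This is only a sketch: Pack is a hypothetical stand-in for your Nugget class, and quantities_by_unit_price is a made-up helper that returns the pack sizes in ascending order of unit price.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for Nugget: pack size and price per piece.
struct Pack {
    int quant;
    double unit;
};

// Returns the pack sizes ordered from cheapest to most expensive per piece.
std::vector<int> quantities_by_unit_price(std::vector<Pack> packs)
{
    std::sort(packs.begin(), packs.end(),
              [](const Pack& a, const Pack& b) { return a.unit < b.unit; });
    std::vector<int> order;
    for (const Pack& p : packs)
        order.push_back(p.quant);
    return order;
}
```

This only gives the expected order if unit holds the real quotient, so the unit prices have to be computed with floating-point division before sorting.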
C++ handles the types here in a way that may look weird. The division is carried out on integers and only its result is converted at the assignment, so unit holds a whole number even though you declared it as double. If you make the following changes you will find that it works (please note that this is just to illustrate the point; storing price and quant as double is not a good idea because of precision):
...
double price;
double quant;
...
this->unit = this->price/this->quant;
...
Hope this helps
To be more precise: if a and b are integers, then a/b is evaluated as integer division, and only the result of that division is converted to double.
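A way to keep price and quant as int and still get the real quotient is to convert one operand before dividing. The two helper names below are made up for illustration; the first one mirrors what the constructor in the question does, the second one is the cast-based alternative.

```cpp
// Mirrors the question: both operands are int, so the division truncates
// and only the truncated result is converted to double.
double truncated_unit_price(int price, int quant)
{
    return price / quant;
}

// Converting one operand first makes the whole division floating-point.
double unit_price(int price, int quant)
{
    return static_cast<double>(price) / quant;
}
```

With a price of 10, the truncated version gives the same value for the 6-pack and the 9-pack, which is why the sort cannot tell them apart, while the cast version gives distinct values.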
I am writing a program which will perform texture synthesis. I have been away from C++ for a while and am having trouble figuring out what I am doing wrong in my class. When I run the program, I get an unhandled exception in the copyToSample function when it tries to access the arrays. It is being called from the bestSampleSearch function when the unhandled exception occurs. The function has been called before and works just fine, but later on in the program it is called a second time and fails. Any ideas? Let me know if anyone needs to see more code. Thanks!
Edit1: Added the bestSampleSearch function and the compareMetaPic function
Edit2: Added a copy constructor
Edit3: Added main()
Edit4: I have gotten the program to work. However there is now a memory leak of some kind or I am running out of memory when I run the program. It seems in the double for loop in main which starts "// while output picture is unfilled" is the problem. If I comment this portion out the program finishes in a timely manner but only one small square is output. Something must be wrong with my bestSampleSearch function.
MetaPic.h
#pragma once
#include <pic.h>
#include <stdlib.h>
#include <cmath>
class MetaPic
{
public:
Pic* source;
Pixel1*** meta;
int x;
int y;
int z;
MetaPic();
MetaPic(Pic*);
MetaPic(const MetaPic&);
MetaPic& operator=(const MetaPic&);
~MetaPic();
void allocateMetaPic();
void copyPixelData();
void copyToOutput(Pic*&);
void copyToMetaOutput(MetaPic&, int, int);
void copyToSample(MetaPic&, int, int);
void freeMetaPic();
};
MetaPic.cpp
#include "MetaPic.h"
MetaPic::MetaPic()
{
source = NULL;
meta = NULL;
x = 0;
y = 0;
z = 0;
}
MetaPic::MetaPic(Pic* pic)
{
source = pic;
x = pic->nx;
y = pic->ny;
z = pic->bpp;
allocateMetaPic();
copyPixelData();
}
MetaPic::MetaPic(const MetaPic& mp)
{
source = mp.source;
x = mp.x;
y = mp.y;
z = mp.z;
allocateMetaPic();
copyPixelData();
}
MetaPic::~MetaPic()
{
freeMetaPic();
}
// create a 3 dimensional array from the original one dimensional array
void MetaPic::allocateMetaPic()
{
meta = (Pixel1***)calloc(x, sizeof(Pixel1**));
for(int i = 0; i < x; i++)
{
meta[i] = (Pixel1**)calloc(y, sizeof(Pixel1*));
for(int j = 0; j < y; j++)
{
meta[i][j] = (Pixel1*)calloc(z, sizeof(Pixel1));
}
}
}
void MetaPic::copyPixelData()
{
for(int j = 0; j < y; j++)
{
for(int i = 0; i < x; i++)
{
for(int k = 0; k < z; k++)
meta[i][j][k] = source->pix[(j*z*x)+(i*z)+k];
}
}
}
void MetaPic::copyToOutput(Pic* &output)
{
for(int j = 0; j < y; j++)
{
for(int i = 0; i < x; i++)
{
for(int k = 0; k < z; k++)
output->pix[(j*z*x)+(i*z)+k] = meta[i][j][k];
}
}
}
// copy the meta data to the final pic output starting at the top left of the picture and mapped to 'a' and 'b' coordinates in the output
void MetaPic::copyToMetaOutput(MetaPic &output, int a, int b)
{
for(int j = 0; (j < y) && ((j+b) < output.y); j++)
{
for(int i = 0; (i < x) && ((i+a) < output.x); i++)
{
for(int k = 0; k < z; k++)
output.meta[i+a][j+b][k] = meta[i][j][k];
}
}
}
// copies from a source image to a smaller sample image
// *** Must make sure that the x and y coordinates have enough buffer space ***
void MetaPic::copyToSample(MetaPic &sample, int a, int b)
{
for(int j = 0; (j < sample.y) && ((b+j) < y); j++)
{
for(int i = 0; i < (sample.x) && ((a+i) < x); i++)
{
for(int k = 0; k < sample.z; k++)
{
sample.meta[i][j][k] = meta[i+a][j+b][k]; // the unhandled exception occurs here
}
}
}
}
// free the meta pic data (MetaPic.meta)
// *** Not to be used outside of class declaration ***
void MetaPic::freeMetaPic()
{
for(int j = 0; j < y; j++)
{
for(int i = 0; i < z; i++)
free(meta[i][j]);
}
for(int i = 0; i < x; i++)
free(meta[i]);
free(meta);
}
MetaPic MetaPic::operator=(MetaPic mp)
{
MetaPic newMP(mp.source);
return newMP;
}
main.cpp
#ifdef WIN32
// For VC++ you need to include this file as glut.h and gl.h refer to it
#include <windows.h>
// disable the warning for the use of strdup and friends
#pragma warning(disable:4996)
#endif
#include <stdio.h> // Standard Header For Most Programs
#include <stdlib.h> // Additional standard Functions (exit() for example)
#include <iostream>
// Interface to libpicio, provides functions to load/save jpeg files
#include <pic.h>
#include <string.h>
#include <time.h>
#include <cmath>
#include "MetaPic.h"
using namespace std;
MetaPic bestSampleSearch(MetaPic, MetaPic);
double compareMetaPics(MetaPic, MetaPic);
#define SAMPLE_SIZE 23
#define OVERLAP 9
// Texture source image (pic.h uses the Pic* data structure)
Pic *sourceImage;
Pic *outputImage;
int main(int argc, char* argv[])
{
char* pictureName = "reg1.jpg";
int outputWidth = 0;
int outputHeight = 0;
// attempt to read in the file name
sourceImage = pic_read(pictureName, NULL);
if(sourceImage == NULL)
{
cout << "Couldn't read the file" << endl;
system("pause");
exit(EXIT_FAILURE);
}
// *** For now set the output image to 3 times the original height and width ***
outputWidth = sourceImage->nx*3;
outputHeight = sourceImage->ny*3;
// allocate the output image
outputImage = pic_alloc(outputWidth, outputHeight, sourceImage->bpp, NULL);
Pic* currentImage = pic_alloc(SAMPLE_SIZE, SAMPLE_SIZE, sourceImage->bpp, NULL);
MetaPic metaSource(sourceImage);
MetaPic metaOutput(outputImage);
MetaPic metaCurrent(currentImage);
// seed the output image
int x = 0;
int y = 0;
int xupperbound = metaSource.x - SAMPLE_SIZE;
int yupperbound = metaSource.y - SAMPLE_SIZE;
int xlowerbound = 0;
int ylowerbound = 0;
// find random coordinates
srand(time(NULL));
while((x >= xupperbound) || (x <= xlowerbound))
x = rand() % metaSource.x;
while((y >= yupperbound) || (y <= ylowerbound))
y = rand() % metaSource.y;
// copy a random sample from the source to the metasample
metaSource.copyToSample(metaCurrent, x, y);
// copy the seed to the metaoutput
metaCurrent.copyToMetaOutput(metaOutput, 0, 0);
int currentOutputX = 0;
int currentOutputY = 0;
// while the output picture is unfilled...
for(int j = 0; j < yupperbound; j+=(SAMPLE_SIZE-OVERLAP))
{
for(int i = 0; i < xupperbound; i+=(SAMPLE_SIZE-OVERLAP))
{
// move the sample to correct overlap
metaSource.copyToSample(metaCurrent, i, j);
// find the best match for the sample
metaCurrent = bestSampleSearch(metaSource, metaCurrent);
// write the best match to the metaoutput
metaCurrent.copyToMetaOutput(metaOutput, i, j);
// update the values
}
}
// copy the metaOutput to the output
metaOutput.copyToOutput(outputImage);
// output the image
pic_write("reg1_output.jpg", outputImage, PIC_JPEG_FILE);
// clean up
pic_free(sourceImage);
pic_free(outputImage);
pic_free(currentImage);
// return success
cout << "Done!" << endl;
system("pause");
// return success
return 0;
}
// finds the best sample to insert into the image
// *** best must be the sample which consists of the overlap ***
MetaPic bestSampleSearch(MetaPic source, MetaPic best)
{
MetaPic metaSample(best);
double bestScore = 999999.0;
double currentScore = 0.0;
for(int j = 0; j < source.y; j++)
{
for(int i = 0; i < source.x; i++)
{
// copy the image starting at the top left of the source image
source.copyToSample(metaSample, i, j);
// compare the sample with the overlap
currentScore = compareMetaPics(best, metaSample);
// if best score is greater than current score then copy the better sample to best and continue searching
if( bestScore > currentScore)
{
metaSample.copyToSample(best, 0, 0);
bestScore = currentScore;
}
// otherwise, the score is less than current score then do nothing (a better sample has not been found)
}
}
return best;
}
// find the comparison score for the two MetaPics based on their rgb values
// *** Both of the meta pics should be the same size ***
double compareMetaPics(MetaPic pic1, MetaPic pic2)
{
float r1 = 0.0;
float g1 = 0.0;
float b1 = 0.0;
float r2 = 0.0;
float g2 = 0.0;
float b2 = 0.0;
float r = 0.0;
float g = 0.0;
float b = 0.0;
float sum = 0.0;
// take the sum of the (sqrt((r1-r2)^2 + ((g1-g2)^2 + ((b1-b2)^2))
for(int j = 0; (j < pic1.y) && (j < pic2.y); j++)
{
for(int i = 0; (i < pic1.x) && (i < pic2.x); i++)
{
r1 = PIC_PIXEL(pic1.source, i, j, 0);
r2 = PIC_PIXEL(pic2.source, i, j, 0);
g1 = PIC_PIXEL(pic1.source, i, j, 1);
g2 = PIC_PIXEL(pic2.source, i, j, 1);
b1 = PIC_PIXEL(pic1.source, i, j, 2);
b2 = PIC_PIXEL(pic2.source, i, j, 2);
r = r1 - r2;
g = g1 - g2;
b = b1 - b2;
sum += sqrt((r*r) + (g*g) + (b*b));
}
}
return sum;
}
I'm not sure if this is the root cause of the problem, but your assignment operator does not actually assign anything:
MetaPic MetaPic::operator=(MetaPic mp)
{
MetaPic newMP(mp.source);
return newMP;
}
This should probably look something like the following (based off of the code in your copy constructor):
edit: with credit to Alf P. Steinbach
MetaPic& MetaPic::operator=(MetaPic mp)
{
mp.swap(*this);
return *this;
}
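The snippet above relies on a swap member that MetaPic does not have yet. Here is a sketch of the whole copy-and-swap pattern on a small stand-in class; Buffer and all of its members are hypothetical and only illustrate the idea, so the swap for MetaPic would have to exchange source, meta, x, y and z in the same way.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for MetaPic: owns a heap buffer of n ints.
class Buffer {
public:
    explicit Buffer(std::size_t n = 0)
        : n_(n), data_(n ? new int[n]() : nullptr) {}

    // Deep copy, so two objects never share the same allocation.
    Buffer(const Buffer& other)
        : n_(other.n_), data_(other.n_ ? new int[other.n_] : nullptr)
    {
        if (n_)
            std::copy(other.data_, other.data_ + n_, data_);
    }

    ~Buffer() { delete[] data_; }

    // Exchanges ownership of the two buffers without allocating.
    void swap(Buffer& other) noexcept
    {
        std::swap(n_, other.n_);
        std::swap(data_, other.data_);
    }

    // The by-value parameter is the copy; after the swap, rhs holds the
    // old contents of *this and frees them when it goes out of scope.
    Buffer& operator=(Buffer rhs)
    {
        rhs.swap(*this);
        return *this;
    }

    std::size_t size() const { return n_; }
    int& operator[](std::size_t i) { return data_[i]; }

private:
    std::size_t n_;
    int* data_;
};
```

Because the copy is made by the parameter, this form also handles self-assignment and leaves the target untouched if the copy throws.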
It turns out that the deallocate function is incorrect. It should be freeing in the same manner that it was allocating.
void MetaPic::freeMetaPic()
{
for(int i = 0; i < x; i++)
{
// free the innermost arrays first, with the same i/j bounds as allocateMetaPic()
for(int j = 0; j < y; j++)
free(meta[i][j]);
free(meta[i]);
}
free(meta);
}
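To make the symmetry easier to see outside of the class, here is the same allocate/free pair as two free-standing helpers. The names alloc3d and free3d are made up, unsigned char is only my assumption for what Pixel1 is, and error checking of calloc is left out to keep the sketch short.

```cpp
#include <cstdlib>

// Hypothetical helper mirroring allocateMetaPic(): x entries, each with
// y entries, each holding z zero-initialized channel values.
// unsigned char stands in for Pixel1 (an assumption).
unsigned char*** alloc3d(int x, int y, int z)
{
    unsigned char*** p =
        static_cast<unsigned char***>(std::calloc(x, sizeof(unsigned char**)));
    for (int i = 0; i < x; i++) {
        p[i] = static_cast<unsigned char**>(std::calloc(y, sizeof(unsigned char*)));
        for (int j = 0; j < y; j++)
            p[i][j] = static_cast<unsigned char*>(std::calloc(z, sizeof(unsigned char)));
    }
    return p;
}

// Frees in the reverse nesting order of alloc3d, with the same i/j bounds.
void free3d(unsigned char*** p, int x, int y)
{
    for (int i = 0; i < x; i++) {
        for (int j = 0; j < y; j++)
            std::free(p[i][j]);
        std::free(p[i]);
    }
    std::free(p);
}
```

The first index always runs to x and the second to y in both functions; z is only needed when allocating, since the innermost arrays are freed as whole blocks.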