c++ MFC slow runtime with 2D vector - c++

I recently finished writing what I consider my "main.cpp" code in a Win32 Console project. It builds the solution perfectly and the external release version runs and completes within like 30 seconds, which is fast for the number of calculations it does.
When I use my MFC built UI made with just 1 standard dialog box for some simple float inputs, the program that ran fine by itself gets hung up when it has to create and calculate some 2D-vectors.
std::mt19937 generator3(time(0));
static uniform_01<std::mt19937> dist3(generator3);
std::vector<int> e_scatter;
for (int i = 0; i <= n; i++)
{
if (dist3() >= perc_e)
{
e_scatter.push_back(1);
// std::cout << e_scatter[i] << '\n';
// system("pause");
}
else
{
e_scatter.push_back(0);
// std::cout << e_scatter[i] << '\n';
// system("pause");
}
}
string fileName_escatter = "escatter.dat";
FILE* dout4 = fopen(fileName_escatter.c_str(), "w");
for (int i = 0; i <= n; i++)
{
fprintf(dout4, "%d", e_scatter[i]);
fprintf(dout4, "\n");
// fprintf(dout2, "%f", e_scatter[i]);
// fprintf(dout2, "\n");
};
fclose(dout4);
std::vector<vector<float>> electron;
// std::vector<float> angle;
randutils::mt19937_rng rng2;
std::vector<float> rand_scatter;
for (int i = 0; i <= n; i++)
{
std::vector<float> w;
electron.push_back(w);
rand_scatter.push_back(rng2.uniform(0.0, 1.0));
for (int j = 0; j <= 2000; j++)
{
if (e_scatter[i] == 0)
{
electron[i].push_back(linspace[j] * (cos((rand_scatter[i] * 90) * (PI / 180))));
//electron[i][j] == abs(electron[i][j]);
}
else
{
electron[i].push_back(linspace[j]);
};
};
};
More specifically, it does not get past a specific for loop and I am forced to close it. I've let it run for 20 minutes to see if it was just computing things slower, but still got 0 output from it. I am not that great at debugging when I have this GUI from MFC, since I don't have the console popping up.
Is there something that I am missing when I try to use MFC for the GUI and large 2D vectors?
The first loop calculates and spits out an output file 'escatter.dat' after it's finished, but the second set of loops never finishes and the memory usage keeps ramping up.
linspace is calculated before all of this code and is just a vector of 2001 numbers that is used to populate the std::vector<std::vector<float>> electron vector in the nested for loops.
I've included a link (http://pastebin.com/i8A7t38K) to the MFC part of the code I was using, so that this post doesn't get too long to read.
Thank you.

I agree that the debugging checks are the major problem.
But if your program is running 30 seconds, n must be big.
You may consider reducing the number of memory allocations by preallocating memory with vector::reserve:
std::vector<vector<float>> electron;
// std::vector<float> angle;
randutils::mt19937_rng rng2;
std::vector<float> rand_scatter;
electron.reserve(n+1); // worth for big n
rand_scatter.reserve(n+1); // worth for big n
for (int i = 0; i <= n; i++)
{
std::vector<float> w;
electron.push_back(w);
rand_scatter.push_back(rng2.uniform(0.0, 1.0));
electron[i].reserve(2000+1); // really worth for big n
for (int j = 0; j <= 2000; j++)
{
if (e_scatter[i] == 0)
{
electron[i].push_back(linspace[j] * (cos((rand_scatter[i] * 90) * (PI / 180))));
//electron[i][j] == abs(electron[i][j]);
}
else
{
electron[i].push_back(linspace[j]);
};
};
};
or rewrite by not using push_back (since you know all sizes)
std::vector<vector<float>> electron(n+1);
// std::vector<float> angle;
randutils::mt19937_rng rng2;
std::vector<float> rand_scatter(n+1);
for (int i = 0; i <= n; i++)
{
std::vector<float>& w=electron[i];
w.resize(2000+1); // resize, not reserve: w[j] below needs the elements to exist
float r=rng2.uniform(0.0, 1.0);
rand_scatter[i]=r;
for (int j = 0; j <= 2000; j++)
{
float f;
if (e_scatter[i] == 0)
{
f=linspace[j] * (cos((r * 90) * (PI / 180)));
// f=abs(f);
}
else
{
f=linspace[j];
};
w[j]=f;
};
};
After that, the runtime might drop to a few seconds at most.
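One detail when dropping push_back: reserve only sets the capacity, while indexing with [] needs elements that already exist, so the rows have to be sized (via the constructor or resize). Here is a minimal sketch of the same loop with every row sized up front. The names build_electron and kPi are hypothetical, and std::mt19937 with std::uniform_real_distribution stands in for randutils, which is not shown in the post.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: builds an e_scatter.size() x linspace.size() table in one pass.
// std::mt19937 + std::uniform_real_distribution stand in for randutils here.
std::vector<std::vector<float>> build_electron(const std::vector<int>& e_scatter,
                                               const std::vector<float>& linspace,
                                               unsigned seed)
{
    assert(!linspace.empty());
    const float kPi = 3.14159265358979f;
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // Size every row up front, so operator[] below only touches existing elements.
    std::vector<std::vector<float>> electron(
        e_scatter.size(), std::vector<float>(linspace.size()));

    for (std::size_t i = 0; i < electron.size(); ++i) {
        // One random angle between 0 and 90 degrees per row, converted to radians.
        const float scale = std::cos(dist(gen) * 90.0f * (kPi / 180.0f));
        for (std::size_t j = 0; j < linspace.size(); ++j) {
            electron[i][j] = (e_scatter[i] == 0) ? linspace[j] * scale : linspace[j];
        }
    }
    return electron;
}
```

Each row is allocated exactly once here, and the cosine is computed once per row instead of once per element.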
Another optimization
string fileName_escatter = "escatter.dat";
FILE* dout4 = fopen(fileName_escatter.c_str(), "w");
for (int i = 0; i <= n; i++)
{
fprintf(dout4, "%d\n", e_scatter[i]); // save one method call
// fprintf(dout2, "%f\n", e_scatter[i]);
};
fclose(dout4);
BTW: ofstream is the stl-way of writing files, like
ofstream dout4("escatter.dat", std::ofstream::out);
for (int i = 0; i <= n; i++)
{
dout4 << e_scatter[i] << std::endl;
};
dout4.close();
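A small caveat on the ofstream version: std::endl flushes the stream on every line, which can cost time for large n, while '\n' only writes the newline. A minimal sketch of the same write, with write_lines as a hypothetical helper name:

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes one value per line.
// Returns false if the file cannot be opened or a write fails.
bool write_lines(const std::string& path, const std::vector<int>& values)
{
    assert(!path.empty());
    std::ofstream out(path);
    if (!out) {
        return false;
    }
    for (int v : values) {
        out << v << '\n';   // '\n' does not force a flush, unlike std::endl
    }
    out.flush();            // one flush at the end, so errors show up before returning
    return out.good();
}
```

Calling write_lines("escatter.dat", e_scatter) would replace the loop above; the file is closed when out goes out of scope.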

Related

Performance bottleneck in writing a large matrix of doubles to a file

My program opens a file which contains 100,000 numbers and parses them out into a 10,000 x 10 array correlating to 10,000 sets of 10 physical parameters. The program then iterates through each row of the array, performing overlap calculations between that row and every other row in the array.
The process is quite simple, and being new to c++, I programmed it the most straightforward way that I could think of. However, I know that I'm not doing this in the most optimal way possible, which is something that I would love to do, as the program is going to face off against my cohort's identical program, coded in Fortran, in a "race".
I have a feeling that I am going to need to implement multithreading to accomplish my goal of speeding up the program, but not only am I new to c++, I am new to multithreading, so I'm not sure how I should go about creating new threads in a beneficial way, or if it is even something that would give me that much "gain on investment" so to speak.
The program has the potential to be run on a machine with over 50 cores, but because the program is so simple, I'm not convinced that more threads is necessarily better. I think that if I implement two threads to compute the complex parameters of the two gaussians, one thread to compute the overlap between the gaussians, and one thread that is dedicated to writing to the file, I could speed up the program significantly, but I could also be wrong.
CODE:
cout << "Working...\n";
double **gaussian_array;
gaussian_array = (double **)malloc(N*sizeof(double *));
for(int i = 0; i < N; i++){
gaussian_array[i] = (double *)malloc(10*sizeof(double));
}
fstream gaussians;
gaussians.open("GaussParams", ios::in);
if (!gaussians){
cout << "File not found.";
}
else {
//generate the array of gaussians -> [10000][10]
int i = 0;
while(i < N) {
char ch;
string strNums;
string Num;
string strtab[10];
int j = 0;
getline(gaussians, strNums);
stringstream gaussian(strNums);
while(gaussian >> ch) {
if(ch != ',') {
Num += ch;
strtab[j] = Num;
}
else {
Num = "";
j += 1;
}
}
for(int c = 0; c < 10; c++) {
stringstream dbl(strtab[c]);
dbl >> gaussian_array[i][c];
}
i += 1;
}
}
gaussians.close();
//Below is the process to generate the overlap file between all gaussians:
string buffer;
ofstream overlaps;
overlaps.open("OverlapMatrix", ios::trunc);
overlaps.precision(15);
for(int i = 0; i < N; i++) {
for(int j = 0 ; j < N; j++){
double r1[6][2];
double r2[6][2];
double ol[2];
//compute complex parameters from the two gaussians
compute_params(gaussian_array[i], r1);
compute_params(gaussian_array[j], r2);
//compute overlap between the gaussians using the complex parameters
compute_overlap(r1, r2, ol);
//write to file
overlaps << ol[0] << "," << ol[1];
if(j < N - 1)
overlaps << " ";
else
overlaps << "\n";
}
}
overlaps.close();
return 0;
Any suggestions are greatly appreciated. Thanks!

How to access a vector inside a vector?

So I have a vector of vectors of type double. I basically need to be able to set 360 numbers to cosY, and then put those 360 numbers into cosineY[0], then get another 360 numbers that are calculated with a different a, and put them into cosineY[1]. Technically my vector is going to be cosineY[a]. I then need to be able to take out just the cosY for the a that I specify...
My code is saying this:
for (int a = 0; a < 8; a++)
{
for (int n = 0; n <= 360; n++)
{
cosY[n] = cos(a*vectorOfY[n]);
}
cosineY.push_back(cosY);
}
which I hope is the correct way of actually setting it.
But then I need to take cosY for the a that I specify, and calculate another vector of 360 values, which will again be stored in a vector of vectors.
Right now I've got:
for (int a = 0; a < 8; a++)
{
for (int n = 0; n <= 360; n++)
{
cosProductPt[n] = (VectorOfY[n]*cosY[n]);
}
CosProductY.push_back(cosProductPt);
}
The VectorOfY is basically the amplitude of an input wave. What I am doing is trying to create a cosine wave with different frequencies (a). I am then calculating the product of the input and cosine wave at each frequency. I need to be able to access these 360 points for each frequency later on in the program, and right now I also need to calculate the sum of all elements in cosProductPt, for every frequency (stored in cosProductY), and store it in a vector dotProductCos[a].
I've been trying to work it out but I don't know how to access all the elements in a vector of vectors to add them. I've been trying to do this for the whole day without any results. Right now I know so little that I don't even know how I would display or access a vector inside a vector, but I need to use that access point for the addition.
Thank you for your help.
for (int a = 0; a < 8; a++)
{
for (int n = 0; n < 360; n++) // note: traded in <= for <. I think you had an off-by-one
// error here.
{
cosY[n] = cos(a*vectorOfY[n]);
}
cosineY.push_back(cosY);
}
This is sound so long as cosY has been pre-allocated to contain at least 360 elements. You could write
std::vector<std::vector<double>> cosineY;
std::vector<double> cosY(360); // strongly consider replacing the 360 with a well-named
// constant
for (int a = 0; a < 8; a++) // same with that 8
{
for (int n = 0; n < 360; n++)
{
cosY[n] = cos(a*vectorOfY[n]);
}
cosineY.push_back(cosY);
}
for example, but this hangs on to cosY longer than you need to and could cause problems later, so I'd probably scope cosY by throwing the above code into a function.
std::vector<std::vector<double>> buildStageOne(std::vector<double> &vectorOfY)
{
std::vector<std::vector<double>> cosineY;
std::vector<double> cosY(NumDegrees);
for (int a = 0; a < NumVectors; a++)
{
for (int n = 0; n < NumDegrees; n++)
{
cosY[n] = cos(a*vectorOfY[n]); // take radians into account if needed.
}
cosineY.push_back(cosY);
}
return cosineY;
}
This looks horrible, returning the vector by value, but the vast majority of compilers will take advantage of Copy Elision or some other sneaky optimization to eliminate the copying.
Then I'd do almost the exact same thing for the second step.
std::vector<std::vector<double>> buildStageTwo(std::vector<double> &vectorOfY,
std::vector<std::vector<double>> &cosineY)
{
std::vector<std::vector<double>> CosProductY;
std::vector<double> cosProductPt(NumDegrees); // scratch row; push_back stores a copy
for (int a = 0; a < NumVectors; a++)
{
for (int n = 0; n < NumDegrees; n++)
{
cosProductPt[n] = (vectorOfY[n]*cosineY[a][n]);
}
CosProductY.push_back(cosProductPt);
}
return CosProductY;
}
But we can make a couple optimizations
std::vector<std::vector<double>> buildStageTwo(std::vector<double> &vectorOfY,
std::vector<std::vector<double>> &cosineY)
{
std::vector<std::vector<double>> CosProductY;
std::vector<double> cosProductPt(NumDegrees); // scratch row; push_back stores a copy
for (int a = 0; a < NumVectors; a++)
{
// why risk constantly looking up cosineY[a]? grab it once and cache it
std::vector<double> & cosY = cosineY[a]; // note the reference
for (int n = 0; n < NumDegrees; n++)
{
cosProductPt[n] = (vectorOfY[n]*cosY[n]);
}
CosProductY.push_back(cosProductPt);
}
return CosProductY;
}
And the next is kind of an extension of the first:
std::vector<std::vector<double>> buildStageTwo(std::vector<double> &vectorOfY,
std::vector<std::vector<double>> &cosineY)
{
std::vector<std::vector<double>> CosProductY;
std::vector<double> cosProductPt(NumDegrees);
for (std::vector<double> & cosY: cosineY) // range-based for: no index a, no cosineY[a] lookup
{
for (int n = 0; n < NumDegrees; n++)
{
cosProductPt[n] = (vectorOfY[n]*cosY[n]);
}
CosProductY.push_back(cosProductPt);
}
return CosProductY;
}
We could do the same range-based for trick for the for (int n = 0; n < NumDegrees; n++), but since we are iterating multiple arrays here it's not all that helpful.
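The question also asks how to read a single point out of the vector of vectors and how to add up the 360 products for each frequency into dotProductCos[a]. A minimal sketch of both, assuming the hypothetical helper names pointAt and sumRows and using std::accumulate for the sums:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: reads one point, v[a][n], after checking both indices.
double pointAt(const std::vector<std::vector<double>>& v, std::size_t a, std::size_t n)
{
    assert(a < v.size() && n < v[a].size());
    return v[a][n];   // first index picks the inner vector, second picks the element
}

// Hypothetical helper: one sum per inner vector, so result[a] is the
// total of cosProductY[a][0] .. cosProductY[a][last].
std::vector<double> sumRows(const std::vector<std::vector<double>>& cosProductY)
{
    std::vector<double> dotProductCos;
    dotProductCos.reserve(cosProductY.size());
    for (const std::vector<double>& row : cosProductY)
    {
        // 0.0 (a double) as the start value keeps the sum in floating point.
        dotProductCos.push_back(std::accumulate(row.begin(), row.end(), 0.0));
    }
    return dotProductCos;
}
```

With that, dotProductCos = sumRows(CosProductY) gives one total per frequency, and pointAt(CosProductY, a, n) is just a checked form of CosProductY[a][n].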

Decimation in C++

Dear Stack Community,
I'm doing a DSP exercise to complement my C++ FIR lowpass filter with filter coefficients designed in and exported from Matlab. The DSP exercise in question is the act of decimating the output array of the FIR lowpass filter to a lower sample rate by a factor of 'M'. In C++ I made a successful but extremely simple implementation within a .cpp file and I've been trying hard to convert it to a function to which I can give the output array of the FIR filter. Here is the very basic version of the code:
int n = 0;
int length = 50;
int M = 12;
float array[length];
float array2[n];
for (int i = 0 ; i<length; i++) {
array[i] = std::rand();
}
for (int i = 0; i<length; i=i+M) {
array2[n++] = array[i];
}
for (int i = 0; i<n; i++) {
std::cout << i << " " << array2[i] << std::endl;
}
As you can see, very simple. My attempt to convert this to a function using std::vector is unfortunately not working. Here is the function as is:
std::vector<float> decimated_array(int M,std::vector<float> arr){
size_t n_idx = 0;
std::vector<float> decimated(n_idx);
for (int i = 0; i<(int)arr.size(); i = i + M) {
decimated[n_idx++] = arr[i];
}
return decimated;
}
This produces a very common Xcode error of EXC_BAD_ACCESS when using this section of code in the .cpp file. The error occurs in the line 'decimated[n_idx++] = arr[i];' specifically:
int length = 50;
int M = 3;
std::vector<float> fct_array(length);
for (int i = 0 ; i<length; i++) {
fct_array[i] = std::rand();
}
FIR_LPF test;
std::vector<float> output;
output = test.decimated_array(M,fct_array);
I'm trying to understand what is incorrect with my application of or perhaps just my translation of the algorithm into a more general setting. Any help with this matter would be greatly appreciated and hopefully this is clear enough for the community to understand.
Regards, Vhaanzeit
The issue:
size_t n_idx = 0;
std::vector<float> decimated(n_idx);
You did not size the vector before you used it, thus you were invoking undefined behavior when assigning to element 0, 1, etc. of the decimated vector.
What you could have done is in the loop, call push_back:
std::vector<float> decimated_array(int M,std::vector<float> arr)
{
std::vector<float> decimated;
for (size_t i = 0; i < arr.size(); i = i + M) {
decimated.push_back(arr[i]);
}
return decimated;
}
The decimated vector starts out empty, but a new item is added with the push_back call.
Also, you should pass the arr vector by const reference, not by value.
std::vector<float> decimated_array(int M, const std::vector<float>& arr);
Passing by (const) reference does not invoke a copy.
Edit: Changed loop counter to correct type, thus not needing the cast.
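Putting the pieces together, a minimal sketch with the const reference and a reserve for the known output size. Taking M as std::size_t (instead of int) and the assert on M are assumptions added here, not part of the original function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keeps every M-th sample, starting with index 0.
std::vector<float> decimated_array(std::size_t M, const std::vector<float>& arr)
{
    assert(M > 0);  // a step of 0 would never advance the loop
    std::vector<float> decimated;
    decimated.reserve((arr.size() + M - 1) / M);  // ceil(size / M) output samples
    for (std::size_t i = 0; i < arr.size(); i += M) {
        decimated.push_back(arr[i]);
    }
    return decimated;
}
```

For the 50-sample input with M = 12 from the question, this keeps the samples at indices 0, 12, 24, 36 and 48.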

Double free or corruption (out) on return

I'm doing an assignment that involves calculating pi with threads. I've done this using mutex and it works fine, but I would like to get this version working as well. Here is my code.
#include <iostream>
#include <stdlib.h>
#include <iomanip>
#include <vector>
#include <pthread.h>
using namespace std;
typedef struct{
int iterations; //How many iterations this thread is going to do
int offset; //The offset multiplier for the calculations (Makes sure each thread calculates a different part of the formula)
}threadParameterList;
vector<double> partialSumList;
void* pi_calc(void* param){
threadParameterList* _param = static_cast<threadParameterList*>(param);
double k = 1.0;
for(int i = _param->iterations * _param->offset + 1; i < _param->iterations * (_param->offset + 1); ++i){
partialSumList[_param->offset] += (double)k*(4.0/((2.0*i)*(2.0*i+1.0)*(2.0*i+2.0)));
k *= -1.0;
}
pthread_exit(0);
}
int main(int argc, char* argv[]){
//Error checking
if(argc != 3){
cout << "error: two parameters required [iterations][threadcount]" << endl;
return -1;
}
if(atoi(argv[1]) <= 0 || atoi(argv[2]) <= 0){
cout << "error: invalid parameter supplied - parameters must be > 0." << endl;
return -1;
}
partialSumList.resize(atoi(argv[2]));
vector<pthread_t> threadList (atoi(argv[2]));
vector<threadParameterList> parameterList (atoi(argv[2]));
int iterations = atoi(argv[1]),
threadCount = atoi(argv[2]);
//Calculate workload for each thread
if(iterations % threadCount == 0){ //Threads divide evenly
for(int i = 0; i < threadCount; ++i){
parameterList[i].iterations = iterations/threadCount;
parameterList[i].offset = i;
pthread_create(&threadList[i], NULL, pi_calc, &parameterList[i]);
}
void* status;
for(int i = 0; i < threadCount; ++i){
pthread_join(threadList[i], &status);
}
}
else{ //Threads do not divide evenly
for(int i = 0; i < threadCount - 1; ++i){
parameterList[i].iterations = iterations/threadCount;
parameterList[i].offset = i;
pthread_create(&threadList[i], NULL, pi_calc, &parameterList[i]);
}
//Add the remainder to the last thread
parameterList[threadCount].iterations = (iterations % threadCount) + (iterations / threadCount);
parameterList[threadCount].offset = threadCount - 1;
pthread_create(&threadList[threadCount], NULL, pi_calc, &parameterList[threadCount]);
void* status;
for(int i = 0; i < threadCount-1; ++i){
pthread_join(threadList[i], &status);
cout << status << endl;
}
}
//calculate pi
double pi = 3.0;
for(int i = 0; i < partialSumList.size(); ++i){
pi += partialSumList[i];
}
cout << "Value of pi: " << setw(15) << setprecision(15) << pi << endl;
return 0;
}
The code works fine in most cases. There are certain combinations of parameters that cause me to get a double free or corruption error on return 0. For example, if I use the parameters 100 and 10 the program creates 10 threads and does 10 iterations of the formula on each thread, works fine. If I use the parameters 10 and 4 the program creates 4 threads that do 2 iterations on 3 threads and 4 on the 4th thread, works fine. However, if I use 5 and 3, the program will correctly calculate the value and even print it out, but I get the error immediately after. This also happens for 17 and 3, and 10 and 3. I tried 15 and 7, but then I get a munmap_chunk(): invalid pointer error when the threads are being joined, although I think that's something for another question.
If I had to guess, it has something to do with pthread_exit deallocating memory and then the same memory trying to be deallocated again on return, since I'm passing the parameter struct as a pointer. I tried a few different things like creating a local copy and defining parameterList as a vector of pointers, but it didn't solve anything. I've also tried erasing and clearing the vector before return but that didn't help either.
I see this issue:
You are writing beyond the vector's bounds:
vector<threadParameterList> parameterList (atoi(argv[2]));
//...
int threadCount = atoi(argv[2]);
//...
parameterList[threadCount].iterations = (iterations % threadCount) + (iterations / threadCount);
parameterList[threadCount].offset = threadCount - 1;
Accessing parameterList[threadCount] is out of bounds.
I don't see in the code where threadCount is adjusted, so it remains the same value throughout that snippet.
Tip: If the goal is to access the last item in a container, use vector::back(). It works all the time for non-empty vectors.
parameterList.back().iterations = (iterations % threadCount) + (iterations / threadCount);
parameterList.back().offset = threadCount - 1;
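As a side note on why this only blew up at return: operator[] does no bounds checking, so the bad write goes through silently and the damage shows up later. A minimal sketch contrasting back() with the checked at(). The helper names set_last and index_is_rejected are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// back() refers to the last element; the vector must not be empty.
void set_last(std::vector<int>& v, int value)
{
    assert(!v.empty());
    v.back() = value;
}

// Hypothetical check: returns true if at() rejects index i.
bool index_is_rejected(const std::vector<int>& v, std::size_t i)
{
    try {
        (void)v.at(i);      // at() throws std::out_of_range for i >= v.size()
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For a vector of size threadCount, index threadCount is rejected by at(), while threadCount - 1 and back() both refer to the last element.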
One thing I can see is you might be going past the end of the vector here:
for(int i = 0; i < partialSumList.capacity(); ++i)
capacity() returns how many elements the vector can hold without reallocating. This can be more than the size() of the vector. You can change your call to capacity() to size() to make sure you don't go past the end of the vector:
for(int i = 0; i < partialSumList.size(); ++i)
The second thing I spot is that when iterations % threadCount != 0 you have:
parameterList[threadCount].iterations = (iterations % threadCount) + (iterations / threadCount);
parameterList[threadCount].offset = threadCount - 1;
pthread_create(&threadList[threadCount], NULL, pi_calc, &parameterList[threadCount]);
Which is writing past the end of the vector. Then when you join all of the threads you don't join the last thread as you do:
for(int i = 0; i < threadCount-1; ++i){
^^^ uh oh. we missed the last thread
pthread_join(threadList[i], &status);
cout << status << endl;
}
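Both branches in the question could share a single create loop and a single join loop over i < threadCount if the per-thread counts are worked out first. A minimal sketch of that step, with split_iterations as a hypothetical helper that is not part of the question's code:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: iteration count for each thread, remainder on the last one.
std::vector<int> split_iterations(int iterations, int threadCount)
{
    assert(iterations > 0 && threadCount > 0);
    std::vector<int> counts(threadCount, iterations / threadCount);
    counts.back() += iterations % threadCount;   // valid indices are 0 .. threadCount - 1
    return counts;
}
```

With the counts in hand, every index used for parameterList and threadList stays within 0 .. threadCount - 1, and the join loop can run up to threadCount so the last thread is not skipped. Note that pi_calc in the question derives its start index from iterations * offset, so it would likely need an explicit start value once the last thread's count differs from the others.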

Raspberry Pi C++ Segmentation fault

I have some code written in C++, and when I compile it on my laptop the results show. However, when I compile and run the code on the RPI, I get the error:
Segmentation fault
How the program (currently) works:
Reads in a (.wav) file into a vector of doubles ("rawData")
Splits the rawData into blocks (blockked)
The segmentation fault happens when I try and split the data into blocks. The sizes:
rawData - 57884
blockked - 112800
Now I know the RPI only has 256MB and this could possibly be the problem, or I'm not handling the data properly. I have included some code as well, to help demonstrate how things are running:
(main.cpp):
int main()
{
int N = 600;
int M = 200;
float sumthresh = 0.035;
float zerocorssthres = 0.060;
Wav sampleWave;
if(!sampleWave.readAudio("repositry/example.wav", DOUBLE))
{
cout << "Cannot open the file BOOM";
}
// Return the data
vector<double> rawData = sampleWave.returnRaw();
// THIS segments (typedef vector<double> iniMatrix;)
vector<iniMatrix> blockked = sampleWave.something(rawData, N, M);
cout << rawData.size();
return EXIT_SUCCESS;
}
(function: something)
int n = theData.size();
int maxblockstart = n - N;
int lastblockstart = maxblockstart - (maxblockstart % M);
int numblocks = (lastblockstart)/M + 1;
vector< vector<double> > subBlock;
vector<double> temp;
this->width = N;
this->height = numblocks;
subBlock.resize(600*187);
for(int i=0; (i < 600); i++)
{
subBlock.push_back(vector<double>());
for(int j=0; (j < 187); j++)
{
subBlock[i].push_back(theData[i*N+j]);
}
}
return subBlock;
Any suggestions would be greatly appreciated :)! Hopefully this is enough description.
You're probably overrunning an array somewhere (maybe not even in the code you posted). I'm not really sure what you're trying to do with the blocking either, but I guess you want to split your wave file into 600-sample chunks?
If so, I think you want something more like the following:
std::vector<std::vector<double>>
SimpleWav::something(const std::vector<double>& data, int N) {
//How many blocks of size N can we get?
int num_blocks = data.size() / N;
//Create the vector with enough empty slots for num_blocks blocks
std::vector<std::vector<double>> blocked(num_blocks);
//Loop over all the blocks
for(int i = 0; i < num_blocks; i++) {
//Resize the inner vector to fit this block
blocked[i].resize(N);
//Pull each sample for this block
for(int j = 0; j < N; j++) {
blocked[i][j] = data[i*N + j];
}
}
return blocked;
}
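For what it's worth, the indexing in the question's loop can be checked by hand: theData[i*N+j] with i up to 599, N = 600 and j up to 186 reaches index 359586, while rawData holds 57884 samples, so the read runs far past the end. If the blocks are meant to overlap (N samples long, starting every M samples, as the N = 600 and M = 200 in main suggest), a minimal sketch could look like the following. The name split_blocks is hypothetical and the overlapping interpretation is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant: blocks of N samples whose start positions are M samples apart.
std::vector<std::vector<double>> split_blocks(const std::vector<double>& data,
                                              std::size_t N, std::size_t M)
{
    assert(N > 0 && M > 0);
    std::vector<std::vector<double>> blocks;
    if (data.size() < N) {
        return blocks;                      // not enough samples for a single block
    }
    const std::size_t num_blocks = (data.size() - N) / M + 1;
    blocks.reserve(num_blocks);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t start = b * M;    // start + N <= data.size() by construction
        blocks.emplace_back(data.begin() + start, data.begin() + start + N);
    }
    return blocks;
}
```

The block count uses the same formula as the question's numblocks, so the last block always ends inside the input.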