I wrote a C++ (Rcpp) function that reads a file containing only numbers and fills a multidimensional array with them. On Linux it works fine and is pretty fast. However, the same code is much slower (by a factor of 200) on a Windows machine with the same specs. Can anyone spot the problem?
void read_ed0moins_lut_(const char *filename, float downward_irradiance_table_as_output[NBWL][NTHETAS][NO3][NTAUCLD][NALB]) {
std::ifstream infile;
infile.open(filename);
float tmp;
for (int theta = 0; theta < NTHETAS; theta++) {
for (int ozone = 0; ozone < NO3; ozone++) {
for (int taucl = 0; taucl < NTAUCLD; taucl++) {
for (int albedo = 0; albedo < NALB; albedo++) {
for (int wavelength = 0; wavelength < NBWL; wavelength++) {
infile >> tmp; // This line is very slow on Windows
downward_irradiance_table_as_output[wavelength][theta][ozone][taucl][albedo] = tmp;
}
}
}
}
}
// Close file
infile.close();
}
Here are some ideas:
Build in Release mode (with optimization enabled, -O2 flag)
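Enable ifstream buffering. The effect of pubsetbuf is implementation-defined; with libstdc++ (GCC/MinGW) it only takes effect if it is called before the file is opened, so set the buffer first:
char buffer[65536];
std::ifstream infile;
infile.rdbuf()->pubsetbuf(buffer, sizeof(buffer)); // must come before open()
infile.open(filename);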
Arrange your array's dimensions in the order of the loops:
downward_irradiance_table_as_output[NTHETAS][NO3][NTAUCLD][NALB][NBWL]
so that you get row-major-order traversal, which is more cache-friendly.
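If the stream extraction itself stays slow, another option is to read the whole file into memory once and convert the numbers with std::strtof. The following is a minimal sketch, not a drop-in replacement: read_file and parse_floats are hypothetical helper names, and whether this closes the gap on Windows is an assumption that would need measuring.

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: slurp a whole file into a string in one go.
// Returns an empty string if the file cannot be opened.
std::string read_file(const char *filename) {
    std::ifstream in(filename, std::ios::binary);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Hypothetical helper: convert whitespace-separated numbers to floats.
std::vector<float> parse_floats(const std::string &text) {
    std::vector<float> values;
    const char *p = text.c_str();
    char *end = nullptr;
    for (;;) {
        float v = std::strtof(p, &end);  // skips leading whitespace itself
        if (end == p) break;             // no further number could be parsed
        values.push_back(v);
        p = end;
    }
    return values;
}
```

The values come back in file order, so they can be copied into the table with the same five nested loops and a running index.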
My program opens a file which contains 100,000 numbers and parses them out into a 10,000 x 10 array correlating to 10,000 sets of 10 physical parameters. The program then iterates through each row of the array, performing overlap calculations between that row and every other row in the array.
The process is quite simple, and being new to C++, I programmed it in the most straightforward way I could think of. However, I know that this is not the most efficient way to do it, and I would love to improve it, as the program is going to face off against my cohort's identical program, coded in Fortran, in a "race".
I have a feeling that I am going to need multithreading to speed up the program, but not only am I new to C++, I am also new to multithreading, so I'm not sure how to create new threads in a beneficial way, or whether it would even give me much "gain on investment", so to speak.
The program has the potential to be run on a machine with over 50 cores, but because the program is so simple, I'm not convinced that more threads is necessarily better. I think that if I implement two threads to compute the complex parameters of the two gaussians, one thread to compute the overlap between the gaussians, and one thread that is dedicated to writing to the file, I could speed up the program significantly, but I could also be wrong.
CODE:
cout << "Working...\n";
double **gaussian_array;
gaussian_array = (double **)malloc(N*sizeof(double *));
for(int i = 0; i < N; i++){
gaussian_array[i] = (double *)malloc(10*sizeof(double));
}
fstream gaussians;
gaussians.open("GaussParams", ios::in);
if (!gaussians){
cout << "File not found.";
}
else {
//generate the array of gaussians -> [10000][10]
int i = 0;
while(i < N) {
char ch;
string strNums;
string Num;
string strtab[10];
int j = 0;
getline(gaussians, strNums);
stringstream gaussian(strNums);
while(gaussian >> ch) {
if(ch != ',') {
Num += ch;
strtab[j] = Num;
}
else {
Num = "";
j += 1;
}
}
for(int c = 0; c < 10; c++) {
stringstream dbl(strtab[c]);
dbl >> gaussian_array[i][c];
}
i += 1;
}
}
gaussians.close();
//Below is the process to generate the overlap file between all gaussians:
string buffer;
ofstream overlaps;
overlaps.open("OverlapMatrix", ios::trunc);
overlaps.precision(15);
for(int i = 0; i < N; i++) {
for(int j = 0 ; j < N; j++){
double r1[6][2];
double r2[6][2];
double ol[2];
//compute complex parameters from the two gaussians
compute_params(gaussian_array[i], r1);
compute_params(gaussian_array[j], r2);
//compute overlap between the gaussians using the complex parameters
compute_overlap(r1, r2, ol);
//write to file
overlaps << ol[0] << "," << ol[1];
if(j < N - 1)
overlaps << " ";
else
overlaps << "\n";
}
}
overlaps.close();
return 0;
Any suggestions are greatly appreciated. Thanks!
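Independent of threading, the character-by-character parsing loop can be shortened. A minimal sketch, assuming every line holds comma-separated numbers as in the code above; parse_row is a hypothetical name, not part of the original program.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one comma-separated line into doubles.
std::vector<double> parse_row(const std::string &line) {
    std::vector<double> row;
    std::istringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ',')) {
        // std::stod throws std::invalid_argument on a non-numeric field
        row.push_back(std::stod(field));
    }
    return row;
}
```

It may also be worth noting that compute_params is called for row i and row j inside the inner loop, so each row's parameters are recomputed many times; computing them once per row before the double loop would likely save work before any threads are involved.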
I have a text file that has values and I want to put them into a 2D vector.
I can do it with arrays but I don't know how to do it with vectors.
The vector should be indexed like vector2D[nColumns][nLines], with sizes that I don't know in advance. At most, I can get the number of columns from the text file, but not the number of lines.
The number of columns could be different, from one .txt file to another.
.txt example:
189.53 -1.6700 58.550 33.780 58.867
190.13 -3.4700 56.970 42.190 75.546
190.73 -1.3000 62.360 34.640 56.456
191.33 -1.7600 54.770 35.250 65.470
191.93 -8.7500 58.410 33.900 63.505
With arrays I do it like this:
//------ Declares Array for values ------//
const int nCol = countCols; // read from file
float values[nCol][nLin];
// Fill Array with '-1'
for (int c = 0; c < nCol; c++) {
for (int l = 0; l < nLin; l++) {
values[c][l] = -1;
}
}
// reads file to end of *file*, not line
while (!inFile.eof()) {
for (int y = 0; y < nLin; y++) {
for (int i = 0; i < nCol; i++) {
inFile >> values[i][y];
}
i = 0;
}
}
Instead of using
float values[nCol][nLin];
use
std::vector<std::vector<float>> v;
You have to #include<vector> for this.
Now you don't need to worry about size.
Adding elements is as simple as
std::vector<float> f; f.push_back(7.5); v.push_back(f);
Also, do not loop on .eof(): the flag is only set after a read has already hit the end of the file, so the loop body runs one extra time with a failed read.
while(!inFile.eof())
should be
while (inFile >> values[i][y]) // true as long as a value was successfully read into values[i][y]
NOTE: If the dimensions were known at compile time, you could also use std::array instead of vector, which is apparently the best thing since sliced bread.
My suggestion:
const int nCol = countCols; // read from file
std::vector<std::vector<float>> values; // your entire data-set of values
std::vector<float> line(nCol, -1.0); // create one line of nCol size and fill with -1
// reads file to end of *file*, not line
bool done = false;
while (!done)
{
    for (int i = 0; !done && i < nCol; i++)
    {
        done = !(inFile >> line[i]);
    }
    if (!done)
        values.push_back(line); // store only lines that were read completely
}
Now your dataset has:
values.size() // number of lines
and can be addressed with array notation also (besides using iterators):
float v = values[i][j];
Note: a final line with fewer than nCol values is discarded rather than stored, because the end of the line vector would still hold values from the previous line. If you need such a partial line, reset line to -1 before each pass and push it when at least one value was read.
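A line-oriented variant sidesteps the partial-line problem, because each text line becomes its own row and the number of columns never has to be known in advance. This is a minimal sketch, assuming whitespace-separated values; read_rows is a hypothetical name, and it stores the data as values[line][column].

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read every non-empty text line into its own row.
std::vector<std::vector<float>> read_rows(std::istream &in) {
    std::vector<std::vector<float>> values;
    std::string text;
    while (std::getline(in, text)) {
        std::istringstream fields(text);
        std::vector<float> row;
        float x;
        while (fields >> x) {
            row.push_back(x);
        }
        if (!row.empty()) {
            values.push_back(row);  // blank lines are skipped
        }
    }
    return values;
}
```

Passing an opened std::ifstream works as well as any other std::istream; values.size() is then the number of lines and values[k].size() the number of columns found on line k.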
I'm using C++ and have a 1234 by 1234 text file with values from 0 to 255. I have been trying to speed up my code because it's used in real time with the user. Right now it takes 0.5 seconds to run, with 0.4 seconds devoted to reading the text file into a vector<vector<int>>. I am using getline, then istringstream. Below is the code I'm currently using. It also drops the first and last 50 columns, and puts the first chunk of rows into one vector and the second chunk into another, because that's how I need the data for processing.
void readInRawData(string fileName, int start, int split, int finish, vector< vector <int> > &rawArrayTop, vector< vector <int> > &rawArrayBottom)
{
string line;
vector<int> rawRow;
int counter=0;
int value=0;
int numberOfColumns=0, numberOfRows=0;
ifstream rawImage;
rawImage.open(fileName.c_str()); //open file using fileName
if (rawImage.is_open()&&!is_empty(rawImage))
{
int length=0;
getline(rawImage,line);
istringstream ss(line);
while(ss>>value)//clump into values between spaces
{
length++;
}
while(getline(rawImage, line))//get row
{
if(counter<start)
{
}
else
{
break;
}
counter++;
}
while(getline(rawImage, line))//get row
{
if(counter<split)
{
rawRow.clear();
istringstream ss(line);
for(int i=0;i<50;i++)
{
ss>>value;
}
for(int i=0; i<length-100; i++)
{
ss>>value;
rawRow.push_back(value);
}
rawArrayTop.push_back(rawRow);
}
else
{
break;
}
counter++;
}
while(getline(rawImage, line))//get row
{
if(counter<finish)
{
rawRow.clear();
istringstream ss(line);
for(int i=0;i<50;i++)
{
ss>>value;
}
for(int i=0; i<length-100; i++)
{
ss>>value;
rawRow.push_back(value);
}
rawArrayBottom.push_back(rawRow);
}
else
{
break;
}
counter++;
}
rawImage.close();
}
//if it can't be opened throw error
else
{
throw rawArrayTop;
}
}
To get a real increase in performance, you'll have to rewrite it completely and parse the digits yourself:
while((ch = fgetc(fp)) != EOF)
{
if(isdigit(ch))
{
sample = sample * 10 + ch - '0';
onsample = 1;
}
else
{
if(onsample)
{
*out++ = sample;
sample = 0;
onsample = 0;
}
}
}
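Set up out with malloc(width * height * sizeof *out). Now it should zip through the file almost as fast as it can read it. If the file can end on a digit with no trailing whitespace, store the final sample after the loop as well.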
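Filled in with declarations, the loop above might look like the following. This is a sketch under assumptions: read_samples and expected_count are hypothetical names, a std::vector replaces the malloc'd buffer, and the samples are assumed to be non-negative integers in the range 0 to 255.

```cpp
#include <cctype>
#include <cstdio>
#include <vector>

// Hypothetical helper: read unsigned decimal samples from an open text file.
// expected_count is only a hint used to reserve memory up front.
std::vector<unsigned char> read_samples(std::FILE *fp, std::size_t expected_count) {
    std::vector<unsigned char> out;
    out.reserve(expected_count);
    int ch;
    int sample = 0;
    bool onsample = false;  // true while we are inside a run of digits
    while ((ch = std::fgetc(fp)) != EOF) {
        if (std::isdigit(ch)) {
            sample = sample * 10 + (ch - '0');
            onsample = true;
        } else if (onsample) {
            out.push_back(static_cast<unsigned char>(sample));
            sample = 0;
            onsample = false;
        }
    }
    if (onsample) {
        // The file ended on a digit, so the last sample is still pending.
        out.push_back(static_cast<unsigned char>(sample));
    }
    return out;
}
```

For the 1234 by 1234 file from the question, expected_count would be 1234 * 1234; dropping the first and last 50 columns can then be done by index arithmetic on the flat result.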
I will not give you the code, but I will suggest how to proceed here:
parsing text takes a long time. If real-time is important, pre-process the file to a binary format, since it can be loaded directly with read/write functions. You will need to create a stream in binary mode from the binary file and use istream::read.
try to avoid vector<vector<int>> unless you use scoped allocators, which I assume you are not using. This is bad for the cache. It is a much better fit to use a vector with n * m reserved space.
If you need two-dimensional access, you can write a small helper for it:
using Matrix = vector<int>;
int & idx(Matrix & mat, size_t nCols, size_t row, size_t col); // returns mat[row * nCols + col]
Matrix mat(m * n); // m rows, n columns
idx(mat, n, 2, 3) = 17;
Another concern is how you load into the Matrix. If you want to size the vector up front without redundantly initializing every element, that is not possible with the standard vector (reserve allocates but leaves size() at 0, and resize value-initializes), but you can use Boost.Container, whose vector has constructor and resize overloads taking default_init_t. Those do not trigger initialization of the elements.
if the values are between 0 and 255 use unsigned char (or std::uint8_t), not int; plain char may be signed and would then not hold 255. You will fit more data at once in cache.
I recently finished writing what I consider my "main.cpp" code in a Win32 Console project. It builds the solution perfectly and the external release version runs and completes within like 30 seconds, which is fast for the number of calculations it does.
When I use my MFC built UI made with just 1 standard dialog box for some simple float inputs, the program that ran fine by itself gets hung up when it has to create and calculate some 2D-vectors.
std::mt19937 generator3(time(0));
static uniform_01<std::mt19937> dist3(generator3);
std::vector<int> e_scatter;
for (int i = 0; i <= n; i++)
{
if (dist3() >= perc_e)
{
e_scatter.push_back(1);
// std::cout << e_scatter[i] << '\n';
// system("pause");
}
else
{
e_scatter.push_back(0);
// std::cout << e_scatter[i] << '\n';
// system("pause");
}
}
string fileName_escatter = "escatter.dat";
FILE* dout4 = fopen(fileName_escatter.c_str(), "w");
for (int i = 0; i <= n; i++)
{
fprintf(dout4, "%d", e_scatter[i]);
fprintf(dout4, "\n");
// fprintf(dout2, "%f", e_scatter[i]);
// fprintf(dout2, "\n");
};
fclose(dout4);
std::vector<vector<float>> electron;
// std::vector<float> angle;
randutils::mt19937_rng rng2;
std::vector<float> rand_scatter;
for (int i = 0; i <= n; i++)
{
    std::vector<float> w;
    electron.push_back(w);
    rand_scatter.push_back(rng2.uniform(0.0, 1.0));
    for (int j = 0; j <= 2000; j++)
    {
        if (e_scatter[i] == 0)
        {
            electron[i].push_back(linspace[j] * (cos((rand_scatter[i] * 90) * (PI / 180))));
            //electron[i][j] == abs(electron[i][j]);
        }
        else
        {
            electron[i].push_back(linspace[j]);
        };
    };
};
More specifically, it does not get past a specific for loop and I am forced to close it. I've let it run for 20 minutes to see if it was just computing things more slowly, but still got no output from it. I am not that great at debugging when I have this MFC GUI, since I don't have the console popping up.
Is there something that I am missing when I try to use MFC for the gui and large 2D vectors?
The first loop calculates and spits out an output file 'escatter.dat' after its finished but the second set of loops never finishes and the memory usage keeps ramping up.
linspace is calculated before all of this code and is just a vector of 2001 numbers used to populate the std::vector<std::vector<float>> electron vector in the nested for loops.
I've included this http://pastebin.com/i8A7t38K link to the MFC part of the code, to keep this post from getting too long.
Thank you.
I agree that the debugging checks are the major problem.
But if your program is running 30 seconds, n must be big.
You may consider reducing the number of memory allocations by preallocating memory with vector::reserve:
std::vector<vector<float>> electron;
// std::vector<float> angle;
randutils::mt19937_rng rng2;
std::vector<float> rand_scatter;
electron.reserve(n+1); // worth for big n
rand_scatter.reserve(n+1); // worth for big n
for (int i = 0; i <= n; i++)
{
    std::vector<float> w;
    electron.push_back(w);
    rand_scatter.push_back(rng2.uniform(0.0, 1.0));
    electron[i].reserve(2000+1); // really worth for big n
    for (int j = 0; j <= 2000; j++)
    {
        if (e_scatter[i] == 0)
        {
            electron[i].push_back(linspace[j] * (cos((rand_scatter[i] * 90) * (PI / 180))));
            //electron[i][j] == abs(electron[i][j]);
        }
        else
        {
            electron[i].push_back(linspace[j]);
        };
    };
};
Or rewrite it without push_back (since you know all the sizes):
std::vector<vector<float>> electron(n+1);
// std::vector<float> angle;
randutils::mt19937_rng rng2;
std::vector<float> rand_scatter(n+1);
for (int i = 0; i <= n; i++)
{
    std::vector<float>& w=electron[i];
    w.resize(2000+1); // resize, not reserve: w[j] below needs elements that exist
    float r=rng2.uniform(0.0, 1.0);
    rand_scatter[i]=r;
    for (int j = 0; j <= 2000; j++)
    {
        float f;
        if (e_scatter[i] == 0)
        {
            f=linspace[j] * (cos((r * 90) * (PI / 180)));
            // f=abs(f);
        }
        else
        {
            f=linspace[j];
        };
        w[j]=f;
    };
};
After that, the runtime might drop to a few seconds at most.
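Note that reserve and resize are not interchangeable here: reserve only allocates capacity and leaves size() unchanged, while resize actually creates elements that operator[] may touch. A small sketch of the difference (make_reserved and make_resized are hypothetical names used only for illustration):

```cpp
#include <cstddef>
#include <vector>

// After reserve, the vector is still empty: only push_back may add elements,
// and indexing with operator[] would be undefined behaviour.
std::vector<float> make_reserved(std::size_t n) {
    std::vector<float> v;
    v.reserve(n);
    return v;
}

// After resize, the vector holds n value-initialized (zero) floats
// that can be assigned through operator[].
std::vector<float> make_resized(std::size_t n) {
    std::vector<float> v;
    v.resize(n);
    return v;
}
```

So reserve pairs with push_back, and resize (or a sized constructor) pairs with indexed assignment.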
Another optimization
string fileName_escatter = "escatter.dat";
FILE* dout4 = fopen(fileName_escatter.c_str(), "w");
for (int i = 0; i <= n; i++)
{
fprintf(dout4, "%d\n", e_scatter[i]); // one fprintf call instead of two
// fprintf(dout2, "%f\n", e_scatter[i]);
};
fclose(dout4);
BTW: ofstream is the C++ standard library way of writing files, like
ofstream dout4("escatter.dat");
for (int i = 0; i <= n; i++)
{
    dout4 << e_scatter[i] << '\n'; // '\n' rather than std::endl, which would flush on every line
};
dout4.close();
I have some code written in C++. When I compile and run it on my laptop, it produces results; however, when I compile and run the same code on the RPI, I get the error:
Segmentation fault
How the program (currently) works:
Reads in a (.wav) file into a vector of doubles ("rawData")
Splits the rawData into blocks (blockked)
The segmentation fault happens when I try and split the data into blocks. The sizes:
rawData - 57884
blockked - 112800
Now I know the RPI only has 256MB and this could possibly be the problem, or I'm not handling the data properly. I have included some code as well, to help demonstrate how things are running:
(main.cpp):
int main()
{
int N = 600;
int M = 200;
float sumthresh = 0.035;
float zerocorssthres = 0.060;
Wav sampleWave;
if(!sampleWave.readAudio("repositry/example.wav", DOUBLE))
{
cout << "Cannot open the file BOOM";
}
// Return the data
vector<double> rawData = sampleWave.returnRaw();
// THIS segments (typedef vector<double> iniMatrix;)
vector<iniMatrix> blockked = sampleWave.something(rawData, N, M);
cout << rawData.size();
return EXIT_SUCCESS;
}
(function: something)
int n = theData.size();
int maxblockstart = n - N;
int lastblockstart = maxblockstart - (maxblockstart % M);
int numblocks = (lastblockstart)/M + 1;
vector< vector<double> > subBlock;
vector<double> temp;
this->width = N;
this->height = numblocks;
subBlock.resize(600*187);
for(int i=0; (i < 600); i++)
{
subBlock.push_back(vector<double>());
for(int j=0; (j < 187); j++)
{
subBlock[i].push_back(theData[i*N+j]);
}
}
return subBlock;
Any suggestions would be greatly appreciated :)! Hopefully this is enough description.
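You're overrunning an array, and it shows in the code you posted: theData[i*N+j] with i up to 599 and N = 600 reads up to index 359,586, while rawData holds only 57,884 samples. Reading past the end of a vector is undefined behaviour, which would explain why it appears to work on the laptop and segfaults on the RPI. Also, subBlock.resize(600*187) already creates 112,200 empty inner vectors and the 600 push_back calls append more, which is where the size of 112,800 comes from. I'm not really sure what you're trying to do with the blocking either, but I guess you want to split your wave file into 600-sample chunks?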
If so, I think you want something more like the following:
std::vector<std::vector<double>>
SimpleWav::something(const std::vector<double>& data, int N) {
//How many blocks of size N can we get?
int num_blocks = data.size() / N;
//Create the vector with enough empty slots for num_blocks blocks
std::vector<std::vector<double>> blocked(num_blocks);
//Loop over all the blocks
for(int i = 0; i < num_blocks; i++) {
//Resize the inner vector to fit this block
blocked[i].resize(N);
//Pull each sample for this block
for(int j = 0; j < N; j++) {
blocked[i][j] = data[i*N + j];
}
}
return blocked;
}
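If the blocks are meant to overlap, as the N = 600 and M = 200 in the question suggest, the same idea extends to a hop size. This is a minimal sketch, assuming M is the distance between the starts of consecutive blocks; split_blocks is a hypothetical free function, not part of the Wav class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: cut data into blocks of N samples whose starting
// points are M samples apart. Trailing samples that do not fill a whole
// block are dropped.
std::vector<std::vector<double>> split_blocks(const std::vector<double> &data,
                                              std::size_t N, std::size_t M) {
    std::vector<std::vector<double>> blocked;
    if (N == 0 || M == 0 || data.size() < N) {
        return blocked;  // nothing sensible to cut
    }
    const std::size_t num_blocks = (data.size() - N) / M + 1;
    blocked.reserve(num_blocks);
    for (std::size_t b = 0; b < num_blocks; b++) {
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(b * M);
        // Copy the N samples [first, first + N) into a new inner vector.
        blocked.emplace_back(first, first + static_cast<std::ptrdiff_t>(N));
    }
    return blocked;
}
```

Because the last block starts at (num_blocks - 1) * M, which is at most data.size() - N, no read goes past the end of data.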