C++: how to output data to multiple .dat files?

I have a research project I'm working on. I am a beginner in C++ and programming in general. I have already made a program that generates interacting particles that move in continuous space as time progresses. The only things my program outputs are the XY coordinates for each particle in each time-step.
I want to visualize my findings, to know if my particles are moving as they should. My professor said that I must use gnuplot. Since I could not find a way to output my data in one file so that gnuplot would recognize it, I thought of the following strategy:
a) For each time-step generate one file with XY coordinates of the form "output_#.dat".
b) Generate a .png file for each one of them in gnuplot.
c) Make a movie of the moving particles with all the .png files.
I am going to worry about b and c later, but up to now, I am able to output all my data in one file using this code:
// (requires <cstdio>, <cstdlib>, <ctime>; Cell, maxSize, tMax and dt are defined elsewhere)
int main()
{
    int i = 0;
    int t = 0; // time
    int j = 0;
    int ctr = 0;
    double sumX = 0;
    double sumY = 0;

    srand(time(NULL)); // give system time as seed to random generator

    Cell* particles[maxSize]; // create array of pointers of type Cell
    for(i = 0; i < maxSize; i++)
    {
        particles[i] = new Cell(); // initialize in memory
        particles[i]->InitPos();   // give initial positions
    }

    FILE *newFile = fopen("output_00.dat", "w"); // print initial positions
    for(i = 0; i < maxSize; i++)
    {
        fprintf(newFile, "%f ", particles[i]->getCurrX());
        fprintf(newFile, "%f\n", particles[i]->getCurrY());
    }
    fclose(newFile);

    FILE *output = fopen("output_01.dat", "w");
    for(t = 0; t < tMax; t++)
    {
        fprintf(output, "%d ", t);
        for(i = 0; i < maxSize; i++) // for every cell
        {
            sumX = 0;
            sumY = 0;
            for(j = 0; j < maxSize; j++) // for all surrounding cells
            {
                sumX += particles[i]->ForceX(particles[i], particles[j]);
                sumY += particles[i]->ForceY(particles[i], particles[j]);
            }
            particles[i]->setVelX(particles[i]->getPrevVelX() + (sumX)*dt); // update speed
            particles[i]->setVelY(particles[i]->getPrevVelY() + (sumY)*dt);
            particles[i]->setCurrX(particles[i]->getPrevX() + (particles[i]->getVelX())*dt); // update position
            particles[i]->setCurrY(particles[i]->getPrevY() + (particles[i]->getVelY())*dt);
            fprintf(output, " ");
            fprintf(output, "%f ", particles[i]->getCurrX());
            fprintf(output, "%f\n", particles[i]->getCurrY());
        }
    }
    fclose(output);
}
This indeed generates 2 files, output_00.dat and output_01.dat, with the first one containing the initial randomly generated positions and the second one containing all my results.
I can feel that in the nested for loop, where I'm updating the speed and position for the XY coordinates, I could open a FILE* that stores the coordinates for each time step and then close it before incrementing time. That way, I would not need multiple files to be open at the same time. At least that is my intuition.
I do not know how to generate incrementing filenames, though. I have stumbled upon ofstream, but I don't understand how it works...
I think what I would like my program to do at this point is:
1) Generate a new file name, using a base name and the current loop counter value.
2) Open that file.
3) Write the coordinates for that time-step.
4) Close the file.
5) Repeat.
Any help will be greatly appreciated. Thank you for your time!

Using ofstream instead of fopen would be a better use of the C++ standard library, whereas now you are using C standard library calls, but there is nothing wrong per se with what you are doing now.
It seems like your real core question is how to generate a filename from an integer so you can use it in a loop:
Here is one way:
// Include these somewhere
#include <string>
#include <sstream>

// Define this function
std::string make_output_filename(size_t index) {
    std::ostringstream ss;
    ss << "output_" << index << ".dat";
    return ss.str();
}

// Use that function with fopen in this way:
for (size_t output_file_number = 0; /* rest of your for loop stuff */) {
    FILE *file = fopen(make_output_filename(output_file_number).c_str(), "w");
    /* use the file */
    fclose(file);
}
This uses a std::ostringstream to build a filename using stream operations, and returns the built std::string. When you pass it to fopen, you have to give it a const char * rather than a std::string, so we use the .c_str() member function, which exists for exactly this purpose.
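Putting that together with the per-time-step plan from the question, here is a minimal sketch that uses std::ofstream (the more C++-style option mentioned above). It assumes the particles array, maxSize, tMax and the Cell accessors from the question, and only shows the file handling:
#include <fstream>   // std::ofstream

// make_output_filename() as defined above.
// Inside main(), after the initial positions have been written:
for (int t = 0; t < tMax; t++) {
    std::ofstream out(make_output_filename(t).c_str()); // 1) build the name, 2) open the file
    for (int i = 0; i < maxSize; i++) {
        // ... update velocities and positions exactly as in the question ...
        out << particles[i]->getCurrX() << " "           // 3) write this time-step's coordinates
            << particles[i]->getCurrY() << "\n";
    }
}   // 4) out goes out of scope here, so the file is closed before t is incremented
gnuplot will then see one output_<t>.dat file per frame, which matches step a) of your plan.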

Reading/writing binary file returns 0xCCCCCCCC

I have a script that dumps class info into a binary file, then another script that retrieves it.
Since binary files only accept chars, I wrote three functions for reading and writing Short Ints, Ints, and Floats. I've been experimenting with them so they're not overloaded properly, but they all look like this:
void writeInt(ofstream& file, int val) {
    file.write(reinterpret_cast<char *>(&val), sizeof(val));
}

int readInt(ifstream& file) {
    int val;
    file.read(reinterpret_cast<char *>(&val), sizeof(val));
    return val;
}
I'll put the class load/save script at the end of the post, but I don't think it'll make too much sense without the rest of the class info.
Anyway, it seems that the file gets saved properly. It has the correct size, and all of the data matches when I load it. However, at some point in the load process, the file.read() function starts returning 0xCCCCCCCC every time. This looks to me like a read error, but I'm not sure why, or how to correct it. Since the file is the correct size, and I don't touch the seekg() function, it doesn't seem likely that it's reaching the end of file prematurely. I can only assume it's an issue with my read/write method, since I did kind of hack it together with limited knowledge. However, if this is the case, why does it read all the data up to a certain point without issue?
The error starts occurring at a random point each run. This may or may not be related to the fact that all the class data is randomly generated.
Does anyone have experience with this? I'm not even sure how to continue debugging it at this point.
Anyway, here are the load/save functions:
void saveToFile(string fileName) {
    ofstream dataFile(fileName.c_str());
    writeInt(dataFile, inputSize);
    writeInt(dataFile, fullSize);
    writeInt(dataFile, outputSize);
    // Skips input nodes - no data needs to be saved for them.
    for (int i = inputSize; i < fullSize; i++) { // Saves each node after inputSize
        writeShortInt(dataFile, nodes[i].size);
        writeShortInt(dataFile, nodes[i].skip);
        writeFloat(dataFile, nodes[i].value);
        //vector<int> connects;
        //vector<float> weights;
        for (int j = 0; j < nodes[i].size; j++) {
            writeInt(dataFile, nodes[i].connects[j]);
            writeFloat(dataFile, nodes[i].weights[j]);
        }
    }
    read(500);
}
void loadFromFile(string fileName) {
    ifstream dataFile(fileName.c_str());
    inputSize = readInt(dataFile);
    fullSize = readInt(dataFile);
    outputSize = readInt(dataFile);
    nodes.resize(fullSize);
    for (int i = 0; i < inputSize; i++) {
        nodes[i].setSize(0); // Sets input nodes
    }
    for (int i = inputSize; i < fullSize; i++) { // Loads each node after inputSize
        int s = readShortInt(dataFile);
        nodes[i].setSize(s);
        nodes[i].skip = readShortInt(dataFile);
        nodes[i].value = readFloat(dataFile);
        //vector<int> connects;
        //vector<float> weights;
        for (int j = 0; j < nodes[i].size; j++) {
            nodes[i].connects[j] = readInt(dataFile); // Error occurs in a random instance of this call of readInt().
            nodes[i].weights[j] = readFloat(dataFile);
        }
        read(i); // Outputs class data to console
    }
    read(500);
}
Thanks in advance!
You have to check the result of the open, read, and write operations, and you need to open the files (for both reading and writing) in binary mode. As written, the streams are opened in the default text mode; if you are on Windows, that means newline translation and treating byte 0x1A as end-of-file, so reads can start failing at a random point that depends on your (random) data. 0xCCCCCCCC is the fill pattern MSVC debug builds use for uninitialized stack memory, i.e. file.read() failed and val was never written.
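As a minimal sketch of those two fixes - binary mode plus error checking - using the same names as in the question (treat it as illustrative rather than a drop-in patch):
#include <fstream>
using namespace std;

void saveToFile(string fileName) {
    // Open explicitly in binary mode and bail out if the open failed.
    ofstream dataFile(fileName.c_str(), ios::out | ios::binary);
    if (!dataFile) {
        return; // handle the error (log, throw, return an error code, ...)
    }
    // ... writeInt / writeShortInt / writeFloat calls as before ...
}

void loadFromFile(string fileName) {
    ifstream dataFile(fileName.c_str(), ios::in | ios::binary);
    if (!dataFile) {
        return;
    }
    // ... readInt / readShortInt / readFloat calls as before ...
    if (!dataFile) {
        // a read failed somewhere above; anything read after that point is garbage
    }
}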

Faster File Operations C++

So I am making a renderer in C++ and OpenGL for a class of mine. I am making an animation program for extra credit that will change values in a text file right before my renderer reads them in each frame. My problem is that this section of code isn't writing fast enough:
while (clock() < time_end)
{
    timeStep = clock() + fps * CLOCKS_PER_SEC;
    for(int k=0; k < currOps.size(); k++)
    {
        // increase/decrease each set once for the current timestep
        // a case for each operation
        int pos = currAxis[k];
        if(currOps[k] == "loc")
        {
            opsFile[8+pos] = patch::to_string(atof(opsFile[8+pos].c_str()) + locScale[pos-1]*timeAdjust);
            //edit this value by loc adjust
        }
        else if(currOps[k] == "rot")
        {
            opsFile[4+pos] = patch::to_string(atof(opsFile[4+pos].c_str()) + rotScale[pos-1]*timeAdjust);
            //edit this value by rot adjust
        }
        else if(currOps[k] == "scl")
        {
            opsFile[pos] = patch::to_string(atof(opsFile[pos].c_str()) + sclScale[pos-1]*timeAdjust);
            //edit this value by scl adjust
        }
    }
    currFile.close(); //save file and restart so we don't append multiple times
    currFile.open(files[location[0]].c_str(), ofstream::out); // so we can write to the file after closing
    for(int j=0; j < opsFile.size(); j++)
    {
        // update the file
        currFile << opsFile.at(j);
        currFile << "\n";
    }
    while(clock() < timeStep)
    {
        //wait for the next time steps
    }
}
Specifically, it's the currFile operations at the end. If I take the currFile operations out, it runs at the desired fps. fps is set to .033 so that it does 30 frames per second; it also runs fast enough when fps = 0.1. Any optimizations would be great. If you need to see any other part of my code, let me know and I will upload it. The whole thing is around 170 lines.
currOps, files, and opsFile are vectors of strings.
sclScale, rotScale, and locScale are vectors of doubles.
currAxis is a vector of ints.
Here are some general changes which may help:
I would convert currOps to an enum rather than a string (this saves you the string comparisons). It looks like you should pre-process that container and build a sequence of enums (then your code in the loop becomes a switch).
Don't use vector<string> for opsFile; simply read the floats from the file and write the floats out - this will save you all those pointless conversions to and from string. If you want to take it further, convert the file to binary (if the exercise allows it), store a simple structure which contains the floats you need, and use memory-mapped files (you don't need Boost for that, it's straightforward just using mmap!)
Before going down the mmap route - try the float read/write from file. For example, let's say that each "row" in your file corresponds to something like the following:
struct transform {
    double x, y, z;
};

struct op {
    transform scale, rot, loc;
};
Declare a bunch of stream in/out operators for these, for example:
std::ostream& operator<<(std::ostream& os, const transform& tx) {
    return os << tx.x << ' ' << tx.y << ' ' << tx.z;
}

std::istream& operator>>(std::istream& is, transform& tx) {
    return is >> tx.x >> tx.y >> tx.z;
}
(A similar set will be required for op; see the sketch just below.)
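For completeness, one plausible pair of operators for op, assuming the fields should appear in the file in the order scale, rot, loc (adjust to match your actual layout; requires <istream>/<ostream>):
std::ostream& operator<<(std::ostream& os, const op& o) {
    // write the three transforms separated by spaces
    return os << o.scale << ' ' << o.rot << ' ' << o.loc;
}

std::istream& operator>>(std::istream& is, op& o) {
    // read them back in the same order
    return is >> o.scale >> o.rot >> o.loc;
}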
Now your vector is std::vector<op>, which you can easily stream in and out from your file, for example, to read:
op i;
while(file >> i) { curOps.push_back(i); }
To write:
for (auto o : curOps) file << o << '\n';
If this is still not enough, then mmap (btw - it's possible on windows too: Is there a memory mapping api on windows platform, just like mmap() on linux?)
Try using the functions in stdio.h instead. iostreams are terribly inefficient.
In your case all you will need is fseek() ... fputs(). Avoiding reopening the file each time should actually help quite a lot.
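A rough sketch of that approach, keeping a single FILE* open for the whole animation and rewinding it each frame instead of closing and reopening an ofstream (files, location and opsFile are the names from the question; error handling omitted):
#include <cstdio>

// Open once, before the timing loop.
std::FILE* f = std::fopen(files[location[0]].c_str(), "w");

while (clock() < time_end) {
    // ... update opsFile for this time step, as in the question ...

    std::rewind(f);                              // same as fseek(f, 0, SEEK_SET)
    for (std::size_t j = 0; j < opsFile.size(); ++j) {
        std::fputs(opsFile[j].c_str(), f);       // write the line
        std::fputc('\n', f);
    }
    std::fflush(f);                              // make the new frame visible to the renderer

    // ... wait for the next time step ...
}

std::fclose(f);                                  // close once, after the loop
One caveat: rewinding does not truncate, so if a rewritten line ever gets shorter than before, stale characters from the previous frame would remain at the end of the file; treat this as a starting point rather than a drop-in replacement.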

Why does Armadillo fail to learn a Gaussian Mixture Model and complain about 'no existing means' despite random_subset seeding?

QUESTION SUMMARY:
I have a [5 x 72580] matrix. I am trying to fit a Gaussian Mixture Model (GMM) to this data using the gmm_diag.learn() method with random_subset as the initial seeding mode. Why does Armadillo display "gmm_diag::learn(): no existing means" and fail to learn the model?
PROBLEM DETAILS:
I am working on a machine learning algorithm, the aim of which is to identify a writer from their handwriting. I am using supervised learning to train our model with a GMM.
All the training data is read from XML files. After calculating the features, their values are stored into a linked list. After this, the number of elements in the list is counted and is used to initialize an Armadillo mat(rix) variable at runtime as shown below:
int totFeatureVectors = CountPointClusterElements(TRAINING_CLUSTER_LIST_INDEX);
printf("\n%d elements added to list\n",totFeatureVectors);
mat F = mat(NUM_POINT_BASED_FEATURES, totFeatureVectors, fill::zeros);
Here TRAINING_CLUSTER_LIST_INDEX and NUM_POINT_BASED_FEATURES are a couple of configurable, project level constants; for my program NUM_POINT_BASED_FEATURES = 5 and totFeatureVectors = 72580. So the variable F is a [5 x 72580] dimensional matrix of double values. After initialization, I am reading the feature values from the linked list into F as follows:
int rowInd = 0, colInd = 0;
PointClusterElement *iterator = allClusterPointsList;
while(iterator != NULL)
{
    F(rowInd,colInd) = iterator->pointSample.speed;
    rowInd += 1;
    F(rowInd,colInd) = iterator->pointSample.dirn.cosComponent;
    rowInd += 1;
    F(rowInd,colInd) = iterator->pointSample.dirn.sinComponent;
    rowInd += 1;
    F(rowInd,colInd) = iterator->pointSample.curv.cosComponent;
    rowInd += 1;
    F(rowInd,colInd) = iterator->pointSample.curv.sinComponent;
    rowInd += 1;
    if(rowInd == NUM_POINT_BASED_FEATURES)
    {
        rowInd = 0;
        colInd += 1;
    }
    iterator = iterator->nextClusterElement;
}
The assignment of feature values to locations in F is made in a column-major manner, i.e. each column of F represents a feature vector after assignment. I am even writing the values of F into a text file to verify that all the feature values have been properly set, and yes, this happens without any problems:
FILE *fp = fopen(PROGRAM_DATA_OUTPUT_PATH, "w");
if(fp != NULL)
{
    int r, c;
    for(c = 0; c < totFeatureVectors; c++)
    {
        for(r = 0; r < NUM_POINT_BASED_FEATURES; r++)
        {
            fprintf(fp, "%lf\t", F(r,c));
        }
        fprintf(fp, "\n");
    }
    fclose(fp);
}
So far, so good. But after this, when I declare a gmm_diag variable and try to fit a GMM to F using its learn() method, the program displays a warning "gmm_diag::learn(): no existing means" and quits, thus failing to learn the GMM (here the VARIANCE_FLOORING_FACTOR = 0.001)
gmm_diag writerModel;
bool result = writerModel.learn(F, 20, maha_dist, random_subset, 100, 100, VARIANCE_FLOORING_FACTOR, true);
writerModel.dcovs.print("covariances:\n");
writerModel.hefts.print("weights:\n");
writerModel.means.print("means:\n");
if(result == true)
{
    printf("\nModel learnt");
}
else if(result == false)
{
    printf("\nModel not learnt");
}
I opened up the learn() method in my IDE and, from what I could make out, this error (warning) message is displayed only when the initial seeding mode is keep_existing. The source file I referred to is /usr/include/armadillo_bits/gmm_diag_meat.hpp.
My question is - why would this happen even when my seeding is done using the random_subset mode? How exactly should I proceed to get my model to learn? I'm not sure what I am missing here... The documentation and code samples provided at http://arma.sourceforge.net/docs.html#gmm_diag were not too helpful (the short program there works even without initializing the means of the GMM). The code is given below:
int main(int argc, char** argv) {
    int totFeatureVectors = CountPointClusterElements(TRAINING_CLUSTER_LIST_INDEX);
    printf("\n%d elements added to list\n", totFeatureVectors);
    mat F = mat(NUM_POINT_BASED_FEATURES, totFeatureVectors, fill::zeros);
    int rowInd = 0, colInd = 0;
    PointClusterElement *iterator = allClusterPointsList;
    while(iterator != NULL)
    {
        F(rowInd,colInd) = iterator->pointSample.speed;
        rowInd += 1;
        F(rowInd,colInd) = iterator->pointSample.dirn.cosComponent;
        rowInd += 1;
        F(rowInd,colInd) = iterator->pointSample.dirn.sinComponent;
        rowInd += 1;
        F(rowInd,colInd) = iterator->pointSample.curv.cosComponent;
        rowInd += 1;
        F(rowInd,colInd) = iterator->pointSample.curv.sinComponent;
        rowInd += 1;
        if(rowInd == NUM_POINT_BASED_FEATURES)
        {
            rowInd = 0;
            colInd += 1;
        }
        iterator = iterator->nextClusterElement;
    }
    FILE *fp = fopen(PROGRAM_DATA_OUTPUT_PATH, "w");
    if(fp != NULL)
    {
        int r, c;
        for(c = 0; c < totFeatureVectors; c++)
        {
            for(r = 0; r < NUM_POINT_BASED_FEATURES; r++)
            {
                fprintf(fp, "%lf\t", F(r,c));
            }
            fprintf(fp, "\n");
        }
        fclose(fp);
    }
    gmm_diag writerModel;
    bool result = writerModel.learn(F, 20, maha_dist, random_subset, 100, 100, VARIANCE_FLOORING_FACTOR, true);
    writerModel.dcovs.print("covariances:\n");
    writerModel.hefts.print("weights:\n");
    writerModel.means.print("means:\n");
    if(result == true)
    {
        printf("\nModel learnt");
    }
    else if(result == false)
    {
        printf("\nModel not learnt");
    }
    getchar();
    return 0;
}
TECHNICAL DETAILS:
The program is being run on Ubuntu 14.04 using the NetBeans 8.0.2 IDE. The project is a C/C++ application.
Any help would be most appreciated! Thanks in advance
~ Sid
You need to try the simplest possible case first, in order to narrow down the location of the bug. Your code is certainly not simple, and it's also not reproducible (nobody except you has all the functions).
The following simple code works, which suggests that the bug is somewhere else in your code.
I suspect your code is overwriting memory somewhere, leading to data and/or code corruption. The bug is probably an incorrect pointer, or incorrectly used pointer.
#include <fstream>
#include <armadillo>

using namespace std;
using namespace arma;

int main(int argc, char** argv) {
    mat F(5, 72580, fill::randu);

    gmm_diag model;
    bool result = model.learn(F, 20, maha_dist, random_subset, 100, 100, 0.001, true);

    model.hefts.print("hefts:");
    model.means.print("means:");
    model.dcovs.print("dcovs:");

    return 0;
}
Output from above code:
gmm_diag::learn(): generating initial means
gmm_diag::learn(): k-means: iteration: 1 delta: 0.343504
gmm_diag::learn(): k-means: iteration: 2 delta: 0.0528804
...
gmm_diag::learn(): k-means: iteration: 100 delta: 3.02294e-06
gmm_diag::learn(): generating initial covariances
gmm_diag::learn(): EM: iteration: 1 avg_log_p: -0.624274
gmm_diag::learn(): EM: iteration: 2 avg_log_p: -0.586567
...
gmm_diag::learn(): EM: iteration: 100 avg_log_p: -0.472182
hefts:
0.0915 0.0335 0.0308 ...
means:
0.4677 0.1230 0.8582 ...
...
dcovs:
0.0474 0.0059 0.0080 ...
...
That branch can only be taken in the Armadillo code when seed_mode is equal to keep_existing, and the means matrix is empty. Why not simply do source-level debugging using your IDE and see at what point it's changing out from under you?
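If you want to rule out the most basic problems before stepping through learn() in the debugger, a quick sanity check just before the call can help; this uses only standard Armadillo accessors and is illustrative, not a fix in itself:
#include <iostream>

// Just before writerModel.learn(...):
std::cout << "F is " << F.n_rows << " x " << F.n_cols << std::endl;        // expect 5 x 72580
std::cout << "F finite: " << (F.is_finite() ? "yes" : "no") << std::endl;  // NaN/Inf values will break learning
std::cout << "F empty: "  << (F.is_empty()  ? "yes" : "no") << std::endl;
If all of this looks right and learn() still warns about missing means, the value actually reaching the seed_mode parameter is not what you expect, which again points to memory corruption elsewhere in the program.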

I can't seem to print several lines of data in Qt Creator. The program overwrites all output except for the last line

My intended output: To print the time and number of bacteria for an exponential equation. I'm trying to print every data point up until time t, for instance if I'm finding the growth up until 50 hours in, I want to print the number of bacteria at time 0, 1, 2, ..., 49, 50. I am trying to have each output on a new line as well.
So here is my code:
void MainWindow::on_pushButtonCalc_clicked()
{
    QString s;
    double t = ui->t->text().toDouble();
    double k = ui->k->text().toDouble();
    double n0 = ui->n0->text().toDouble();
    /*double example;
    example = k;
    s = s.number(example);
    ui->textOutput->setText(s);*/
    for(int c = 0; c < t; ++c)
    {
        double nt = n0*exp(k*t);
        s = s.number(nt);
        ui->textOutput->setText(s);
    }
}
I've tried quite a few different outputs, and have also been trying to append new points to an array and print the array, but I haven't had too much luck with that either. I'm somewhat new to c++, and very new to qt.
Thank you for any suggestions.
The QTextEdit::setText function is going to replace the contents of the text edit with the parameter you pass in. Instead, you can use the append function:
for(int c = 0; c < t; ++c)
{
    double nt = n0*exp(k*c); // use the loop counter c here, not t, so each time step differs
    s = QString::number(nt);
    ui->textOutput->append(s);
}
Note also that since QString::number is a static function, you don't need an instance to call it.
Alternatively, you can build the string in your loop and then set it on the text edit using setText:
for (int c = 0; c < t; ++c)
{
    double nt = n0*exp(k*c); // again, use c rather than t
    s += QString("%1 ").arg(nt);
}
ui->textOutput->setText(s);

Where did these .tmp files come from?

Firstly, some details:
I am using a combination of C++ (Armadillo library) and R.
I am using Ubuntu as my operating system.
I am not using Rcpp
Suppose that I have some C++ code called cpp_code which:
Reads, as input from R, an integer,
Performs some calculations,
Saves, as output to R, a spreadsheet "out.csv". (I use .save( name, file_type = csv))
Some simplified R code would be:
for(i in 1:10000)
{
    system(paste0("echo ", toString(i), " | ./cpp_code")) ## produces out.csv
    output[i,,] <- read.csv("out.csv")                    ## reads out.csv
}
My Problem:
99% of the time, everything works fine. However, every now and then, I keep getting some unusual .tmp files like: "out.csv.tmp_a0ac9806ff7f0000703a". These .tmp files only appear for a second or so, then suddenly disappear.
Questions:
What is causing this?
Is there a way to stop this from happening?
Please go easy on me since computing is not my main discipline.
Thank you very much for your time.
Many programs write their output to a temporary file, then rename it to the destination file. This is often done to avoid leaving a half-written output file if the process is killed while writing. By using a temporary, the file can be atomically renamed to the output file name ensuring either:
the entire output file is properly written or
no change is made to the output file
Note there usually are still some race conditions that could result, for example, in the output file being deleted but the temporary file not renamed, but one of the two outcomes above is the general goal.
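The pattern described here looks roughly like this in generic C++. This is only an illustration of the write-to-temporary-then-rename idiom, not Armadillo's actual implementation (which is quoted in the next answer), and the naming scheme is made up:
#include <cstdio>
#include <fstream>
#include <string>

bool save_atomically(const std::string& final_name, const std::string& data) {
    const std::string tmp_name = final_name + ".tmp";   // hypothetical temporary name

    std::ofstream f(tmp_name.c_str());
    if (!f) { return false; }

    f << data;          // write everything to the temporary file first
    f.flush();
    f.close();
    if (!f) { return false; }

    // Only now expose the result under its final name; a crash before this
    // point leaves any existing out.csv untouched.
    return std::rename(tmp_name.c_str(), final_name.c_str()) == 0;
}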
I believe you're using the .save() function in Armadillo.
http://arma.sourceforge.net/docs.html#save_load_field
There are two functions you should look at in include/armadillo_bits/diskio_meat.hpp. In save_raw_ascii, the data is first stored in a file whose name comes from diskio::gen_tmp_name, and if save_okay, the file is renamed to the final name by safe_rename. If save_okay or safe_rename fails, you will see the temporary file. The temporary file name is chosen as the original filename + ".tmp_" + some hex characters.
//! Save a matrix as raw text (no header, human readable).
//! Matrices can be loaded in Matlab and Octave, as long as they don't have complex elements.
template<typename eT>
inline
bool
diskio::save_raw_ascii(const Mat<eT>& x, const std::string& final_name)
{
    arma_extra_debug_sigprint();

    const std::string tmp_name = diskio::gen_tmp_name(final_name);

    std::fstream f(tmp_name.c_str(), std::fstream::out);

    bool save_okay = f.is_open();

    if(save_okay == true)
    {
        save_okay = diskio::save_raw_ascii(x, f);

        f.flush();
        f.close();

        if(save_okay == true)
        {
            save_okay = diskio::safe_rename(tmp_name, final_name);
        }
    }

    return save_okay;
}
//! Append a quasi-random string to the given filename.
//! The rand() function is deliberately not used,
//! as rand() has an internal state that changes
//! from call to call. Such states should not be
//! modified in scientific applications, where the
//! results should be reproducable and not affected
//! by saving data.
inline
std::string
diskio::gen_tmp_name(const std::string& x)
{
    const std::string* ptr_x     = &x;
    const u8*          ptr_ptr_x = reinterpret_cast<const u8*>(&ptr_x);

    const char* extra      = ".tmp_";
    const uword extra_size = 5;

    const uword tmp_size = 2*sizeof(u8*) + 2*2;
    char        tmp[tmp_size];

    uword char_count = 0;

    for(uword i=0; i<sizeof(u8*); ++i)
    {
        conv_to_hex(&tmp[char_count], ptr_ptr_x[i]);
        char_count += 2;
    }

    const uword x_size = static_cast<uword>(x.size());
    u8 sum = 0;

    for(uword i=0; i<x_size; ++i)
    {
        sum += u8(x[i]);
    }

    conv_to_hex(&tmp[char_count], sum);
    char_count += 2;

    conv_to_hex(&tmp[char_count], u8(x_size));

    std::string out;
    out.resize(x_size + extra_size + tmp_size);

    for(uword i=0; i<x_size; ++i)
    {
        out[i] = x[i];
    }

    for(uword i=0; i<extra_size; ++i)
    {
        out[x_size + i] = extra[i];
    }

    for(uword i=0; i<tmp_size; ++i)
    {
        out[x_size + extra_size + i] = tmp[i];
    }

    return out;
}
What “Dark Falcon” hypothesises is exactly true: when calling save, Armadillo creates a temporary file to which it saves the data, and then renames the file to the final name.
This can be seen in the source code (this is save_raw_ascii but the other save* functions work analogously):
const std::string tmp_name = diskio::gen_tmp_name(final_name);

std::fstream f(tmp_name.c_str(), std::fstream::out);

bool save_okay = f.is_open();

if(save_okay == true)
{
    save_okay = diskio::save_raw_ascii(x, f);

    f.flush();
    f.close();

    if(save_okay == true)
    {
        save_okay = diskio::safe_rename(tmp_name, final_name);
    }
}
The comment on safe_rename says this:
Safely rename a file.
Before renaming, test if we can write to the final file.
This should prevent:
overwriting files that are write protected,
overwriting directories.
It’s worth noting that this will however not prevent a race condition.