How can I read numbers from a file in C++?

My main question is about how you read data from a file that is not of the char data type.
I am writing a file of data from MATLAB as follows:
x=rand(1,60000);
fID=fopen('Data.txt','w');
fwrite(fID,x,'float');
fclose(fID);
Then, when I try to read it in C++ using the following code, "num" doesn't change:
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
fstream fin("Data.txt",ios::in | ios::binary);
if (!fin)
{
cout<<"\n Couldn't find file \n";
return 0;
}
float num=123;
float loopSize=100e3;
for(int i=0; i<loopSize; i++)
{
if(fin.eof())
break;
fin >> num;
cout<< num;
}
fin.close();
return 0;
}
I can read and write files in MATLAB fine, and I can read and write in C++, but I can't write in MATLAB and read in C++. The files I write in MATLAB are in the format I want, but the files in C++ seem to be written and read as text. How do you read a series of floats from a file in C++, or what am I doing wrong?
edit: The loop code is messy because I didn't want an infinite loop and the eof flag was never being set.

Formatted I/O using << and >> does indeed read and write numeric values as text.
Presumably, Matlab is writing the floating-point values in a binary format. If it uses the same format as C++ (most implementations of which use the standard IEEE binary format), then you could read the bytes using unformatted input, and reinterpret them as a floating-point value, along the lines of:
float f; // Might need to be "double", depending on format
fin.read(reinterpret_cast<char*>(&f), sizeof f);
If Matlab does not use a compatible format, then you'll need to find out what format it does use and write some code to convert it.
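A minimal sketch of that unformatted read, assuming the file holds nothing but raw 32-bit IEEE floats in the reading machine's byte order. The helper name read_floats is made up for illustration, and whether the MATLAB file really has this layout is an assumption you should verify:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every raw float stored in a binary file.
// Assumes 32-bit IEEE floats written in this machine's byte order.
std::vector<float> read_floats(const std::string& path)
{
    static_assert(sizeof(float) == 4, "this sketch assumes 32-bit floats");

    std::ifstream fin(path, std::ios::binary);
    std::vector<float> values;
    float f;
    // read() fails once fewer than sizeof f bytes remain, which ends the loop.
    while (fin.read(reinterpret_cast<char*>(&f), sizeof f))
        values.push_back(f);
    return values;
}
```

Calling read_floats("Data.txt") on the file from the question should return one element per value written, provided those assumptions hold. An empty result means the file could not be opened or held fewer than four bytes.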

You need to read and write the same format. What you have written from Matlab is an unformatted sequence of bytes, which may or may not be readable depending on whether you use the same system. You can probably read this unformatted sequence of bytes into a C++ program (e.g. using std::istream::read()), but you shouldn't consider the data to be stored.
To actually store data, you need to be aware of the format the data has. The format can be binary or text, but you should be clear about what the bytes mean, in which order they appear, how many there are, how to detect the end of a value, etc.

Using fwrite is not the best idea, because this will write out the data in an internal format, which might or might not be easy to read back in your program.
Matlab has other ways of writing output, e.g. functions like fprintf. Better write out your data this way, then it should be obvious how to read it back into another application.
Just use fprintf(fID, "%f\n", x), and then you should be able to use fscanf (or formatted stream input with >>) to read this back in C/C++.
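For the text route, a rough sketch of the reading side in C++, assuming the file contains whitespace-separated numbers such as one "%f" value per line. It uses stream extraction rather than scanf, and the helper name read_text_floats is invented for this example:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads whitespace-separated numbers written as text.
std::vector<float> read_text_floats(const std::string& path)
{
    std::ifstream fin(path);
    std::vector<float> values;
    float num;
    // Testing the extraction itself stops the loop at end of file or on
    // malformed input, so no separate eof() check is needed.
    while (fin >> num)
        values.push_back(num);
    return values;
}
```

Using the extraction as the loop condition also avoids the situation from the question, where a failed read leaves the variable unchanged and the eof flag never gets set.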

Related

c++ Writing binary to a file

Hi, I'm trying to write to a txt file in binary.
I wrote this code:
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
char* f = "abc";
ofstream ofile("D:\\foobar.txt", ios_base::out | ios_base::binary);
ofile.write(f, sizeof(char*));
return 0;
}
It writes "abc", but not in binary.
Can someone please tell me how to write it in binary?
First of all, you write the wrong size and might go out of bounds: sizeof(char*) is the size of the pointer, not the length of the string. Use strlen instead to get the length of the string.
Secondly, think about how a letter like 'a' is stored in memory in the computer. It is stored in whatever encoding the compiler and operating system use, which is most likely ASCII. When you write that letter to a file, it will write the value stored in memory to the file, and if you read the file using a program that is able to decipher the encoding, it will show you the letter.
I'm just guessing here, but I think you expected binary format to write actual ones and zeroes as text. Well, you do write ones and zeroes, not as text but as individual bits. And when all those bits are put together into bytes, you get the bytes as they are stored in memory. If you look at the file in a hex editor you will see the actual values, and you might even be able to find a program that shows you the actual binary values as ones and zeros.
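To make the distinction concrete, here is a small sketch with two made-up helpers: write_bytes writes the string with the strlen-based size, and to_bit_string produces the "ones and zeroes as text" view that the question seems to expect. The bit patterns mentioned afterwards assume an ASCII-compatible encoding:

```cpp
#include <bitset>
#include <cstring>
#include <fstream>
#include <string>

// Writes the characters of a C string to a file, byte for byte.
void write_bytes(const std::string& path, const char* text)
{
    std::ofstream ofile(path, std::ios::out | std::ios::binary);
    ofile.write(text, static_cast<std::streamsize>(std::strlen(text)));
}

// Renders each byte of a C string as eight '0'/'1' characters,
// with a space between consecutive bytes.
std::string to_bit_string(const char* text)
{
    std::string bits;
    for (const char* p = text; *p != '\0'; ++p) {
        if (!bits.empty())
            bits += ' ';
        bits += std::bitset<8>(static_cast<unsigned char>(*p)).to_string();
    }
    return bits;
}
```

With ASCII, to_bit_string("abc") gives "01100001 01100010 01100011", while write_bytes stores the same three bytes in the file without any such textual expansion.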

Are C++ << and >> operators slow? What alternatives are there to these operators?

I'm doing a project for college and I'm using C++. I used std::cin and std::cout with the << and >> operators to read input and to display output. My professor has published an announcement saying that >> and << are not recommended because they are slow.
We only have to read integers and the input is always correct (we don't need to verify it, we know the format it is in and just need to read it). What alternatives should we use then, if << and >> are not recommended?
For cout you can use put or write
// single character
char character;
cout.put(character);
// c string
char * buffer = new char[size];
cout.write(buffer, size);
For cin you could use get, read, or getline
// Single character
char ch;
std::cin.get(ch);
// c string
char * buffer = new char[size];
std::cin.read(buffer, size);
std::cin.get(buffer, size);
std::cin.getline(buffer, size);
Worrying about the speed of the stream insertion and extraction operators (<< and >>) in C++ is something to do when you have lots of data to process (over 1E06 items). For smaller sets of data, the execution time is negligible compared to other factors with the computer and your program.
Before you worry about the speed of formatted I/O, get your program working correctly. Review your algorithms for efficiency. Review your implementation of the algorithms for efficiency. Review the data for efficiency.
The slowness of the stream extraction operators comes first from translating the textual representation into the internal representation, and then from the implementation itself. Heck, if you are typing in the data, forget about any optimizations. To speed up your file reading, organize the data for easy extraction and translation.
If you are still panicking about efficiency, use binary file representation. The data in the file should be formatted so that it can be loaded directly into memory without any translations. Also, the data should be loaded in large chunks.
From the Hitchhiker's Guide to the Galaxy, DON'T PANIC.
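The binary, large-chunk idea from this answer could look roughly like the following. This is only a sketch: the function names save_ints and load_ints are invented, and it assumes the file is written and read on machines with the same byte order:

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Writes all integers with a single unformatted write.
void save_ints(const std::string& path, const std::vector<std::int32_t>& values)
{
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(std::int32_t)));
}

// Loads up to `count` integers with a single unformatted read.
// Returns fewer elements if the file is shorter than expected.
std::vector<std::int32_t> load_ints(const std::string& path, std::size_t count)
{
    std::vector<std::int32_t> values(count);
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(count * sizeof(std::int32_t)));
    // gcount() reports how many bytes the last read actually delivered.
    values.resize(static_cast<std::size_t>(in.gcount()) / sizeof(std::int32_t));
    return values;
}
```

No text parsing happens here at all, which is where the speed-up comes from. The price is that the file is no longer human-readable and depends on the integer size and byte order of the machine.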

Writing huge txt files without overloading RAM

I need to write the results of a process to a txt file. The process is very long and the amount of data to be written is huge (~150 GB). The program works fine, but the problem is that the RAM gets overloaded and, at a certain point, it just stops.
The program is simple:
ostream f;
f.open(filePath);
for(int k=0; k<nDataset; k++){
//treat element of dataset
f << result;
}
f.close();
Is there a way of writing this file without overloading the memory?
You should flush output periodically.
For example:
if (k%10000 == 0) f.flush();
I'd like to suggest something like this
ogzstream f;
f.open(filePath);
string s("");
for(int k=0; k<nDataset; k++){
//treat element of dataset
s.append(result);
if (s.length() >= OPTIMUM_BUFFER_SIZE) {
f << s;
f.flush();
s.clear();
}
}
f << s;
f.flush();
f.close();
Basically, you build up the output in memory rather than sending each result straight to the stream, so you don't have to worry about when the stream gets flushed. And when you do send it to the stream, you ensure it's flushed to the actual file. Some ideas for the OPTIMUM_BUFFER_SIZE can be found from here and here.
I'm not exactly sure whether string or vector is the best option for the buffer. Will do some research myself and update the answer or you can refer to Effective STL by Scott Meyers.
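A self-contained variant of this buffering idea, sketched with a plain std::ofstream instead of ogzstream so that it needs no extra library. The class name ChunkedWriter and the chunk size are assumptions chosen for illustration, not tuned values:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: accumulates text in memory and writes it out in chunks.
class ChunkedWriter {
public:
    ChunkedWriter(const std::string& path, std::size_t chunk_size)
        : out_(path), chunk_size_(chunk_size) {}

    // Writes whatever is still buffered before the file is closed.
    ~ChunkedWriter() { flush(); }

    void append(const std::string& text)
    {
        buffer_ += text;
        // ">=" rather than "==": one append can step past the threshold.
        if (buffer_.size() >= chunk_size_)
            flush();
    }

    void flush()
    {
        out_ << buffer_;
        out_.flush();
        buffer_.clear();
    }

private:
    std::ofstream out_;
    std::string buffer_;
    std::size_t chunk_size_;
};
```

In the loop from the question, each f << result would become a call such as writer.append(result), assuming result is (or can be converted to) a std::string. The buffer never grows much beyond the chunk size, so memory use from the output side stays bounded.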
If that truly is the code where your program gets stuck, then your explanation of the problem is wrong.
There's no text file. Your igzstream is not dealing with text, but a gzip archive.
There's no data being written. The code you show reads from the stream.
I don't know what your program does with result, because you didn't show that. But if it accumulates results into a collection in memory, that will grow. You'll need to find a way to process all your data without loading all of it into RAM at the same time.
Your memory usage could be from the decompressor. For some compression algorithms, an entire block has to be stored in memory. In such cases it's best to break the file into blocks and compress each separately (possibly pre-initializing a dictionary with the results of the previous block). I don't think that gzip is such an algorithm, however. You may need to find a library that supports streaming.

How to exchange data between C++ and MATLAB?

For the time being, I am developing a C++ program based on some MATLAB code. During development I need to output the intermediate results to MATLAB in order to compare the C++ implementation's results with the MATLAB results. What I am doing now is writing a binary file with C++ and then loading it with MATLAB. The following code shows an example:
int main ()
{
ofstream abcdef;
abcdef.open("C:/test.bin",ios::out | ios::trunc | ios::binary);
for (int i=0; i<10; i++)
{
float x_cord;
x_cord = i*1.38;
float y_cord;
y_cord = i*10;
abcdef<<x_cord<<" "<<y_cord<<endl;
}
abcdef.close();
return 0;
}
When I have the file test.bin, I can load the file automatically with MATLAB command:
data = load('test.bin');
This method works well when the output is numerical data; however, it could fail if the output is a class with many member variables. I was wondering whether there are better ways to do the job, not only for simple numerical data but also for complicated data structures. Thanks!
I would suggest using the MATLAB engine, through which you can pass data to MATLAB in real time and even visualize the data using the various graph plotting facilities available in MATLAB.
All you have to do is to invoke the MATLAB engine from C/C++ program and then you can easily execute MATLAB commands directly from the C/C++ program and/or exchange data between MATLAB and C/C++. It can be done in both directions i.e. from C++ to MATLAB and vice versa.
You can have a look at a working example for the same as shown here.
I would suggest using the fread command in matlab. I do this all the time for exchanging data between matlab and other programs, for instance:
fd = fopen('datafile.bin','r');
a = fread(fd,3,'*uint32');
b = fread(fd,1,'float32');
With fread you have all the flexibility to read any type of data. By placing a * in the name, as above, you also say that you want to store into that data type instead of the default matlab data type. So the first one reads in 3 32 bit unsigned integers and stores them as integers. The second one reads in a single precision floating point number, but stores it as the default double precision.
You need to control the way that data is written in your c++ code, but that is inevitable. You can make a class method in c++ that packs the data in a deterministic way.
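As a sketch of such a packing routine on the C++ side, the following writes a record that matches the two fread calls above: three 32-bit unsigned integers followed by one 32-bit float. The Record struct and the write_record function are hypothetical names, and both programs are assumed to run on machines with the same byte order:

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical record layout: three 32-bit unsigned integers, then one float.
struct Record {
    std::uint32_t ids[3];
    float value;
};

// Writes the fields one by one, so no struct padding ends up in the file.
void write_record(const std::string& path, const Record& r)
{
    static_assert(sizeof(float) == 4, "this sketch assumes 32-bit floats");

    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(r.ids), sizeof r.ids);
    out.write(reinterpret_cast<const char*>(&r.value), sizeof r.value);
}
```

Writing field by field, rather than dumping the whole struct at once, keeps the file layout independent of how the compiler pads the struct. For a class with many members, each member would get its own write in a fixed, documented order.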
Dustin

How to perform fast formatted input from a stream in C++?

The situation is: there is a file with 14 294 508 unsigned integers and 13 994 397 floating-point numbers (need to read doubles). Total file size is ~250 MB.
Using std::istream takes ~30sec. Reading the data from file to memory (just copying bytes, without formatted input) is much faster. Is there any way to improve reading speed without changing file format?
Do you need to use STL style i/o? You must check out this excellent piece of work from one of the experts. It's a specialized iostream by Dietmar Kuhl.
I hate to suggest this but take a look at the C formatted i/o routines. Also, are you reading in the whole file in one go?
You might also want to look at Matthew Wilson's FastFormat library:
http://www.fastformat.org/
I haven't used it, but he makes some pretty impressive claims and I've found a lot of his other work to be worth studying and using (and stealing on occasion).
You haven't specified the format. It's possible that you could memory map it, or could read in very large chunks and process in a batch algorithm.
Also, you haven't said whether you know for sure that the file and the process that will read it will be on the same platform. If a big-endian process writes it and a little-endian process reads it, or vice versa, it won't work.
Parsing the input yourself (with atoi and atof) usually makes reading at least twice as fast compared to "universal" read methods.
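A rough sketch of hand-rolled parsing over a buffer that already holds the file contents. It uses std::strtod instead of atof, because strtod also reports where the number ended, which is what lets the loop advance. The helper name parse_doubles is made up, and no timing claims are attached to it:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: parses whitespace-separated doubles from a buffer
// that already holds the file contents.
std::vector<double> parse_doubles(const std::string& text)
{
    std::vector<double> values;
    const char* p = text.c_str();
    char* end = nullptr;
    for (;;) {
        const double d = std::strtod(p, &end);
        if (end == p)   // nothing converted: end of input or malformed token
            break;
        values.push_back(d);
        p = end;        // continue right after the number just parsed
    }
    return values;
}
```

The same pattern works for the unsigned integers with std::strtoul. Note that the loop silently stops at the first token it cannot parse, so real code would want to check whether the whole buffer was consumed.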
Something quick and dirty is to just dump the file into a standard C++ string, and then use a stringstream on it:
#include <sstream>
// Load file into string file_string
std::stringstream s( file_string );
int x; float y;
s >> x >> y;
This may not give you much of a performance improvement (you will get a larger speed-up by avoiding iostreams), but it's very easy to try, and it may be fast enough.