Writing an integer to a binary file using C++?

I have a very simple question, which happens to be hard for me since this is the first time I tried working with binary files, and I don't quite understand them. All I want to do is write an integer to a binary file.
Here is how I did it:
#include <fstream>
using namespace std;

int main () {
    int num = 162;
    ofstream file ("file.bin", ios::binary);
    file.write ((char *)&num, sizeof(num));
    file.close ();
    return 0;
}
Could you please tell me if I did something wrong, and what?
The part that is giving me trouble is the line with file.write; I don't understand it.
Thank you in advance.

The part that is giving me trouble is the line with file.write; I don't understand it.
If you read the documentation of the ofstream::write() method, you'll see that it takes two arguments:
a pointer to a block of data with the content to be written;
an integer value representing the size, in bytes, of this block.
This statement just gives these two pieces of information to ofstream::write():
file.write(reinterpret_cast<const char *>(&num), sizeof(num));
&num is the address of the block of data (in this case just an integer variable), and sizeof(num) is the size of this block (typically 4 bytes for an int).
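For the other direction, here is a minimal read-back sketch, assuming file.bin was written by the program above on the same platform (so int size and byte order match):
#include <fstream>
#include <iostream>
using namespace std;

int main () {
    int num = 0;
    ifstream file ("file.bin", ios::binary);
    // read() fills the bytes of num directly from the file
    file.read (reinterpret_cast<char *>(&num), sizeof(num));
    cout << num << endl; // prints 162 if file.bin was written as above
    return 0;
}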

Related

Type casting pointer when writing to binary file in C++

Today in CS 111 class my instructor ended with a 'brief' look at writing structures to binary files. I say brief because he just included it as a kind of aside, saying it would not be on the final. Problem is, I don't fully understand what's going on in the program example and it is bothering me. Hopefully somebody will take the time to explain it to me. The code is as follows:
#include <iostream>
#include <fstream>
using namespace std;

struct PayStub
{
    int id_num;
    bool overtime;
    float hourly_rate;
};

int main()
{
    PayStub info = {1234, false, 15.45};
    ofstream data_store;
    data_store.open("test.cs111", ios::binary);
    char *raw_data = (char*)&info;
    data_store.write(raw_data, sizeof(PayStub));
    data_store.close();
    return 0;
}
I don't understand what is going on specifically in the statement char *raw_data = (char*)&info; and why it is necessary. I understand a pointer to a char is being declared and initialized, but what exactly is it being initialized to, and how is that being used in the next line?
I hope this isn't a stupid question. Thanks in advance for your help.
After the line char *raw_data = (char*)&info;, raw_data points to the first byte of info.
With data_store.write(raw_data, sizeof(PayStub)); we ask data_store to write to the file the contents of memory starting at raw_data and ending at raw_data + sizeof(PayStub).
In essence we find the start address and length of PayStub and write it to disk.
It is not a stupid question. Once you read up on pointers, everything will make sense.
Think of a struct (in plain C) as simply a way of binding a bunch of objects together with "string".
An int could be represented in memory with 4 bytes. A struct composed of two ints would be 8 bytes, one int next to the other.
In your example code, &info returns the pointer to the beginning of the object in memory, and (char*)&info simply interprets as a pointer to a character instead, so it can be treated as a sequence of binary data. sizeof returns how much memory (in bytes) the struct takes, and with this information the structure is then written directly to a file from memory.
Keep in mind this type of data storage is absolutely not portable. It might vary from a 32-bit to a 64-bit computer!
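If it helps, here is a minimal read-back sketch for the same struct, assuming the file was produced by the program above with the same compiler and on the same machine (per the portability caveat):
#include <fstream>
#include <iostream>
using namespace std;

struct PayStub
{
    int id_num;
    bool overtime;
    float hourly_rate;
};

int main()
{
    PayStub info;
    ifstream data_load("test.cs111", ios::binary);
    // fill the bytes of info directly from the file
    data_load.read(reinterpret_cast<char*>(&info), sizeof(PayStub));
    cout << info.id_num << " " << info.hourly_rate << endl;
    return 0;
}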

C++: read/write binary files. What's happening in this code?

I've been struggling with the concept of unformatted I/O. My course textbook doesn't explain it well. It gives this small program, but I don't know what is happening here. If someone could explain it to me, I would appreciate it.
#include <iostream>
#include <string>
#include <fstream>
using namespace std;

int main() {
    const unsigned int size = 10;
    int arr[size];
    ifstream infile("small.ppm");
    infile.read(reinterpret_cast<char *>(&arr), size * sizeof(arr[0]));
    infile.close();
    ofstream outfile("newfile.ppm");
    outfile.write((char *)&arr, size * sizeof(arr[0]));
    outfile.close();
}
What do the read() and write() functions do exactly? I understand that they take (char *, buffer_size) as arguments, but what do the functions themselves do?
Also, once I read in the data with read(), how do I store that data and manipulate it?
Sorry that this is such a long question. I've been struggling with this concept for a while now. Thanks a lot for your help
but what do the functions themselves do?
They write/read the data as a byte-by-byte copy to/from the binary file.
Also, once I read in the data with read(), how do I store that data and manipulate it?
You already store it in int arr[size]. You can manipulate that data using that int array.
Pitfalls:
If that data was serialized on a different machine, you'll run into endianness issues due to the machine-specific int representation.
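A small sketch of the same program with binary mode made explicit and a simple manipulation step added, to show that after read() the bytes are just ordinary ints (the file names are the ones from the question; the doubling step is only an example):
#include <fstream>

int main() {
    const unsigned int size = 10;
    int arr[size] = {};
    std::ifstream infile("small.ppm", std::ios::binary);
    // copy size * sizeof(int) raw bytes from the file into arr
    infile.read(reinterpret_cast<char *>(arr), size * sizeof(arr[0]));
    // the data is now ordinary ints; manipulate it like any array
    for (unsigned int i = 0; i < size; ++i)
        arr[i] *= 2;
    std::ofstream outfile("newfile.ppm", std::ios::binary);
    outfile.write(reinterpret_cast<const char *>(arr), size * sizeof(arr[0]));
}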

Parsing binary data from file

Thank you in advance for your help!
I am in the process of learning C++. My first project is to write a parser for a binary-file format we use at my lab. I was able to get a parser working fairly easily in Matlab using "fread", and it looks like that may work for what I am trying to do in C++. But from what I've read, it seems that using an ifstream is the recommended way.
My question is two-fold. First, what, exactly, are the advantages of using ifstream over fread?
Second, how can I use ifstream to solve my problem? Here's what I'm trying to do. I have a binary file containing a structured set of ints, floats, and 64-bit ints. There are 8 data fields all told, and I'd like to read each into its own array.
The structure of the data is as follows, in repeated 288-byte blocks:
Bytes 0-3: int
Bytes 4-7: int
Bytes 8-11: float
Bytes 12-15: float
Bytes 16-19: float
Bytes 20-23: float
Bytes 24-31: int64
Bytes 32-287: 64x float
I am able to read the file into memory as a char * array, with the fstream read command:
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes
So, from what I understand, I now have a pointer to an array called "buffer". If I were to call buffer[0], I should get a 1-byte memory address, right? (Instead, I'm getting a seg fault.)
What I now need to do really ought to be very simple. After executing the above ifstream code, I should have a fairly long buffer populated with a number of 1's and 0's. I just want to be able to read this stuff from memory, 32-bits at a time, casting as integers or floats depending on which 4-byte block I'm currently working on.
For example, if the binary file contained N 288-byte blocks of data, each array I extract should have N members each. (With the exception of the last array, which will have 64N members.)
Since I have the binary data in memory, I basically just want to read from buffer, one 32-bit number at a time, and place the resulting value in the appropriate array.
Lastly - can I access multiple array positions at a time, a la Matlab? (e.g. array(3:5) -> [1,2,1] for array = [3,4,1,2,1])
Firstly, the advantage of using iostreams, and in particular file streams, relates to resource management. Automatic file stream variables will be closed and cleaned up when they go out of scope, rather than having to manually clean them up with fclose. This is important if other code in the same scope can throw exceptions.
Secondly, one possible way to address this type of problem is to simply define the stream insertion and extraction operators in an appropriate manner. In this case, because you have a composite type, you need to help the compiler by telling it not to add padding bytes inside the type. The following code should work with GCC and Microsoft compilers.
#include <cstdint> // for uint64_t

#pragma pack(push, 1)
struct MyData
{
    int i0;
    int i1;
    float f0;
    float f1;
    float f2;
    float f3;
    uint64_t ui0;
    float f4[64];
};
#pragma pack(pop)

std::istream& operator>>( std::istream& is, MyData& data ) {
    is.read( reinterpret_cast<char*>(&data), sizeof(data) );
    return is;
}

std::ostream& operator<<( std::ostream& os, const MyData& data ) {
    os.write( reinterpret_cast<const char*>(&data), sizeof(data) );
    return os;
}
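With MyData and operator>> from the snippet above in scope, a rough usage sketch could look like this (the file name data.bin is made up, and it assumes the file was written on a machine with the same int/float sizes and byte order):
#include <fstream>
#include <vector>

int main() {
    std::ifstream datafile("data.bin", std::ios::binary);
    std::vector<MyData> records;
    MyData d;
    while (datafile >> d)            // each extraction reads one 288-byte block
        records.push_back(d);
    // records[k].i0, records[k].f4[j], ... are now ready to use
}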
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes
you need to allocate a buffer first before you read into it:
buffer = new char[filesize];
datafile.read (buffer, filesize);
As to the advantages of ifstream, it is a matter of abstraction. You can work with the contents of your file in a more convenient way: instead of dealing with raw buffers, you can model the structure using classes and hide the details of how it is stored in the file, for instance by overloading the << operator.
You might also look at serialization libraries for C++; s11n might be useful.
This question shows how you can convert data from a buffer to a certain type. In general, you should prefer using a std::vector<char> as your buffer. This would then look like this:
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

int main() {
    std::ifstream input("your_file.dat", std::ios::binary);
    std::vector<char> buffer;
    std::copy(std::istreambuf_iterator<char>(input),
              std::istreambuf_iterator<char>(),
              std::back_inserter(buffer));
}
This code will read the entire file into your buffer. The next thing you'd want to do is to write your data into valarrays (for the selection you want). valarray is constant in size, so you have to be able to calculate the required size of your array up-front. This should do it for your format:
std::valarray<int> array1(buffer.size() / 288), array2(buffer.size() / 288); // one entry per 288-byte block
Then you'd use a normal for-loop to insert the elements into your arrays:
for (std::size_t i = 0; i < buffer.size() / 288; i++) {
    array1[i] = *reinterpret_cast<const int *>(&buffer[i * 288]);     // first field
    array2[i] = *reinterpret_cast<const int *>(&buffer[i * 288 + 4]); // second field
}
Note that the size of int is implementation-defined, so if int is not 4 bytes on your platform, the offsets above will not match the file layout. This question explains a bit about C++ and the sizes of types.
The selection you describe there can be achieved using valarray.
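For example (a small sketch; the indices mirror your Matlab example), std::slice gives you Matlab-like selections over a valarray:
#include <valarray>
#include <iostream>

int main() {
    std::valarray<int> array = {3, 4, 1, 2, 1};
    // roughly Matlab's array(3:5): 3 elements starting at index 2, stride 1
    std::valarray<int> part = array[std::slice(2, 3, 1)];
    for (int v : part) std::cout << v << ' ';   // prints: 1 2 1
}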

C++ and binary files - newbie question

I have the following code, and I am trying to write some data to a binary file.
The problem is that I don't have any experience with binary files and I can't understand what I am doing.
#include <iostream>
#include <fstream>
#include <string>
#define RPF 5
using namespace std;

int write_header(int h_len, ofstream& f)
{
    int h;
    for (h = 0; h < h_len; h++)
    {
        int num = 0;
        f.write((char*)&num, sizeof(char));
    }
    return 0;
}

int new_file(const char* name)
{
    ofstream n_file(name, ofstream::binary);
    write_header(RPF, n_file);
    n_file.close();
    return 0;
}

int main(int argc, char **argv)
{
    ofstream file("file.dat", ofstream::binary);
    file.seekp(10);
    file.write("this is a message", 3);
    new_file("file1.dat");
    cin.get();
    return 0;
}
1. As you can see, I am opening file.dat and writing the word "thi" into it. Then I open the file and I see the ASCII value of it. Why does this happen?
2. Then I make a new file, file1.dat, and I try to write the number 0 into it five times.
What am I supposed to use?
this
f.write((char*)&num,sizeof(char));
or this
f.write((char*)&num,sizeof(int));
And why can't I write the value of the number as is; why do I have to cast it to a char*?
Is this because write() works like this, or am I only able to write chars to a binary file?
Can anyone help me understand what's happening?
The write() function takes a pointer to your data buffer and the length in bytes of the data to be streamed to the file. So when you say
file.write("this is a message",3);
you tell the write function to write 3 bytes in the file. And that is "thi".
This
f.write((char*)&num,sizeof(char));
tells the write function to put sizeof(char) bytes in the file. That is 1 byte. You probably want it
f.write((char*)&num,sizeof(int));
since num is an int variable.
You are writing the ASCII string "thi" to file.dat. If you opened the file in a hex editor, you would see "74 68 69", which are the numeric representations of those characters. But if you open file.dat in an editor that understands ASCII, it will most likely translate those values back to their character representation to make them easier to view. Opening the ofstream in ios::binary mode means that data is written to the file as is, and no transformations (such as newline translation) are applied by the stream beforehand.
The function ofstream::write(const char *data, streamsize len) has two parameters. data is a char pointer so that write operates on individual bytes; that is why you have to cast &num to a char* first. The second parameter, len, indicates how many bytes, starting from data, will be written to the file. My advice would be to use write(reinterpret_cast<char*>(&num), sizeof(num)), and then give num a type big enough to store the data required. If you declare int num, then with a 4-byte int the five calls would write 20 zero bytes to the file. If you only want 5 zero bytes, then declare num as a char.
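To make the difference concrete, a small sketch along the lines of your write_header (the file names bytes.dat and ints.dat are made up): writing num five times with sizeof(char) produces a 5-byte file, while sizeof(int) produces a file of 5*sizeof(int) bytes:
#include <fstream>
using namespace std;

int main()
{
    int num = 0;
    ofstream a("bytes.dat", ofstream::binary);
    ofstream b("ints.dat", ofstream::binary);
    for (int h = 0; h < 5; h++)
    {
        a.write(reinterpret_cast<char*>(&num), sizeof(char)); // 1 byte per call -> 5 bytes total
        b.write(reinterpret_cast<char*>(&num), sizeof(int));  // sizeof(int) bytes per call -> typically 20 bytes
    }
    return 0;
}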

How to speed-up loading of 15M integers from file stream?

I have an array of precomputed integers of fixed size: 15M values. I need to load these values at program start. Currently it takes up to 2 minutes to load; the file size is ~130MB. Is there any way to speed up loading? I'm free to change the save process as well.
std::array<int, 15000000> keys;
std::string config = "config.dat";
// how array is saved
std::ofstream out(config.c_str());
std::copy(keys.cbegin(), keys.cend(),
std::ostream_iterator<int>(out, "\n"));
// load of array
std::ifstream in(config.c_str());
std::copy(std::istream_iterator<int>(in),
std::istream_iterator<int>(), keys.begin());
in.close();
Thanks in advance.
SOLVED. Used the approach proposed in the accepted answer. Now it takes just a blink.
Thanks all for your insights.
You have two issues regarding the speed of your write and read operations.
First, std::copy cannot do a block-copy optimization when writing to an ostream_iterator because it doesn't have direct access to the underlying target.
Second, you're writing the integers out as ASCII, not binary, so on each iteration of the write the ostream_iterator creates an ASCII representation of your int, and on read the text has to be parsed back into integers. I believe this is the brunt of your performance issue.
The raw storage of your array (assuming a 4-byte int) should only be 60MB, but since each character of an integer in ASCII is 1 byte, any ints with more than 4 characters take more space than their binary storage, hence your 130MB file.
There is not an easy way to solve your speed problem portably (so that the file can be read on machines with different endianness or int sizes) or when using std::copy. The easiest way is to just dump the whole of the array to disk and then read it all back using fstream.write and read; just remember that it's not strictly portable.
To write:
std::fstream out(config.c_str(), ios::out | ios::binary);
out.write( reinterpret_cast<const char*>(keys.data()), keys.size() * sizeof(int) );
And to read:
std::fstream in(config.c_str(), ios::in | ios::binary);
in.read( reinterpret_cast<char*>(keys.data()), keys.size() * sizeof(int) );
----Update----
If you are really concerned about portability, you could use a portable format (like your initial ASCII version) in your distribution artifacts; then, when the program is first run, it could convert that portable format to a locally optimized version for use during subsequent executions.
Something like this perhaps:
std::array<int, 15000000> keys;
// data.txt holds the ASCII values and data.bin is the binary version
if (!file_exists("data.bin")) {
    std::ifstream in("data.txt");
    std::copy(std::istream_iterator<int>(in),
              std::istream_iterator<int>(), keys.begin());
    in.close();
    std::fstream out("data.bin", ios::out | ios::binary);
    out.write( reinterpret_cast<const char*>(keys.data()), keys.size() * sizeof(int) );
} else {
    std::fstream in("data.bin", ios::in | ios::binary);
    in.read( reinterpret_cast<char*>(keys.data()), keys.size() * sizeof(int) );
}
If you have an install process, this preprocessing could also be done at that time...
Attention. Reality check ahead:
Reading integers from a large text file is an IO-bound operation unless you're doing something completely wrong (like using C++ streams for this). Loading 15M integers from a text file takes less than 2 seconds on an AMD64 @ 3 GHz when the file is already buffered (and only a bit longer if it had to be fetched from a sufficiently fast disk). Here's a quick & dirty routine to prove my point (that's why I do not check for all possible errors in the format of the integers, nor close my files at the end, because I exit() anyway).
$ wc nums.txt
15000000 15000000 156979060 nums.txt
$ head -n 5 nums.txt
730547560
-226810937
607950954
640895092
884005970
$ g++ -O2 read.cc
$ time ./a.out <nums.txt
=>1752547657
real 0m1.781s
user 0m1.651s
sys 0m0.114s
$ cat read.cc
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <vector>

int main()
{
    int c;          // int, not char, so EOF can be detected reliably
    int num = 0;
    int pos = 1;
    int line = 1;
    std::vector<int> res;
    while (c = getchar(), c != EOF)
    {
        if (c >= '0' && c <= '9')
            num = num * 10 + c - '0';
        else if (c == '-')
            pos = 0;
        else if (c == '\n')
        {
            res.push_back(pos ? num : -num);
            num = 0;
            pos = 1;
            line++;
        }
        else
        {
            printf("I've got a problem with this file at line %d\n", line);
            exit(1);
        }
    }
    // make sure the optimizer does not throw the vector away; also a check.
    unsigned sum = 0;
    for (size_t i = 0; i < res.size(); i++)
    {
        sum = sum + (unsigned)res[i];
    }
    printf("=>%d\n", sum);
}
UPDATE: and here's my result when reading the text file (not binary) using mmap:
$ g++ -O2 mread.cc
$ time ./a.out nums.txt
=>1752547657
real 0m0.559s
user 0m0.478s
sys 0m0.081s
code's on pastebin:
http://pastebin.com/NgqFa11k
What I suggest:
1-2 seconds is a realistic lower bound for a typical desktop machine to load this data. 2 minutes sounds more like a 60 MHz microcontroller reading from a cheap SD card. So either you have an undetected/unmentioned hardware condition, or your C++ stream implementation is somehow broken or unusable. I suggest establishing a lower bound for this task on your machine by running my sample code.
If the integers are saved in binary format and you're not concerned with endianness problems, try reading the entire file into memory at once (fread) and casting the pointer to int *.
You could precompile the array into a .o file, which wouldn't need to be recompiled unless the data changes.
thedata.hpp:
static const int NUM_ENTRIES = 5;
extern int thedata[NUM_ENTRIES];
thedata.cpp:
#include "thedata.hpp"
int thedata[NUM_ENTRIES] = {
    10
    ,200
    ,3000
    ,40000
    ,500000
};
To compile this:
# make thedata.o
Then your main application would look something like:
#include "thedata.hpp"
using namespace std;
int main() {
for (int i=0; i<NUM_ENTRIES; i++) {
cout << thedata[i] << endl;
}
}
Assuming the data doesn't change often, and that you can process the data to create thedata.cpp, then this is effectively instant loadtime. I don't know if the compiler would choke on such a large literal array though!
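A hedged sketch of what that preprocessing step might look like (the input file name values.txt and this generator are made up): a tiny program that reads the integers and emits thedata.cpp in the format shown above. NUM_ENTRIES in thedata.hpp would have to be kept in sync with the number of values.
#include <fstream>

int main() {
    std::ifstream in("values.txt");    // one integer per line
    std::ofstream out("thedata.cpp");
    out << "#include \"thedata.hpp\"\n";
    out << "int thedata[NUM_ENTRIES] = {\n";
    int value;
    bool first = true;
    while (in >> value) {
        out << (first ? "" : ",") << value << "\n";
        first = false;
    }
    out << "};\n";
}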
Save the file in a binary format.
Write the file by taking a pointer to the start of your int array and converting it to a char pointer. Then write the 15000000*sizeof(int) chars to the file.
And when you read the file, do the same in reverse: read the file as a sequence of chars, take a pointer to the beginning of the sequence, and convert it to an int*.
Of course, this assumes that endianness isn't an issue.
For actually reading and writing the file, memory mapping is probably the most sensible approach.
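A rough POSIX-only sketch of the memory-mapping idea (error handling omitted, the file name keys.bin is made up; on Windows you would use MapViewOfFile instead):
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>

int main() {
    int fd = open("keys.bin", O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    // map the whole file read-only and view it as an array of int
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    const int* keys = static_cast<const int*>(p);
    std::size_t count = st.st_size / sizeof(int);
    // ... use keys[0] .. keys[count - 1] here ...
    munmap(p, st.st_size);
    close(fd);
}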
If the numbers never change, preprocess the file into a C++ source and compile it into the application.
If the numbers can change, and thus you have to keep them in a separate file that you load on startup, then avoid doing it number by number using C++ IO streams. C++ IO streams are a nice abstraction, but there is too much of it for such a simple task as loading a bunch of numbers fast. In my experience, a huge part of the run time is spent parsing the numbers and another part in accessing the file char by char.
(Assuming your file is more than a single long line.) Read the file line by line using std::getline(), and parse the numbers out of each line not with streams but with std::strtol(); see the sketch below. This avoids a huge part of the overhead. You can get more speed still by crafting your own variant of std::getline() that reads the input ahead (using istream::read()); the standard std::getline() also reads the input char by char.
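A sketch of that getline()+strtol() approach, assuming one integer per line as in the benchmark above (the file name nums.txt is just an example):
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

int main() {
    std::ifstream in("nums.txt");
    std::vector<int> res;
    res.reserve(15000000);           // avoid repeated reallocation
    std::string line;
    while (std::getline(in, line))
        // parse without stream formatting machinery
        res.push_back(static_cast<int>(std::strtol(line.c_str(), nullptr, 10)));
}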
Use a buffer of 1000 (or even 15M, you can modify this size as you please) integers, not integer after integer. Not using a buffer is clearly the problem in my opinion.
If the data in the file is binary and you don't have to worry about endianess, and you're on a system that supports it, use the mmap system call. See this article on IBM's website:
High-performance network programming, Part 2: Speed up processing at both the client and server
Also see this SO post:
When should I use mmap for file access?