C++: how to store an integer in a binary file?

I've got a struct with 2 integers, and I want to store them in a binary file and read them back again.
Here is my code:
#include <fstream>
#include <iostream>
using namespace std;

static const char *ADMIN_FILE = "admin.bin";

struct pw {
    int a;
    int b;
};

int main(){
    pw* p = new pw();
    pw* q = new pw();
    std::ofstream fout(ADMIN_FILE, ios_base::out | ios_base::binary | ios_base::trunc);
    std::ifstream fin(ADMIN_FILE, ios_base::in | ios_base::binary);
    p->a = 123;
    p->b = 321;
    fout.write((const char*)p, sizeof(pw));
    fin.read((char*)q, sizeof(pw));
    fin.close();
    cout << q->a << endl;
}
The output I get is 0. Can anyone tell me what the problem is?

You probably want to flush fout before you read from it.
To flush the stream, do the following:
fout.flush();
The reason for this is that fstreams generally want to buffer the output as long as possible to reduce cost. To force the buffer to be emptied, you call flush on the stream.
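For example, a minimal adjustment to the code from the question (only the lines around the write are shown; calling fout.close() before the read would work just as well):
fout.write((const char*)p, sizeof(pw));
fout.flush();                     // force the buffered bytes out to admin.bin
fin.read((char*)q, sizeof(pw));   // the data is now actually in the file
cout << q->a << endl;             // prints 123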

When storing integers to files, you can use the htonl() and ntohl() family of functions to ensure that they will be read back in the correct format regardless of whether the file is written out on a big-endian machine and read back later on a little-endian machine. The functions were intended for network use, but they can be valuable when writing to files.
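For example, a sketch of writing one field in network byte order and reading it back (the helper names are mine; on POSIX systems htonl/ntohl come from <arpa/inet.h>, on Windows from <winsock2.h>):
#include <arpa/inet.h>   // htonl, ntohl (POSIX)
#include <cstdint>
#include <fstream>

void write_u32(std::ofstream &out, std::uint32_t value) {
    std::uint32_t net = htonl(value);              // host order -> big-endian
    out.write(reinterpret_cast<const char*>(&net), sizeof(net));
}

std::uint32_t read_u32(std::ifstream &in) {
    std::uint32_t net = 0;
    in.read(reinterpret_cast<char*>(&net), sizeof(net));
    return ntohl(net);                             // big-endian -> host order
}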

fin.write((char*)q, sizeof(pw));
Should probably be
fin.read((char*)q, sizeof(pw));

Be warned that your method assumes things about the size and endianness of your integers and the packing of your structures, none of which is necessarily going to be true if your code gets ported to another machine.
For portability reasons, you want to have output routines that output the fields of structures separately, and that output numbers at specific bitwidths with specific endianness. This is why there are serialization packages.
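As an illustration of that idea (a sketch, not part of the original answer; the helper names are mine), one could write each field as a fixed 4-byte little-endian value, independent of the host's int size, endianness, and struct padding:
#include <cstdint>
#include <fstream>

// Write a 32-bit value as 4 bytes, least significant byte first.
void put_u32_le(std::ofstream &out, std::uint32_t v) {
    char bytes[4] = {
        static_cast<char>(v & 0xFF),
        static_cast<char>((v >> 8) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >> 24) & 0xFF)
    };
    out.write(bytes, 4);
}

void write_pw(std::ofstream &out, const pw &record) {
    put_u32_le(out, static_cast<std::uint32_t>(record.a));
    put_u32_le(out, static_cast<std::uint32_t>(record.b));
}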

try this:
fout.write((const char*)&p, sizeof(pw));
fin.read((char*)&q, sizeof(pw));
instead of
fout.write((const char*)p, sizeof(pw));
fin.read((char*)q, sizeof(pw));

Related

Why some data in binary file is shown as it is and other is shown in a strange way

I have code which writes a vector of structures like this to a binary file:
struct reader {
    char name[50];
    int card_num;
    char title[100];
};
Everything actually works fine, but when I, for example, write the structure {One, 1, One} to the file and open the .txt file where it is stored, I see this:
One ММММММММММММММММММММММММММММММММММММММММММММММММ One ММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММ
I was asked why it is displayed like this and what it depends on, but I couldn't give a good answer to that question.
EDITED:
Added code which I use to write to file
void Write_to_File(vector<reader>& vec){
    cin.clear();   // clearing
    fflush(stdin); // input stream
    const char* pointer = reinterpret_cast<const char*>(&vec[0]);
    size_t bytes = vec.size() * sizeof(vec[0]);
    fstream f("D:\\temp.txt", ios::out);
    f.close();
    ofstream file("D:\\temp.txt", ios::in | ios::binary);
    file.write(pointer, bytes);
    file.close();
    remove("D:\\lab.txt");
    rename("D:\\temp.txt", "D:\\lab.txt");
    cout << "\n*** Successfully written data ***\n\n";
}
P.S. When I read from the file, everything is OK.
You write 154 octets to the file, and only the two "One" strings are meaningful characters, so your text editor tries to read everything as characters but mostly gets garbage. You wrote binary data, so you should not expect it to be readable as text.
Why some data in binary file is shown as it is and other is shown in a strange way
It seems that you are trying to read the binary data as if it contained character encoded data. Some of it does - but not all. Perhaps this is why you think that it seems strange. Other than that, the output seems perfectly reasonable.
why is it displayed so
Because that is the textual representation of the data that the object contains in the character encoding that your reader uses.
what it depends on
It depends on the values that you initialized the memory to have. For example, the first character is displayed as O because you initialized name[0] with the value 'O'. Some of the data is padding between members that cannot be initialized directly; what value those bytes hold is unspecified, and the rest is whatever happened to be in the unused tails of the name and title arrays. (The long runs of 'М' suggest uninitialized bytes with the value 0xCC, which MSVC debug builds typically use to fill uninitialized stack memory and which displays as the Cyrillic letter М under a Windows-1251 code page.)
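If the goal is simply to make those leftover bytes predictable, one option (a sketch, not from the question; the helper name is mine) is to value-initialize each record before filling it in, so the unused parts of the arrays are zeros:
#include <cstring>

reader make_reader(const char* name, int card_num, const char* title) {
    reader r{};                                       // value-initialize: the arrays start out as zeros
    std::strncpy(r.name, name, sizeof(r.name) - 1);   // leave room for the terminating '\0'
    r.card_num = card_num;
    std::strncpy(r.title, title, sizeof(r.title) - 1);
    return r;
}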

Read .part files and concatenate them all

So I am writing my own custom FTP client for a school project. I managed to get everything working with the swarming FTP client and am down to one last small part: reading the .part files into the main file. I need to do two things: (1) get this to read each file and write it to the final file properly, and (2) delete each part file after I am done with it.
Can someone please help me fix the concatenate function I wrote below? I thought I had it reading each file until EOF and then moving on to the next.
In this case *numOfThreads is 17. I ended up with a file of 4742442 bytes instead of 594542592 bytes. Thanks, and I am happy to provide any other useful information.
EDIT: Modified code for comment below.
std::string s = "Fedora-15-x86_64-Live-Desktop.iso";
std::ofstream out;
out.open(s.c_str(), std::ios::out);
for (int i = 0; i < 17; ++i)
{
    std::ifstream in;
    std::ostringstream convert;
    convert << i;
    std::string t = s + ".part" + convert.str();
    in.open(t.c_str(), std::ios::in | std::ios::binary);
    int size = 32*1024;
    char *tempBuffer = new char[size];
    if (in.good())
    {
        while (in.read(tempBuffer, size))
            out.write(tempBuffer, in.gcount());
    }
    delete [] tempBuffer;
    in.close();
}
out.close();
return 0;
Almost everything in your copying loop has problems.
while (!in.eof())
This is broken. Not much more to say than that.
bzero(tempBuffer, size);
This is fairly harmless, but utterly pointless.
in.read(tempBuffer, size);
This is the "almost" part -- i.e., the one piece that isn't obviously broken.
out.write(tempBuffer, strlen(tempBuffer));
You don't want to use strlen to determine the length -- it's intended only for NUL-terminated (C-style) strings. If (as is apparently the case) the data you read may contain zero-bytes (rather than using zero-bytes only to signal the end of a string), this will simply produce the wrong size.
What you normally want to do is a loop something like:
while (read(some_amount) == succeeded)
write(amount that was read);
In C++ that will typically be something like:
while (infile.read(buffer, buffer_size))
outfile.write(buffer, infile.gcount());
It's probably also worth noting that since you're allocating memory for the buffer using new, but never using delete, your function is leaking memory. Probably better to do without new for this -- an array or vector would be obvious alternatives here.
Edit: as for why while (infile.read(...)) works, the read returns a reference to the stream. The stream in turn provides a conversion to bool (in C++11) or void * (in C++03) that can be interpreted as a Boolean. That conversion operator returns the state of the stream, so if reading failed, it will be interpreted as false, but as long as it succeeded, it will be interpreted as true.
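Putting those pieces together, one possible version of the copying loop that uses a std::vector<char> buffer instead of new/delete (a sketch, not the original assignment code; the function name, the 32 KB buffer size, and the extra write after the loop to flush the final partial read are my additions):
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

void concatenate_parts(const std::string &target, int numParts)
{
    std::ofstream out(target.c_str(), std::ios::out | std::ios::binary);
    std::vector<char> buffer(32 * 1024);           // reused for every part file

    for (int i = 0; i < numParts; ++i)
    {
        std::ostringstream name;
        name << target << ".part" << i;
        std::ifstream in(name.str().c_str(), std::ios::in | std::ios::binary);

        while (in.read(&buffer[0], buffer.size()))
            out.write(&buffer[0], in.gcount());
        out.write(&buffer[0], in.gcount());        // write the final, short read
    }
}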

C++ FILE to fstream?

I'm not an expert on this but I have this code:
FILE *OUTPUT_FILE;
OUTPUT_FILE = fopen(file, "a+");
fprintf(OUTPUT_FILE, "%s", &keys );
fclose(OUTPUT_FILE);
And I would like to convert it to fstream syntax, something like:
ofstream fs;
????
They are inside this function:
int Store(int keys, char *file)
I know this is C-style code, but since I'm learning C++ I would like to know how to translate it to C++.
Sorry, I don't know what else to add, or whether fs is compatible with fopen.
More information:
Thanks everybody, but it seems it's ignoring some values:
int Store(int keys, char *file)
{
    ofstream output_file("log.txt");
    output_file << keys;
    output_file.close();
    cout << keys;
    return 0;
}
When it outputs to the file I just see a D. I can see the hexadecimal values of the keys on the console, but they are not being printed to the text file...
First of all, ALL_CAPS should generally be reserved for macros -- using it for a normal variable holding a FILE * is generally a poor idea.
As far as the rest goes, it would look something like this:
std::fstream output_file(file, std::fstream::in | std::fstream::out | std::fstream::app);
output_file << keys;
That could be a bit wrong, though -- right now your function prototype says keys is an int, but you're passing it to fprintf using the %s format, which is for a string, not an int. As-is, the code produces undefined behavior, and it's not completely certain what you really want. I've taken a guess I think is reasonable, but I'm not quite sure.
Edit: In case you're trying to write out the raw bytes of keys, that would look something like:
output_file.write(reinterpret_cast<char *>(&keys), sizeof(keys));
Thanks for the suggestion #ildjarn.
http://www.cplusplus.com/reference/iostream/ostream/write/
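Putting it together, a possible fstream version of the whole Store function, under the same guess that keys should be appended to the file as text (the append mode mirrors the original fopen(file, "a+")):
#include <fstream>

int Store(int keys, char *file)
{
    std::ofstream output_file(file, std::ios::out | std::ios::app);  // append, like "a+"
    if (!output_file)
        return -1;                    // could not open the file
    output_file << keys << '\n';      // formatted output of the int
    return 0;                         // the destructor closes the file
}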
It is important to note that you can use the same C-style I/O in C++, so all of your C code will work in C++. That is often the happier solution, especially for I/O that doesn't need to be lightning fast.
To use ofstream:
std::ofstream foo; //Declaring the ofstream object
foo.open("file_name"); //Setting the output file name
foo<<keys; //Now it's ready to take << input!
foo.close(); //When you're done with the file

How to speed-up loading of 15M integers from file stream?

I have an array of precomputed integers; it's a fixed size of 15M values. I need to load these values at program start. Currently it takes up to 2 minutes to load, and the file size is ~130MB. Is there any way to speed up loading? I'm free to change the save process as well.
std::array<int, 15000000> keys;
std::string config = "config.dat";

// how the array is saved
std::ofstream out(config.c_str());
std::copy(keys.cbegin(), keys.cend(),
          std::ostream_iterator<int>(out, "\n"));

// load of the array
std::ifstream in(config.c_str());
std::copy(std::istream_iterator<int>(in),
          std::istream_iterator<int>(), keys.begin());
in.close();
Thanks in advance.
SOLVED. Used the approach proposed in accepted answer. Now it takes just a blink.
Thanks all for your insights.
You have two issues regarding the speed of your write and read operations.
First, std::copy cannot do a block-copy optimization when writing to an output_iterator because it doesn't have direct access to the underlying target.
Second, you're writing the integers out as ASCII and not binary, so on each iteration of the write the output_iterator creates an ASCII representation of your int, and on read the text has to be parsed back into integers. I believe this is the brunt of your performance issue.
The raw storage of your array (assuming a 4-byte int) should only be 60MB, but since each character of an ASCII integer takes 1 byte, any int with more than 4 characters is larger than its binary form, hence your 130MB file.
There is no easy way to solve your speed problem portably (so that the file can be read on machines with a different endianness or int size) or while using std::copy. The easiest way is to just dump the whole array to disk and then read it all back using fstream's write and read; just remember that it's not strictly portable.
To write:
std::fstream out(config.c_str(), ios::out | ios::binary);
out.write(reinterpret_cast<const char*>(keys.data()), keys.size() * sizeof(int));
And to read:
std::fstream in(config.c_str(), ios::in | ios::binary);
in.read(reinterpret_cast<char*>(keys.data()), keys.size() * sizeof(int));
----Update----
If you are really concerned about portability, you could ship a portable format (like your initial ASCII version) in your distribution artifacts; then, when the program is first run, it could convert that portable format to a locally optimized version for use during subsequent executions.
Something like this perhaps:
std::array<int, 15000000> keys;

// data.txt holds the ASCII values and data.bin is the binary version
if (!file_exists("data.bin")) {
    std::ifstream in("data.txt");
    std::copy(std::istream_iterator<int>(in),
              std::istream_iterator<int>(), keys.begin());
    in.close();

    std::fstream out("data.bin", ios::out | ios::binary);
    out.write(reinterpret_cast<const char*>(keys.data()), keys.size() * sizeof(int));
} else {
    std::fstream in("data.bin", ios::in | ios::binary);
    in.read(reinterpret_cast<char*>(keys.data()), keys.size() * sizeof(int));
}
If you have an install process this preprocessing could also be done at that time...
Attention. Reality check ahead:
Reading integers from a large text file is an I/O-bound operation unless you're doing something completely wrong (like using C++ streams for this). Loading 15M integers from a text file takes less than 2 seconds on an AMD64 at 3GHz when the file is already buffered (and only a bit longer if it has to be fetched from a sufficiently fast disk). Here's a quick & dirty routine to prove my point (that's why I don't check for all possible errors in the format of the integers, nor close my files at the end, because I exit() anyway).
$ wc nums.txt
15000000 15000000 156979060 nums.txt
$ head -n 5 nums.txt
730547560
-226810937
607950954
640895092
884005970
$ g++ -O2 read.cc
$ time ./a.out <nums.txt
=>1752547657
real 0m1.781s
user 0m1.651s
sys 0m0.114s
$ cat read.cc
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <vector>

int main()
{
    int c;          // int, not char, so the comparison with EOF works
    int num = 0;
    int pos = 1;
    int line = 1;
    std::vector<int> res;
    while (c = getchar(), c != EOF)
    {
        if (c >= '0' && c <= '9')
            num = num*10 + c - '0';
        else if (c == '-')
            pos = 0;
        else if (c == '\n')
        {
            res.push_back(pos ? num : -num);
            num = 0;
            pos = 1;
            line++;
        }
        else
        {
            printf("I've got a problem with this file at line %d\n", line);
            exit(1);
        }
    }
    // make sure the optimizer does not throw the vector away; also a check.
    unsigned sum = 0;
    for (int i = 0; i < res.size(); i++)
    {
        sum = sum + (unsigned)res[i];
    }
    printf("=>%d\n", sum);
}
UPDATE: and here's my result when reading the text file (not binary) using mmap:
$ g++ -O2 mread.cc
$ time ./a.out nums.txt
=>1752547657
real 0m0.559s
user 0m0.478s
sys 0m0.081s
code's on pastebin:
http://pastebin.com/NgqFa11k
What do I suggest:
1-2 seconds is a realistic lower bound for a typical desktop machine to load this data. 2 minutes sounds more like a 60 MHz microcontroller reading from a cheap SD card. So either you have an undetected/unmentioned hardware condition, or your implementation of C++ streams is somehow broken or unusable. I suggest establishing a lower bound for this task on your machine by running my sample code.
If the integers are saved in binary format and you're not concerned with endianness problems, try reading the entire file into memory at once (fread) and casting the pointer to int *.
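A rough sketch of that approach (the helper name is mine; it assumes the file is a raw dump of native ints, as produced by the binary write above):
#include <cstdio>
#include <vector>

// Read the whole binary file into a buffer and view it as ints.
std::vector<int> load_binary_ints(const char *path)
{
    std::FILE *f = std::fopen(path, "rb");
    if (!f)
        return std::vector<int>();

    std::fseek(f, 0, SEEK_END);
    long bytes = std::ftell(f);                 // file size in bytes
    std::fseek(f, 0, SEEK_SET);

    std::vector<int> values(bytes / sizeof(int));
    std::fread(values.data(), 1, bytes, f);     // one big read
    std::fclose(f);
    return values;
}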
You could precompile the array into a .o file, which wouldn't need to be recompiled unless the data changes.
thedata.hpp:
static const int NUM_ENTRIES = 5;
extern int thedata[NUM_ENTRIES];
thedata.cpp:
#include "thedata.hpp"
int thedata[NUM_ENTRIES] = {
10
,200
,3000
,40000
,500000
};
To compile this:
# make thedata.o
Then your main application would look something like:
#include "thedata.hpp"
using namespace std;
int main() {
for (int i=0; i<NUM_ENTRIES; i++) {
cout << thedata[i] << endl;
}
}
Assuming the data doesn't change often, and that you can process the data to create thedata.cpp, then this is effectively instant load time. I don't know whether the compiler would choke on such a large literal array, though!
Save the file in a binary format.
Write the file by taking a pointer to the start of your int array and converting it to a char pointer. Then write the 15000000*sizeof(int) chars to the file.
And when you read the file, do the same in reverse: read the file as a sequence of chars, take a pointer to the beginning of the sequence, and convert it to an int*.
Of course, this assumes that endianness isn't an issue.
For actually reading and writing the file, memory mapping is probably the most sensible approach.
If the numbers never change, preprocess the file into a C++ source and compile it into the application.
If the numbers can change, and you thus have to keep them in a separate file that you load on startup, then avoid reading them number by number with C++ IO streams. C++ IO streams are a nice abstraction, but there is too much of it for such a simple task as loading a bunch of numbers fast. In my experience, a huge part of the run time is spent parsing the numbers and another part in accessing the file char by char.
(Assuming your file is more than a single long line.) Read the file line by line using std::getline(), and parse the numbers out of each line with std::strtol() rather than streams; this avoids a huge part of the overhead (see the sketch below). You can get more speed out of the streams by crafting your own variant of std::getline() that reads the input ahead (using istream::read()); the standard std::getline() also reads input char by char.
Use a buffer of 1000 integers (or even 15M; you can modify this size as you please), not integer after integer. Not using a buffer is clearly the problem, in my opinion.
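A minimal sketch of the getline() + strtol() approach just described (the function name and the use of std::vector are my own; it assumes one integer per line, as in the original ASCII format):
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

std::vector<int> load_ascii_ints(const char *path)
{
    std::vector<int> values;
    values.reserve(15000000);               // avoid repeated reallocation

    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line))          // one integer per line
        values.push_back(static_cast<int>(std::strtol(line.c_str(), 0, 10)));
    return values;
}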
If the data in the file is binary and you don't have to worry about endianness, and you're on a system that supports it, use the mmap system call. See this article on IBM's website:
High-performance network programming, Part 2: Speed up processing at both the client and server
Also see this SO post:
When should I use mmap for file access?
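For illustration, a minimal POSIX sketch of mapping such a binary file and reading the ints in place (the function name is mine; error handling is trimmed):
#include <cstddef>      // size_t
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

// Map the file read-only and hand back a pointer to the ints inside it.
const int *map_ints(const char *path, size_t *count)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    void *mem = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                              // the mapping stays valid after close

    *count = st.st_size / sizeof(int);
    return static_cast<const int *>(mem);   // unmap later with munmap(mem, size)
}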

Fastest way to write large STL vector to file using STL

I have a large vector (10^9 elements) of chars, and I was wondering what the fastest way is to write such a vector to a file. So far I've been using the following code:
vector<char> vs;
// ... Fill vector with data
ofstream outfile("nanocube.txt", ios::out | ios::binary);
ostream_iterator<char> oi(outfile, '\0');
copy(vs.begin(), vs.end(), oi);
With this code it takes approximately two minutes to write all the data to the file. The actual question is: can I make it faster using the STL, and how?
With such a large amount of data to be written (~1GB), you should write to the output stream directly, rather than using an output iterator. Since the data in a vector is stored contiguously, this will work and should be much faster.
ofstream outfile("nanocube.txt", ios::out | ios::binary);
outfile.write(&vs[0], vs.size());
There is a slight conceptual error in your second argument to ostream_iterator's constructor. It should be a null pointer if you don't want a delimiter (although, luckily for you, '\0' will be treated as one implicitly), or the second argument should be omitted entirely.
However, this means that after writing each character, the code needs to check for the pointer designating the delimiter (which might be somewhat inefficient).
I think, if you want to go with iterators, perhaps you could try ostreambuf_iterator.
Other options might include using the write() method (if it can handle output this large, or perhaps output it in chunks), and perhaps OS-specific output functions.
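For reference, the ostreambuf_iterator variant mentioned above might look roughly like this (a sketch; the function name is mine; it writes through the stream buffer and skips the formatting layer):
#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>

void write_vector(const std::vector<char> &vs)
{
    std::ofstream outfile("nanocube.txt", std::ios::out | std::ios::binary);
    std::copy(vs.begin(), vs.end(),
              std::ostreambuf_iterator<char>(outfile));   // unformatted, per-char copy into the buffer
}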
Since your data is contiguous in memory (as Charles said), you can use low level I/O. On Unix or Linux, you can do your write to a file descriptor. On Windows XP, use file handles. (It's a little trickier on XP, but well documented in MSDN.)
XP is a little funny about buffering. If you write a 1GB block to a handle, it will be slower than if you break the write up into smaller transfer sizes (in a loop). I've found that 256KB writes are the most efficient. Once you've written the loop, you can play around with this and see what the fastest transfer size is.
OK, I wrote an implementation with a for loop that writes 256KB blocks of data on each iteration (as Rob suggested), and the result is 16 seconds, so problem solved. This is my humble implementation, so feel free to comment:
void writeCubeToFile(const vector<char> &vs)
{
    const unsigned int blocksize = 262144;
    unsigned long blocks = distance(vs.begin(), vs.end()) / blocksize;
    ofstream outfile("nanocube.txt", ios::out | ios::binary);
    for (unsigned long i = 0; i <= blocks; i++)
    {
        unsigned long position = blocksize * i;
        if (blocksize > distance(vs.begin() + position, vs.end()))
            outfile.write(&*(vs.begin() + position), distance(vs.begin() + position, vs.end()));
        else
            outfile.write(&*(vs.begin() + position), blocksize);
    }
    outfile.write("\0", 1);
    outfile.close();
}
Thanks to all of you.
If you have another structure, this method is still valid.
For example:
typedef std::pair<int,int> STL_Edge;
vector<STL_Edge> v;

void write_file(const char *path){
    ofstream outfile(path, ios::out | ios::binary);
    outfile.write((const char *)&v.front(), v.size()*sizeof(STL_Edge));
}

void read_file(const char *path, int reserveSpaceForEntries){
    ifstream infile(path, ios::in | ios::binary);
    v.resize(reserveSpaceForEntries);
    infile.read((char *)&v.front(), v.size()*sizeof(STL_Edge));
}
Instead of writing via the file i/o methods, you could try to create a memory-mapped file, and then copy the vector to the memory-mapped file using memcpy.
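A rough POSIX sketch of that suggestion (the function name is mine; error handling is trimmed): create the file, grow it to the final size with ftruncate, map it writable, and memcpy the vector into the mapping.
#include <cstring>      // memcpy
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <unistd.h>     // ftruncate, close
#include <vector>

void write_via_mmap(const char *path, const std::vector<char> &vs)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    ftruncate(fd, vs.size());                          // reserve the full file size up front

    void *mem = mmap(0, vs.size(), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    std::memcpy(mem, &vs[0], vs.size());               // copy the vector into the mapping
    munmap(mem, vs.size());                            // flushes the dirty pages back to the file
    close(fd);
}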
Use the write method on it; the data is in RAM after all, and you have contiguous memory. Fastest, while keeping flexibility for later? Lose the built-in buffering, hint sequential I/O, lose the hidden overhead of the iterator/utility layers, avoid streambuf when you can, but do get dirty with boost::asio.