ifstream vs. fread for binary files - c++

Which is faster, ifstream or fread?
Which should I use to read binary files?
fread() puts the whole file into memory.
So after fread, accessing the buffer it creates is fast.
Does ifstream::open() put the whole file into memory,
or does it access the hard disk every time we call ifstream::read()?
So... does ifstream::open() == fread()?
Or does (ifstream::open(); ifstream::read(file_length);) == fread()?
Or shall I use ifstream::rdbuf()->read()?
edit:
My readFile() method now looks something like this:
void readFile()
{
    std::ifstream fin;
    fin.open("largefile.dat", std::ifstream::binary | std::ifstream::in);

    // Each of these small read methods makes at least one fin.read()
    // call inside.
    readHeaderInfo(fin);
    readPreference(fin);
    readMainContent(fin);
    readVolumeData(fin);
    readTextureData(fin);

    fin.close();
}
Will the multiple fin.read() calls in the small methods slow down the program?
Shall I only use 1 fin.read() in the main method and pass the buffer into the small methods? I guess I am going to write a small program to test.
Thanks!

Are you really sure about fread putting the whole file into memory? File access can be buffered, but I doubt that you really get the whole file put into memory. I think ifstream::read just uses fread under the hood in a more C++ conformant way (and is therefore the standard way of reading binary information from a file in C++). I doubt that there is a significant performance difference.
To use fread, the file has to be open. It doesn't just take a file and put it into memory at once. So ifstream::open == fopen and ifstream::read == fread.
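Roughly, the two APIs line up like this (a minimal, untested sketch; data.bin and the 4 KB chunk size are just placeholders). Neither call reads more than the chunk you ask for:
#include <cstdio>
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    std::vector<char> buf(4096);

    // C API: fopen loads nothing by itself; fread pulls in the requested chunk.
    std::FILE* fp = std::fopen("data.bin", "rb");
    if (fp) {
        std::size_t got = std::fread(buf.data(), 1, buf.size(), fp);
        std::cout << "fread got " << got << " bytes\n";
        std::fclose(fp);
    }

    // C++ API: the equivalent open followed by a read of the same chunk.
    std::ifstream in("data.bin", std::ios::binary);
    if (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::cout << "ifstream::read got " << in.gcount() << " bytes\n";
    }
}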

The C++ stream API is usually a little slower than the C file API if you use the high-level interface, but it provides a cleaner and safer API than C.
If you want speed, consider using memory-mapped files, though there is no portable way of doing this with the standard library.
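For example, a rough POSIX-only sketch of the memory-mapped approach (Windows would use CreateFileMapping/MapViewOfFile instead; data.bin is a placeholder name):
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return 1; }

    // Map the whole file read-only; pages are faulted in on demand.
    void* p = mmap(nullptr, static_cast<size_t>(st.st_size), PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return 1; }

    const char* bytes = static_cast<const char*>(p);
    // ... parse bytes[0 .. st.st_size - 1] directly, no explicit read() calls ...

    munmap(p, static_cast<size_t>(st.st_size));
    close(fd);
    return 0;
}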

As to which is faster, see my comment. For the rest:
Neither of these methods automatically reads the whole file into memory. They both read as much as you specify.
At least for ifstream I am sure that the IO is buffered, so there will not necessarily be a disk access for every read you make.
See this question for the C++ way of reading binary files.

The idea with C++ file streams is that some or all of the file is buffered in memory (based on what it thinks is optimal) and that you don't have to worry about it.
I would use ifstream::read() and just tell it how much you need.
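For instance, a minimal sketch of that idea, reading a small fixed-size header first and then only the payload the header announces (the Header layout is made up for illustration, and reading a struct this way assumes no padding or endianness surprises):
#include <cstdint>
#include <fstream>
#include <vector>

struct Header {                 // hypothetical on-disk header
    std::uint32_t magic;
    std::uint32_t payloadSize;
};

int main()
{
    std::ifstream fin("largefile.dat", std::ios::binary);

    Header h{};
    fin.read(reinterpret_cast<char*>(&h), sizeof h);    // read only the header
    if (!fin) return 1;

    std::vector<char> payload(h.payloadSize);
    fin.read(payload.data(), static_cast<std::streamsize>(payload.size()));  // then only what was promised
    return fin ? 0 : 1;
}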

Use the stream operator:
DWORD processPid = 0;
std::ifstream myfile("C:/Temp/myprocess.pid", std::ios::binary);
if (myfile.is_open())
{
    myfile >> processPid;
    myfile.close();
    std::cout << "PID: " << processPid << std::endl;
}

Related

How to create a new binary file and fill it with a constant value?

I'm using cstdio functions to create an empty binary file and need it to be initialized to a specific byte value (can be zero, but not necessarily).
FILE* file = std::fopen("path/to/file", "wb+");
Is there a way to fill the whole file with a value, or is creating and filling a buffer and then using std::fwrite to continuously fill the file my only option? Something like
std::ffill(byteValue, sizeof(byteValue), fileSize, file);
It would be okay to have platform-specific solutions (I'm targeting Windows and Linux).
Using C++ iostreams, it's pretty trivial:
std::ofstream out("path/to/file", std::ios::binary);
char byteValue = '\0'; // or whatever
std::fill_n(std::ostreambuf_iterator<char>(out), fileSize, byteValue);
If fileSize is really large, however, you may prefer to use std::ofstream::write instead; it can be substantially faster.
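A sketch of that chunked std::ofstream::write variant (assuming the fileSize and byteValue from above, and an arbitrary 1 MB chunk):
#include <cstddef>
#include <fstream>
#include <vector>

void fillFile(const char* path, char byteValue, std::size_t fileSize)
{
    std::ofstream out(path, std::ios::binary);
    std::vector<char> chunk(1 << 20, byteValue);    // 1 MB of the fill value

    std::size_t remaining = fileSize;
    while (remaining > 0 && out) {
        std::size_t n = remaining < chunk.size() ? remaining : chunk.size();
        out.write(chunk.data(), static_cast<std::streamsize>(n));
        remaining -= n;
    }
}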
One option would be to mmap the file into memory and use memset on that memory. But filling a buffer and writing it to the file multiple times would be an easier and platform-independent solution. mmap (POSIX) and CreateFileMapping (Windows) are platform-specific.

Correct way of reading /proc/pid/status

I read /proc/<pid>/status this way:
std::ifstream file(filename);
std::string line;
int numberOfLinesToRead = 4;
int linesRead = 0;
while (std::getline(file, line)) {
    // do stuff
    if (numberOfLinesToRead == ++linesRead) {
        break;
    }
}
I noticed that in rare cases std::getline hangs.
Why does it happen? I was under the impression that the proc filesystem should be in a somewhat consistent state and there should not be cases where a newline is missing. My assumption was that getline returns false when EOF or an error occurs.
What is the recommended, safe way to read /proc/<pid>/status ?
Perhaps a more robust path is to use fread into a large buffer. The status file is small, so allocate a local buffer and read the whole file in one go.
Example: look at the second answer for the simplest solution.
This may still fail on the fopen or fread, but a sensible error should be returned.
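Something along these lines (an untested sketch; /proc/self/status stands in for /proc/<pid>/status, and the 4 KB buffer is an assumption based on the file being small):
#include <cstdio>

int main()
{
    char buf[4096];

    std::FILE* fp = std::fopen("/proc/self/status", "rb");
    if (!fp) return 1;                            // the process may already be gone

    // One fread of the whole (small) file; n may be short, so use it as the length.
    std::size_t n = std::fread(buf, 1, sizeof buf - 1, fp);
    std::fclose(fp);

    buf[n] = '\0';
    // ... parse the n bytes in buf line by line ...
    return 0;
}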
/proc is a virtual filesystem. That means reading from "files" in it is not the same as reading from normal filesystem.
If a process exits, the information about it is removed from /proc much faster than a file would disappear from a real filesystem (where dirty-cache flushing delays are involved).
Bearing that in mind, imagine that the process exits before you get to read the next line, which wasn't buffered yet.
The solution is either to account for the file disappearing, since you may not need information about a process which no longer exists, or to buffer the entire file first and only then parse it.
EDIT: the hang should clearly be related to the fact that this is a virtual filesystem; it does not behave exactly the same way as a real filesystem. Since this is a specific fs type, the issue could be in the fs driver. The code you provide looks fine for reading a normal file.

How to copy a file from one location to another in a fast way with C++ program? [duplicate]

This question already has answers here:
Copy a file in a sane, safe and efficient way
(9 answers)
Closed 7 years ago.
I am trying to understand the code behind the copy command, which copies a file from one place to another. I studied C++ file-handling basics and have written the following code for my task.
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    cout << "Copy file\n";
    string from, to;
    cout << "Enter file address: ";
    cin >> from;
    ifstream in(from, ios::in | ios::binary);
    if (!in)
    {
        cout << "could not find file " << from << endl;
        return 1;
    }
    cout << "Enter file destination: ";
    cin >> to;
    ofstream out(to, ios::out | ios::binary);
    char ch;
    while (in.get(ch))
    {
        out.put(ch);
    }
    cout << "file has been copied\n";
    in.close();
    out.close();
}
Though this code works, it is much slower than the copy command of my OS, which is Windows. I want to know how I can make my program faster and reduce the difference between my program's time and my OS's copy command's time.
Reading one byte at a time is going to waste a lot of time in function calls... use a bigger buffer:
char ch[4096];
while (in) {
    in.read(ch, sizeof(ch));
    out.write(ch, in.gcount());
}
(you may want to add some more error handling, e.g. out may go in a bad state and the like)
(The most C++-ish way is reported here, but it takes advantage of streambuf functionality that a beginner typically has little reason to know, and to me it is also far less instructive.)
You have correctly opened the file for binary read and binary write. However, instead of reading characters (which is not meaningful in binary format), use istream::read and ostream::write.
Like other answers say, use bigger buffers. I'd go for 1MB.
But there's a lot more to it.
Also, avoid the stream library and FILE functions. They buffer the data, so you get two memcpy calls instead of one.
Disabling buffering on the streams can achieve a similar result, but I think you're better off using the system calls directly.
And one last thing, on the "do it yourself" front: you must check the return values from the read and write calls. They may read or write fewer bytes than you ask them to.
If you can manage a circular buffer, you should switch between reading and writing whenever a call returns short... the disk may be more ready for reading or for writing, so there's no point wasting time waiting instead of switching to the other thing you have to do.
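A rough sketch of that do-it-yourself loop on POSIX, checking return values and handling short writes (error handling is minimal and the circular-buffer refinement is left out):
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Copy src to dst with the raw system calls; returns false on any error.
bool copyFile(const char* src, const char* dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0) return false;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return false; }

    std::vector<char> buf(1 << 20);               // 1 MB buffer, as suggested above
    bool ok = true;
    for (;;) {
        ssize_t got = read(in, buf.data(), buf.size());
        if (got == 0) break;                      // end of file
        if (got < 0) { ok = false; break; }

        // write() may accept fewer bytes than asked for; loop until the chunk is out.
        ssize_t done = 0;
        while (done < got) {
            ssize_t put = write(out, buf.data() + done, static_cast<size_t>(got - done));
            if (put < 0) { ok = false; break; }
            done += put;
        }
        if (!ok) break;
    }
    close(in);
    close(out);
    return ok;
}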
And now the very last thing you might want to explore: look into the sendfile system call. It was built to speed up web servers by doing all the copying in the kernel and avoiding context switches and memcpys, but it may serve here if it works with two disk file descriptors.
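If you want to try it, a hedged sketch of the sendfile variant (Linux-specific; copying between two regular files this way requires a reasonably recent kernel):
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

// Copy src to dst inside the kernel, with no user-space buffer.
bool copyFileSendfile(const char* src, const char* dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0) return false;

    struct stat st;
    if (fstat(in, &st) != 0) { close(in); return false; }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return false; }

    off_t offset = 0;
    bool ok = true;
    while (offset < st.st_size) {
        // sendfile may transfer fewer bytes than requested; keep calling until done.
        ssize_t sent = sendfile(out, in, &offset, static_cast<size_t>(st.st_size - offset));
        if (sent <= 0) { ok = false; break; }
    }
    close(in);
    close(out);
    return ok;
}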

buffered std::ifstream to read from disk only once (C++)

Is there a way to add buffering to a std::ifstream in the sense that seeking (seekg) and reading multiple times wouldn't cause any more disk reads than necessary?
I'd basically like to read a chunk of a file using a stream multiple times, but I'd want the chunk read from disk only once.
The question is probably a bit off because I want to mix buffered reads and streams...
For example:
char filename[] = "C:\\test.txt";
fstream inputfile;
char buffer[20] = {};
inputfile.open(filename, ios::in | ios::binary);
inputfile.seekg(2, ios::beg);
inputfile.read(buffer, 3);
cout << buffer << std::endl;
inputfile.seekg(2, ios::beg);
inputfile.read(buffer, 3);
cout << buffer << std::endl;
I'd want the data to be read from disk only once.
Personally, I wouldn't worry about reading from the file multiple times: the system will keep the used buffers hot anyway. However, depending on the location of the file and swap space, different disks may be used.
The file stream itself does support a setbuf() function (reachable as rdbuf()->pubsetbuf()) which could theoretically set the internally used buffer to a size chosen by the user. However, the only arguments which have to be supported and need to have an effect are setbuf(0, 0), which has quite the opposite effect, i.e., the stream becomes unbuffered.
I guess the easiest way to guarantee that the data isn't read from the file again is to use a std::stringstream and use that instead of the file stream after the initial read, e.g.:
std::stringstream inputfile;
inputfile << std::ifstream(filename).rdbuf();
inputfile.seekg(0, std::ios_base::beg);
If it is undesirable to read the entire file stream first, a filtering stream could be used which reads the file whenever it reaches a section it hasn't read, yet. However, creating a corresponding stream buffer isn't that trivial and since I consider the original objective already questionable I would doubt that it has much of a benefit. Of course, you could create a simple stream which just does the initialization in the constructor and use that instead.
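For instance, a minimal sketch of that last suggestion: a hypothetical wrapper (the name BufferedFile is made up here) that slurps the file once in its constructor and otherwise behaves like any other input stream:
#include <fstream>
#include <sstream>
#include <string>

// Reads the whole file once; every later seekg/read touches only memory.
class BufferedFile : public std::istringstream {
public:
    explicit BufferedFile(const std::string& filename)
        : std::istringstream(slurp(filename), std::ios::in | std::ios::binary)
    {
    }

private:
    static std::string slurp(const std::string& filename)
    {
        std::ifstream file(filename, std::ios::binary);
        std::ostringstream tmp;
        tmp << file.rdbuf();          // the single pass over the file on disk
        return tmp.str();
    }
};
With that, the seekg/read sequence from the question can be run against a BufferedFile("C:\\test.txt") object as often as needed without another disk access.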

Writing huge txt files without overloading RAM

I need to write the results of a process to a txt file. The process is very long and the amount of data to be written is huge (~150 GB). The program works fine, but the problem is that the RAM gets overloaded and, at a certain point, it just stops.
The program is simple:
ofstream f;
f.open(filePath);
for (int k = 0; k < nDataset; k++) {
    // treat element of dataset
    f << result;
}
f.close();
Is there a way of writing this file without overloading the memory?
You should flush output periodically.
For example:
if (k%10000 == 0) f.flush();
I'd like to suggest something like this:
ogzstream f;
f.open(filePath);
string s("");
for (int k = 0; k < nDataset; k++) {
    // treat element of dataset
    s.append(result);
    if (s.length() >= OPTIMUM_BUFFER_SIZE) {
        f << s;
        f.flush();
        s.clear();
    }
}
f << s;
f.flush();
f.close();
Basically, you build the output up in memory rather than sending it to the stream piece by piece, so you don't have to worry about when the stream gets flushed. And when you do send it to the stream, you make sure it's flushed to the actual file. Some ideas for the OPTIMUM_BUFFER_SIZE can be found here and here.
I'm not exactly sure whether string or vector is the best option for the buffer. I'll do some research myself and update the answer, or you can refer to Effective STL by Scott Meyers.
If that truly is the code where your program gets stuck, then your explanation of the problem is wrong.
There's no text file. Your igzstream is not dealing with text, but a gzip archive.
There's no data being written. The code you show reads from the stream.
I don't know what your program does with result, because you didn't show that. But if it accumulates results into a collection in memory, that will grow. You'll need to find a way to process all your data without loading all of it into RAM at the same time.
Your memory usage could be from the decompressor. For some compression algorithms, an entire block has to be stored in memory. In such cases it's best to break the file into blocks and compress each separately (possibly pre-initializing a dictionary with the results of the previous block). I don't think that gzip is such an algorithm, however. You may need to find a library that supports streaming.