fclose() function slow - c++

I tried to create an approximately 4 GB file using the C++ fopen, fwrite, fflush and fclose functions on a Linux machine, but I observed that fclose() takes a very long time, around 40-50 seconds, to close the file. I checked different forums to find the reason for this slowness, changed the code as suggested there, and used setvbuf() to make the stream unbuffered like write(), but I still could not resolve the issue.
totalBytes = 4294967296; // 4 GB file
bufferSize = 2000;
while ( size <= totalBytes )
{
    len = fwrite(buffer, 1, bufferSize, fp);
    if ( len != bufferSize ) {
        cout << "ERROR (Internal): in calling ACE_OS::fwrite() " << endl;
        ret = -1;
    }
    size = size + len;
}
...
...
...
fflush(fp);
fclose(fp);
Any solution to the above problem would be very helpful.
thanks,
Ramesh

The operating system is deferring the actual writing to the disk, and may not physically write the data during any particular write call or even at fflush().
I looked at the man page of fflush() and saw the following note:
Note that fflush() only flushes the user space buffers provided by
the C library. To ensure that the data is physically stored on disk
the kernel buffers must be flushed too, for example, with sync(2) or
fsync(2).
(there's a similar note for fclose() as well, although behaviour on your Linux system seems different)

It will take a long time to write that much data to the disk, and there's no way around that fact.

fopen/fwrite/fclose are C standard library wrappers around the low-level open/write/close. All fflush() does is make sure the 'write' calls have been made for anything still buffered; there is no synchronization point at the fflush. The operating system is flushing its write buffer before it allows 'close' to return.

Yeah, the time taken by fclose() is part of the time taken by the OS to write your data to the disk.
Look at fsync for achieving what you probably wanted with fflush. If you want to display some progress and the time used by fclose() is making it inaccurate, you could do an fsync() every 100 MB written, or something like that.
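A minimal sketch of that idea, assuming a POSIX system where fileno() and fsync() are available; the function name, the 100 MB interval and the reuse of the question's variable names are just illustrative choices:
#include <cstdio>
#include <unistd.h>   // fileno(), fsync()

// Write totalBytes from a pre-filled buffer, forcing data to disk roughly
// every 100 MB so that the final fclose() has almost nothing left to flush.
void write_with_periodic_fsync(FILE* fp, const char* buffer,
                               size_t bufferSize, size_t totalBytes)
{
    const size_t syncInterval = 100u * 1024u * 1024u;   // ~100 MB, tune as needed
    size_t written = 0, sinceSync = 0;
    while (written < totalBytes) {
        size_t len = fwrite(buffer, 1, bufferSize, fp);
        if (len != bufferSize)
            break;                        // write error; handle as in the question
        written += len;
        sinceSync += len;
        if (sinceSync >= syncInterval) {
            fflush(fp);                   // stdio buffers -> kernel
            fsync(fileno(fp));            // kernel buffers -> disk (blocks here)
            sinceSync = 0;
            // update a progress indicator here if desired
        }
    }
    fflush(fp);
    fsync(fileno(fp));                    // final sync; fclose() should now return quickly
}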

Related

Does my C++ code handle 100GB+ file copying? [closed]

I need a cross-platform portable function that is able to copy a 100GB+ binary file to a new destination. My first solution was this:
void copy(const string &src, const string &dst)
{
    FILE *f;
    char *buf;
    long len;
    f = fopen(src.c_str(), "rb");
    fseek(f, 0, SEEK_END);
    len = ftell(f);
    rewind(f);
    buf = (char *) malloc((len+1) * sizeof(char));
    fread(buf, len, 1, f);
    fclose(f);
    f = fopen(dst.c_str(), "a");
    fwrite(buf, len, 1, f);
    fclose(f);
}
Unfortunately, the program was very slow. I suspect the buffer had to keep 100GB+ in the memory. I'm tempted to try the new code (taken from Copy a file in a sane, safe and efficient way):
std::ifstream src_(src, std::ios::binary);
std::ofstream dst_ = std::ofstream(dst, std::ios::binary);
dst_ << src_.rdbuf();
src_.close();
dst_.close();
My question is about this line:
dst_ << src_.rdbuf();
What does the C++ standard say about it? Does this code compile to a byte-by-byte transfer, or to a whole-buffer transfer (like my first example)?
I'm curious whether << compiles to something useful for me. Maybe I don't have to invest my time in something else and can just let the compiler do the job inside the operator? If the operator translates to a loop for me, why should I do it myself?
PS: std::filesystem::copy is impossible as the code has to work for C++11.
The crux of your question is what happens when you do this:
dst_ << src_.rdbuf();
Clearly this is two function calls: one to istream::rdbuf(), which simply returns a pointer to a streambuf, followed by one to ostream::operator<<(streambuf*), which is documented as follows:
After constructing and checking the sentry object, checks if sb is a null pointer. If it is, executes setstate(badbit) and exits. Otherwise, extracts characters from the input sequence controlled by sb and inserts them into *this until one of the following conditions are met: [...]
Reading this, the answer to your question is that copying a file in this way will not require buffering the entire file contents in memory--rather it will read a character at a time (perhaps with some chunked buffering, but that's an optimization that shouldn't change our analysis).
Here is one implementation: https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-api-4.6/a01075_source.html (__copy_streambufs). Essentially it is a loop calling sgetc() and sputc() repeatedly until EOF is reached. The memory required is small and constant.
The C++ standard (I checked C++98, so this should be extremely compatible) says in [lib.ostream.inserters]:
basic_ostream<charT,traits>& operator<<
(basic_streambuf<charT,traits> *sb);
Effects: If sb is null calls setstate(badbit) (which may throw ios_base::failure).
Gets characters from sb and inserts them in *this. Characters are read from sb and inserted until any of the following occurs:
end-of-file occurs on the input sequence;
inserting in the output sequence fails (in which case the character to be inserted is not extracted);
an exception occurs while getting a character from sb.
If the function inserts no characters, it calls setstate(failbit) (which may throw ios_base::failure (27.4.4.3)). If an exception was thrown while extracting a character, the function sets failbit in error state, and if failbit is on in exceptions() the caught exception is rethrown.
Returns: *this.
This description says << on rdbuf works on a character-by-character basis. In particular, if inserting of a character fails, that exact character remains unread in the input sequence. This implies that an implementation cannot just extract the whole contents into a single huge buffer upfront.
So yes, there's a loop somewhere in the internals of the standard library that does a byte-by-byte (well, charT really) transfer.
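Conceptually, that internal loop behaves roughly like the sketch below. This is an illustration, not the actual library code, and the function name copy_streambuf_sketch is made up; note how a character is only consumed after a successful insert, matching the standard's requirement that a failed character remain unread:
#include <streambuf>
#include <string>      // std::char_traits

// Illustrative only: roughly what operator<<(basic_streambuf*) must do.
// A character is consumed (sbumpc) only after it was successfully inserted,
// so on failure the offending character stays in the input sequence.
void copy_streambuf_sketch(std::streambuf* in, std::streambuf* out)
{
    typedef std::char_traits<char> tr;
    tr::int_type ch;
    while ((ch = in->sgetc()) != tr::eof()) {       // peek at the next character
        if (out->sputc(tr::to_char_type(ch)) == tr::eof())
            break;                                  // insertion failed; stop
        in->sbumpc();                               // now actually consume it
    }
}
Called as copy_streambuf_sketch(src_.rdbuf(), dst_.rdbuf()), this does the same job as dst_ << src_.rdbuf(), minus the error-flag handling and the block-copy optimizations real implementations add.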
However, this does not mean that the whole thing is completely unbuffered. This is simply about what operator<< does internally. Your ostream object will still accumulate data internally until its buffer is full, then call write (or whatever low-level function your OS uses).
Unfortunately, the program was very slow.
Your first solution is wrong for a very simple reason: it reads the entire source file into memory, then writes it out in one go.
Files have been invented (perhaps in the 1960s) to handle data that don't fit in memory (and has to be in some "slower" storage, at that time hard disks or drums, or perhaps even tapes). And they have always been copied by "chunks".
The current (Unix-like) definition of a file (as a sequence of bytes that is open-ed, read, write-n, close-d) is more recent than the 1960s, probably the late 1970s or early 1980s. And it comes with the notion of streams (which has been standardized in C with <stdio.h> and in C++ with std::fstream).
So your program has to work (like every file copying program today) for files much bigger than the available memory. You need some loop to read into a buffer, write it out, and repeat.
The size of the buffer is very important. If it is too small, you'll make too many IO operations (e.g. system calls). If it is too big, IO might be inefficient or even not work.
In practice, the buffer should today be much less than your RAM, typically several megabytes.
Your code is more C-like than C++-like because it uses fopen. Here is a possible solution in C with <stdio.h>. If you code in genuine C++, adapt it to <fstream>:
#include <stdio.h>
#include <stdlib.h>

void copyfile(const char*destpath, const char*srcpath) {
    // experiment with various buffer sizes
#define MYBUFFERSIZE (4*1024*1024) /* four megabytes */
    char* buf = malloc(MYBUFFERSIZE);
    if (!buf) { perror("malloc buf"); exit(EXIT_FAILURE); };
    FILE* filsrc = fopen(srcpath, "r");
    if (!filsrc) { perror(srcpath); exit(EXIT_FAILURE); };
    FILE* fildest = fopen(destpath, "w");
    if (!fildest) { perror(destpath); exit(EXIT_FAILURE); };
    for (;;) {
        size_t rdsiz = fread(buf, 1, MYBUFFERSIZE, filsrc);
        if (rdsiz == 0) {               // end of file, or a read error
            if (ferror(filsrc)) { perror("fread"); exit(EXIT_FAILURE); };
            break;
        }
        size_t wrsiz = fwrite(buf, rdsiz, 1, fildest);
        if (wrsiz != 1) { perror("fwrite"); exit(EXIT_FAILURE); };
    }
    if (fclose(filsrc)) { perror("fclose source"); exit(EXIT_FAILURE); };
    if (fclose(fildest)) { perror("fclose dest"); exit(EXIT_FAILURE); };
    free(buf);
}
For simplicity, I am reading into the buffer in byte-sized elements and writing it out as a single object. A better solution would handle partial writes.
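Such a helper could look like this rough sketch (the name write_fully is made up; it retries fwrite until the whole chunk has been written, treating a zero return as a real error). In the copy loop above, fwrite(buf, rdsiz, 1, fildest) would then become write_fully(fildest, buf, rdsiz):
/* Write `count` bytes from `buf` to `out`, retrying on short writes. */
static int write_fully(FILE* out, const char* buf, size_t count) {
    size_t done = 0;
    while (done < count) {
        size_t n = fwrite(buf + done, 1, count - done, out);
        if (n == 0)
            return -1;    /* nothing written: a genuine error */
        done += n;        /* short write: loop to write the remainder */
    }
    return 0;
}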
Apparently dst_ << src_.rdbuf(); might do some loop internally (I have to admit I never used it and did not understand that at first; thanks to melpomene for correcting me). But the actual buffer size matters a great deal. The two other answers (by John Swinck and by melpomene) focus on that rdbuf() thing. My answer focuses on explaining why copying can be slow when you do it like in your first solution, why you need to loop, and why the buffer size matters a great deal.
If you really care about performance, you need to understand implementation details and operating system specific things. So read Operating systems: three easy pieces. Then understand how, on your particular operating system, the various buffering is done (there are several layers of buffers involved: your program buffers, the standard stream buffers, the kernel buffers, the page cache). Don't expect your C++ standard library to buffer in an optimal fashion.
Don't even dream of writing, in standard C++ (without operating-system-specific stuff), an optimal or very fast copying function. If performance matters, you need to dive into OS-specific details.
On Linux, you might use time(1), oprofile(1), perf(1) to measure your program's performance. You could use strace(1) to understand the various system calls involved (see syscalls(2) for a list). You might even code (in a Linux specific way) using directly the open(2), read(2), write(2), close(2) and perhaps readahead(2), mmap(2), posix_fadvise(2), madvise(2), sendfile(2) system calls.
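As a hedged, Linux-only illustration of that last group of system calls, a copy loop built on open(2), fstat(2) and sendfile(2) could look roughly like this sketch (the function name is made up and error handling is kept minimal; sendfile keeps the data in the kernel instead of bouncing it through user space):
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Linux-specific sketch: copy srcpath to destpath with sendfile(2). */
int copyfile_sendfile(const char* destpath, const char* srcpath) {
    int in = open(srcpath, O_RDONLY);
    if (in < 0) return -1;
    struct stat st;
    if (fstat(in, &st) < 0) { close(in); return -1; }
    int out = open(destpath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }
    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out, in, &offset, st.st_size - offset);
        if (n <= 0) break;        /* error, or nothing more to send */
    }
    close(in);
    close(out);
    return (offset == st.st_size) ? 0 : -1;
}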
Finally, large file copying is limited by disk IO (which is the bottleneck). So even after spending days optimizing OS-specific code, you won't win much. The hardware is the limitation. You should probably write whatever is the most readable code for you (it might be that dst_ << src_.rdbuf(); thing which is looping) or use some library providing file copy. You might win a tiny amount of performance by tuning the various buffer sizes.
If the operator translates to looping for me, why should I do it myself?
Because you have no explicit guarantee on the actual buffering done (at various levels). As I explained, buffering matters for performance. Perhaps the actual performance is not that critical for you, and the ordinary settings of your system and standard library (and their default buffer sizes) might be enough.
PS. Your question contains at least 3 different (but related) questions. I don't find it clear (so I downvoted it), because I did not understand which one is the most relevant: performance? robustness? the meaning of dst_ << src_.rdbuf();? why the first solution is slow? how to copy large files quickly?

Correct way of reading /proc/pid/status

I read /proc/<pid>/status this way:
std::ifstream file(filename);
std::string line;
int numberOfLinesToRead = 4;
int linesRead = 0;
while (std::getline(file, line)) {
    // do stuff
    if (numberOfLinesToRead == ++linesRead) {
        break;
    }
}
I noticed that in rare cases std::getline hangs.
Why does this happen? I was under the impression that the proc filesystem should be in a somewhat consistent state and there should not be cases where a newline is missing. My assumption was that getline returns false when EOF or an error occurs.
What is the recommended, safe way to read /proc/<pid>/status ?
Perhaps a more reliable path is to use fread into a large buffer. The status file is small, so allocate a local buffer and read the whole file in one go.
For an example, look at the second answer for the simplest solution.
This may still fail on the fopen or fread, but a sensible error should be returned.
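A minimal sketch of that approach (the function name and the 4 KB buffer size are assumptions; /proc/<pid>/status is normally much smaller than that):
#include <cstdio>
#include <string>

// Read the whole pseudo-file in one go, then parse the buffer afterwards.
std::string read_proc_status(const char* path /* e.g. "/proc/1234/status" */) {
    char buf[4096];
    std::string contents;
    FILE* f = std::fopen(path, "r");
    if (!f)
        return contents;                       // process may already be gone
    size_t n = std::fread(buf, 1, sizeof buf, f);
    if (n > 0 && !std::ferror(f))
        contents.assign(buf, n);
    std::fclose(f);
    return contents;                           // empty string on any failure
}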
/proc is a virtual filesystem. That means reading from "files" in it is not the same as reading from a normal filesystem.
If the process exits, the information about it is removed from /proc much faster than it would be on a real filesystem (where dirty-cache flushing delays are involved).
Bearing that in mind, imagine that the process exits before you get to read the next line, which wasn't buffered yet.
The solution is either to account for the file disappearing, since you may not need information about a process which no longer exists, or to buffer the entire file and only then parse it.
EDIT: the hang should clearly be related to the fact that this is a virtual filesystem. It does not behave exactly the same way as a real filesystem. Since this is a specific fs type, the issue could be in the fs driver. The code you provide looks fine for normal file reading.

Limit CPU usage of fwrite operations

I'm developing a program with several threads that manages the streaming from several cameras. I have to write every raw images on SSD disk. I'm using fwrite to put the image in a binary file. Something like:
FILE* output;
output = fopen(fileName, "wb");
fwrite(imageData, imageSize, 1, output);
fclose(output);
The procedure seems to run fast enough to save all the images at the given cameras' throughput. The problem is that the save procedure is CPU-consuming, and I start to have sync issues when saving is enabled, due to the CPU usage of the save threads.
Is there any way to reduce the CPU load of fwrite operations? Like playing with buffering, better DMA settings, ...?
Thanks!
MIX
-- UPDATE 1
Setting the multithreaded software aside, here is a simple file-writer program:
#include <stdio.h>
#include <stdlib.h>
const unsigned int TOT_DATA = 1280*2*960;
int main(int argc, char* argv[])
{
    if(argc != 2)
    {
        printf("Usage:\n");
        printf(" %s totWrite\n\n", argv[0]);
        return -1;
    }

    char* imageData;
    FILE* output;
    char fileName[256];
    unsigned int totWrite;

    totWrite = atoi(argv[1]);
    imageData = new char[TOT_DATA];

    printf("Write imageData[%u] on file %u times.\n", TOT_DATA, totWrite);

    for(unsigned int i = 0; i < totWrite; i++)
    {
        sprintf(fileName, "image_%06u.raw", i);
        output = fopen(fileName, "wb");
        fwrite(imageData, TOT_DATA, 1, output);
        fclose(output);
    }

    printf("DONE!\n");
    delete [] imageData;
    return 0;
}
A char buffer will be created, and it will be written to a file totWrite times. There are no overwrites, since each cycle writes to a new file. (Of course, one has to remove the files written by a previous run...)
Running top (I'm on Linux) while the program is running, I see that ~50% of the CPU (that means 50% of one of the 4 cores) is used. I suppose fwrite is the bottleneck regarding CPU usage, since it is the "slowest" operation in the cycle, so the one "most probably" running when top updates its stats. Even "more probable" if TOT_DATA were increased, say, 100 times.
Any further considerations about what could reduce CPU usage in such a program?
If you consider playing with DMA settings you're way out of the scope of the standard C library. It will be nowhere near portable - and then you don't have any benefits of using portable functions.
The first step you should probably take (after you've confirmed that the CPU really is the bottleneck) is to use lower-level functions such as open/write (or whatever your OS calls them).
Basically what can happen with fwrite is that the program first copies the data to another place in memory (the FILE* buffer) before actually writing the data to disk. This operation is certainly CPU-bound, and if data transfer by the CPU is slower than the data transfer to the SSD, it could be a case where CPU power is consumed for no good reason.
Also, one should note that using multiple threads has its drawbacks. First, if this were not an SSD, multiple threads writing to disk could result in redundant head movements, which is not good; even an SSD may suffer somewhat, as you might fragment the layout of the data.
There's also a problem in holding an entire image in memory, as you seem to do in the example, especially if you do it in multiple threads. It will simply consume a lot of memory (which could mean that swapping is required). If possible, you should write the data to the file as the data arrives.
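As a hedged illustration of the lower-level route on Linux/POSIX (the function name is made up, and whether it actually reduces CPU load still has to be measured, e.g. with top or perf):
#include <fcntl.h>
#include <unistd.h>

// Write one image buffer with the unbuffered POSIX calls, avoiding the extra
// copy into the stdio FILE* buffer that fwrite() can perform.
int save_image_raw(const char* fileName, const char* imageData, size_t imageSize) {
    int fd = open(fileName, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    size_t done = 0;
    while (done < imageSize) {
        ssize_t n = write(fd, imageData + done, imageSize - done);
        if (n < 0) { close(fd); return -1; }   // write error
        done += static_cast<size_t>(n);        // handle short writes
    }
    return close(fd);
}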

How to change buffer size with boost::iostreams?

My program reads dozens of very large files in parallel, just one line at a time. It seems like the major performance bottleneck is HDD seek time from file to file (though I'm not completely sure how to verify this), so I think it would be faster if I could buffer the input.
I'm using C++ code like this to read my files through boost::iostreams "filtering streams":
input = new filtering_istream;
input->push(gzip_decompressor());
file_source in (fname);
input->push(in);
According to the documentation, file_source does not have any way to set the buffer size but filtering_stream::push seems to:
void push( const T& t,
std::streamsize buffer_size,
std::streamsize pback_size );
So I tried input->push(in, 1E9) and indeed my program's memory usage shot up, but the speed didn't change at all.
Was I simply wrong that read buffering would improve performance? Or did I do this wrong? Can I buffer a file_source directly, or do I need to create a filtering_streambuf? If the latter, how does that work? The documentation isn't exactly full of examples.
You should profile it to see where the bottleneck is.
Perhaps it's in the kernel, perhaps you're at your hardware's limit. Until you profile it to find out, you're stumbling in the dark.
EDIT:
Ok, a more thorough answer this time, then. According to the Boost.Iostreams documentation basic_file_source is just a wrapper around std::filebuf, which in turn is built on std::streambuf. To quote the documentation:
CopyConstructible and Assignable wrapper for a std::basic_filebuf opened in read-only mode.
streambuf does provide a method, pubsetbuf (not the best reference perhaps, but the first one Google turned up), which you can, apparently, use to control the buffer size.
For example:
#include <fstream>

int main()
{
    char buf[4096];
    std::ifstream f;
    f.rdbuf()->pubsetbuf(buf, 4096);
    f.open("/tmp/large_file", std::ios::binary);

    while( !f.eof() )
    {
        char rbuf[1024];
        f.read(rbuf, 1024);
    }
    return 0;
}
In my test (optimizations off, though) I actually got worse performance with a 4096-byte buffer than with a 16-byte buffer, but YMMV -- a good example of why you should always profile first :)
But, as you say, basic_file_source does not provide any means to access this, as it hides the underlying filebuf in its private part.
If you think this is wrong you could:
Urge the Boost developers to expose such functionality, use the mailing list or the trac.
Build your own filebuf wrapper which does expose the buffer size. There's a section in the tutorial which explains writing custom sources that might be a good starting point.
Write a custom source based on whatever, that does all the caching you fancy (a sketch of this option follows below).
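For the last two options, a hypothetical sketch of a custom Source (the class name is made up; it wraps a FILE* whose stdio buffer size is set with setvbuf(), and relies on Boost.Iostreams' convenience base class boost::iostreams::source):
#include <boost/iostreams/concepts.hpp>   // boost::iostreams::source
#include <boost/shared_ptr.hpp>
#include <cstddef>
#include <cstdio>

// Hypothetical Source that owns a FILE* configured with a large stdio buffer.
// Devices must be copyable, hence the shared_ptr with a closing deleter.
class big_buffer_source : public boost::iostreams::source {
public:
    big_buffer_source(const char* path, std::size_t bufsize) {
        std::FILE* f = std::fopen(path, "rb");
        if (f) {
            std::setvbuf(f, NULL, _IOFBF, bufsize);   // enlarge the stdio buffer
            file_.reset(f, file_closer());
        }
    }
    std::streamsize read(char* s, std::streamsize n) {
        if (!file_)
            return -1;                                // treat open failure as EOF
        std::size_t got = std::fread(s, 1, static_cast<std::size_t>(n), file_.get());
        return got > 0 ? static_cast<std::streamsize>(got) : -1;  // -1 signals EOF
    }
private:
    struct file_closer { void operator()(std::FILE* f) const { if (f) std::fclose(f); } };
    boost::shared_ptr<std::FILE> file_;
};
It would then be pushed like the original file_source, e.g. input->push(big_buffer_source(fname, 1 << 20));. Whether this actually helps still needs profiling, for the reasons below.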
Remember that your hard drive as well as the kernel already do caching and buffering on file reads, which is why I don't think you'll get much of a performance increase from caching even more.
And in closing, a word on profiling. There's a ton of powerful profiling tools available for Linux and I don't even know half of them by name, but for example there's iotop, which is kind of neat because it's super simple to use. It's pretty much like top, but instead shows disk-related metrics. For example:
Total DISK READ: 31.23 M/s | Total DISK WRITE: 109.36 K/s
  TID  PRIO  USER     DISK READ   DISK WRITE  SWAPIN      IO>    COMMAND
19502  be/4  staffan  31.23 M/s   0.00 B/s    0.00 %   91.93 %   ./apa
tells me that my program spends over 90% of its time waiting for IO, i.e. it's IO bound. If you need something more powerful, I'm sure Google can help you.
And remember that benchmarking on a hot or cold cache greatly affects the outcome.

fread speeds managed unmanaged

Ok, so I'm reading a binary file into a char array I've allocated with malloc.
(btw the code here isn't the actual code, I just wrote it on the spot to demonstrate, so any mistakes here are probably not mistakes in the actual program.) This method reads at about 50 million bytes per second.
main
char *buffer = (char*)malloc(file_length_in_bytes*sizeof(char));
memset(buffer,0,file_length_in_bytes*sizeof(char));
//start time here
read_whole_file(buffer);
//end time here
free(buffer);
read_whole_buffer
void read_whole_buffer(char* buffer)
{
    // file already opened
    fseek(_file_pointer, 0, SEEK_SET);
    int a = sizeof(buffer[0]);
    fread(buffer, a, file_length_in_bytes*a, _file_pointer);
}
I've written something similar in managed C++ that uses FileStream, I believe, and the function ReadByte() to read the entire file byte by byte, and it reads at around 50 million bytes per second.
Also, I have a SATA and an IDE drive in my computer, and I've loaded the file off both; it doesn't make any difference at all (which is weird, because I was under the assumption that SATA read much faster than IDE).
Question
Maybe you can all understand why this doesn't make any sense to me. As far as I knew, it should be much faster to fread a whole file into an array than to read it byte by byte. On top of that, through testing I've discovered that managed C++ is slower (only noticeable, though, if you are benchmarking your code and you require speed).
SO
Why in the world am I reading at the same speed with both applications? Also, is 50 million bytes per second from a file into an array quick?
Maybe my motherboard is bottlenecking me? That just doesn't seem to make much sense either.
Is there maybe a faster way to read a file into an array?
thanks.
My 'script timer'
Records start and end time with millisecond resolution...Most importantly it's not a timer
#pragma once
#ifndef __Script_Timer__
#define __Script_Timer__

#include <sys/timeb.h>

extern "C"
{
    struct Script_Timer
    {
        unsigned long milliseconds;
        unsigned long seconds;
        struct timeb start_t;
        struct timeb end_t;
    };

    void End_ST(Script_Timer *This)
    {
        ftime(&This->end_t);
        This->seconds = This->end_t.time - This->start_t.time;
        This->milliseconds = (This->seconds * 1000) + (This->end_t.millitm - This->start_t.millitm);
    }

    void Start_ST(Script_Timer *This)
    {
        ftime(&This->start_t);
    }
}
#endif
Read buffer thing
char face = 0;
char comp = 0;
char nutz = 0;
for(int i = 0; i < (_length*sizeof(char)); ++i)
{
    face = buffer[i];
    if(face == comp)
        nutz = (face + comp)/i;
    comp++;
}
Transfers from or to main memory run at speeds of gigabytes per second. Inside the CPU data flows even faster. It is not surprising that, whatever you do at the software side, the hard drive itself remains the bottleneck.
Here are some numbers from my system, using PerformanceTest 7.0:
hard disk: Samsung HD103SI 5400 rpm: sequential read/write at 80 MB/s
memory: 3 * 2 GB at 400 MHz DDR3: read/write around 2.2 GB/s
So if your system is a bit older than mine, a hard drive speed of 50 MB/s is not surprising. The connection to the drive (IDE/SATA) is not all that relevant; it's mainly about the number of bits passing the drive heads per second, purely a hardware thing.
Another thing to keep in mind is your OS's filesystem cache. It could be that the second time round, the hard drive isn't accessed at all.
The 180 MB/s memory read speed that you mention in your comment does seem a bit on the low side, but that may well depend on the exact code. Your CPU's caches come into play here. Maybe you could post the code you used to measure this?
The FILE* API uses buffered streams, so even if you read byte by byte, the API internally reads buffer by buffer. So your comparison will not make a big difference.
The low level IO API (open, read, write, close) is unbuffered, so using this one will make a difference.
It may also be faster for you, if you do not need the automatic buffering of the FILE* API!
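A hedged sketch of that unbuffered route on POSIX-style systems (on Windows the counterparts are _open/_read/_close from <io.h>); the function name is a placeholder:
#include <fcntl.h>
#include <unistd.h>

// Fill `buffer` (of size `length` bytes) from `path` using unbuffered read(2).
// Returns the number of bytes actually read, or -1 on error.
ssize_t read_whole_file_raw(const char* path, char* buffer, size_t length) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    size_t total = 0;
    while (total < length) {
        ssize_t n = read(fd, buffer + total, length - total);
        if (n < 0) { close(fd); return -1; }   // read error
        if (n == 0) break;                     // end of file
        total += static_cast<size_t>(n);
    }
    close(fd);
    return static_cast<ssize_t>(total);
}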
I've done some tests on this, and past a certain point the benefit of increasing the buffer size drops off the bigger the buffer gets. There is usually an optimum buffer size you can find with a bit of trial and error.
Note also that fread() (or more specifically the C or C++ I/O library) will probably be doing its own buffering. If your system supports it, a plain read() may (or may not) be a bit faster.