I prefer not to use an XML parser library, so can you suggest a good write function for writing data to an XML file? I will make a lot of calls to the write function, so it should be able to keep track of the last write position without using too many resources. I have two different write approaches below, but I can't keep track of the last write position unless I read the file through to the end.
case#1
FILE *pfile = _tfopen(GetFileNameXML(), _T("w"));
if(pfile)
{
_fputts(TEXT(""), pfile);
}
if(pfile)
{
fclose(pfile);
pfile = NULL;
}
case#2
HANDLE hFile = CreateFile(GetFileNameXML(), GENERIC_READ|GENERIC_WRITE,
FILE_SHARE_WRITE|FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if(hFile != INVALID_HANDLE_VALUE)
{
WriteFile(hFile,,,,,);
}
CloseHandle(hFile);
thanks.
If all you need is to write some text files, use C++'s standard library file facilities. The samples here will be helpful: http://www.cplusplus.com/doc/tutorial/files/
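For example, a minimal sketch (the file name is illustrative): std::ofstream keeps the file open between writes, and tellp() reports the current write position without re-reading the file.
#include <fstream>

int main()
{
    std::ofstream out("data.xml");                   // hypothetical file name
    out << "<root>";
    std::streamoff pos = out.tellp();                // position after the last write
    out << "<item offset=\"" << pos << "\"/></root>";
    return 0;
}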
First, what's your aversion to using a standard XML processing library?
Next, if you decide to roll your own, definitely don't go directly at the Win32 APIs - at least not unless you're going to write out the generated XML in large chunks, or you're going to implement your own buffering layer.
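If you do roll your own, the buffering layer can be quite small. Here is a hedged sketch (the class name and sizes are mine, error handling is minimal): small appends accumulate in a user-space buffer, and WriteFile is only called once per large chunk.
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <vector>

class BufferedWriter
{
public:
    explicit BufferedWriter(HANDLE hFile, size_t bufSize = 64 * 1024)
        : m_hFile(hFile), m_bufSize(bufSize) { m_buffer.reserve(bufSize); }

    ~BufferedWriter() { Flush(); }

    // Append data; flush to the file once the buffer has grown past m_bufSize.
    bool Write(const void* data, size_t len)
    {
        const char* p = static_cast<const char*>(data);
        m_buffer.insert(m_buffer.end(), p, p + len);
        return (m_buffer.size() >= m_bufSize) ? Flush() : true;
    }

    // Push whatever has accumulated to the file in a single WriteFile call.
    bool Flush()
    {
        if (m_buffer.empty()) return true;
        DWORD written = 0;
        BOOL ok = WriteFile(m_hFile, m_buffer.data(),
                            static_cast<DWORD>(m_buffer.size()), &written, NULL);
        m_buffer.clear();
        return ok != FALSE;
    }

private:
    HANDLE m_hFile;
    size_t m_bufSize;
    std::vector<char> m_buffer;
};
With something like this, the number of WriteFile calls (and kernel transitions) drops to roughly one per bufSize bytes, and the current logical position is simply the bytes already flushed plus m_buffer.size().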
It's not going to matter for dealing with tiny files, but you specifically mention good performance and many calls to the write function. WriteFile has a fair amount of overhead, it does a lot of work and involves user->kernel->user mode switches, which are expensive. If you're dealing with "normally sized" XML files you probably won't be able to see much of a difference, but if you're generating monstrously sized dumps it's definitely something to keep in mind.
You mention tracking the last write position - first off, it should be easy... with FILE buffers you have ftell, with raw Win32 API you have SetFilePointerEx - call it with liDistanceToMove=0 and dwMoveMethod=FILE_CURRENT, and you get the current file position after a write. But why do you need this? If you're streaming out an XML file, you should generally keep on streaming until you're done writing - are you closing and re-opening the file? Or are you writing a valid XML file which you want to insert more data into later?
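For reference, here's a minimal sketch of both position queries mentioned above (assuming an already-open FILE* / HANDLE):
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <stdio.h>

long CurrentPosStdio(FILE* pfile)
{
    return ftell(pfile);                         // position after the last write
}

LONGLONG CurrentPosWin32(HANDLE hFile)
{
    LARGE_INTEGER zero = {};                     // liDistanceToMove = 0
    LARGE_INTEGER pos  = {};
    if (!SetFilePointerEx(hFile, zero, &pos, FILE_CURRENT))
        return -1;                               // query failed
    return pos.QuadPart;                         // current file position
}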
As for the overhead of the Win32 file functions, it may or may not be relevant in your case (depending on the size of the files you're dealing with), but with larger files it matters a lot - included below is a micro-benchmark that simply reads a file into memory with ReadFile, letting you specify different buffer sizes from the command line. It's interesting to look at, say, Process Explorer's IO tab while running the tool. Here are some statistics from my measly laptop (Win7-SP1 x64, Core 2 Duo P7350 @ 2.0GHz, 4GB RAM, 120GB Intel 320 SSD).
Take it for what it is, a micro-benchmark. The performance might or might not matter in your particular situation, but I do believe the numbers demonstrate that there's considerable overhead to the Win32 file APIs, and that doing a little buffering of your own helps.
With a fully cached 2GB file:
BlkSz Speed
32 14.4MB/s
64 28.6MB/s
128 56MB/s
256 107MB/s
512 205MB/s
1024 350MB/s
4096 800MB/s
32768 ~2GB/s
With a "so big there will only be cache misses" 4GB file:
BlkSz Speed CPU
32 13MB/s 49%
64 26MB/s 49%
128 52MB/s 49%
256 99MB/s 49%
512 180MB/s 49%
1024 200MB/s 32%
4096 185MB/s 22%
32768 205MB/s 13%
Keep in mind that 49% CPU usage means that one CPU core is pretty much fully pegged - a single thread can't really push the machine much harder. Notice the pathological behavior of the 4kb buffer in the second table - it was reproducible, and I don't have an explanation for it.
Crappy micro-benchmark code goes here:
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
#include <string>
#include <assert.h>
#include <limits.h>   // UINT_MAX, used in getDuration()
unsigned getDuration(FILETIME& timeStart, FILETIME& timeEnd)
{
// duration is in 100-nanoseconds, we want milliseconds
// 1 millisecond = 1000 microseconds = 1000000 nanoseconds
LARGE_INTEGER ts, te, res;
ts.HighPart = timeStart.dwHighDateTime; ts.LowPart = timeStart.dwLowDateTime;
te.HighPart = timeEnd.dwHighDateTime; te.LowPart = timeEnd.dwLowDateTime;
res.QuadPart = ((te.QuadPart - ts.QuadPart) / 10000);
assert(res.QuadPart < UINT_MAX);
return res.QuadPart;
}
int main(int argc, char* argv[])
{
if(argc < 3) {
puts("Syntax: ReadFile [filename] [blocksize]");
return 0;
}
char *filename= argv[1];
int blockSize = atoi(argv[2]);
if(blockSize < 1) {
puts("Please specify a blocksize larger than 0");
return 1;
}
HANDLE hFile = CreateFile(filename, GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0);
if(INVALID_HANDLE_VALUE == hFile) {
puts("error opening input file");
return 1;
}
std::vector<char> buffer(blockSize);
LARGE_INTEGER fileSize;
if(!GetFileSizeEx(hFile, &fileSize)) {
puts("Failed getting file size.");
return 1;
}
std::cout << "File size " << fileSize.QuadPart << ", that's " << (fileSize.QuadPart / blockSize) <<
" blocks of " << blockSize << " bytes - reading..." << std::endl;
FILETIME dummy, kernelStart, userStart;
GetProcessTimes(GetCurrentProcess(), &dummy, &dummy, &kernelStart, &userStart);
DWORD ticks = GetTickCount();
DWORD bytesRead = 0;
do {
if(!ReadFile(hFile, &buffer[0], blockSize, &bytesRead, 0)) {
puts("Error calling ReadFile");
return 1;
}
} while(bytesRead == blockSize);
ticks = GetTickCount() - ticks;
FILETIME kernelEnd, userEnd;
GetProcessTimes(GetCurrentProcess(), &dummy, &dummy, &kernelEnd, &userEnd);
CloseHandle(hFile);
std::cout << "Reading with " << blockSize << " sized blocks took " << ticks << "ms, spending " <<
getDuration(kernelStart, kernelEnd) << "ms in kernel and " <<
getDuration(userStart, userEnd) << "ms in user mode. Hit enter to continue." << std::endl;
std::string dummyString;
std::cin >> dummyString;
return 0;
}
Related
I have a big .txt file (over 1 GB). While searching for a way to open it fast, I came across file mapping.
I managed to use CreateFile(), then I made a char buffer[] and finally put the file contents in the buffer with ReadFile(). The problem is that the file is too big, so I can't load it all into the buffer at once, because I can't make an array that big.
I think the solution would be to open and close the file at specified locations in the .txt file and read a portion of the file contents each time. The only source I found explaining mapping was MSDN, but I can't figure out how to do it from there.
So in the end, how do I read a big file with a mapping?
HANDLE my_File = CreateFileA("words.txt", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (my_File == INVALID_HANDLE_VALUE)
{
cout << "Failed to open file" << endl;
return 0;
}
constexpr size_t BUFFSIZE = 1000000;
char buffer[BUFFSIZE];
DWORD dwBytesToRead = BUFFSIZE - 1;
DWORD dwBytesRead = 0;
BOOL my_Bool = ReadFile(my_File,(void*)buffer, dwBytesToRead, &dwBytesRead, NULL);
if (dwBytesRead > 0)
{
buffer[dwBytesRead] = '\0';
cout << "FILE IS: " << buffer << endl;
}
CloseHandle(my_File);
I think you are confused. The whole purpose of mapping part or all of a file into memory is to avoid the need to buffer the data yourself. Instead, the OS takes care of that for you, allowing you to access the contents of the file via a pointer, just like you would any other in-memory data structure.
Only you can decide if that's the best solution for you. In a 32 bit app, 1GB is a lot of addressing space to find. In a 64 bit app there is no such problem. As mentioned in the comments, reading the file in chunks into a smaller buffer can be a better bet, especially if you want to process it sequentially.
For some example code on how to memory map a file, see:
How to CreateFileMapping in C++?
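To give a rough idea (a minimal sketch with only token error handling; see the linked question for a fuller treatment), mapping the whole of words.txt read-only looks roughly like this in a 64-bit build:
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <iostream>

int main()
{
    HANDLE hFile = CreateFileA("words.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(hFile, &size);

    HANDLE hMapping = CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (!hMapping) { CloseHandle(hFile); return 1; }

    // Map the whole file; in a 32-bit build you would map smaller views
    // using the offset/length parameters of MapViewOfFile instead.
    const char* data = static_cast<const char*>(
        MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0));
    if (data)
    {
        // The contents are now addressable as data[0] .. data[size.QuadPart - 1].
        std::cout << "mapped " << size.QuadPart << " bytes, first byte: " << data[0] << std::endl;
        UnmapViewOfFile(data);
    }
    CloseHandle(hMapping);
    CloseHandle(hFile);
}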
I have a C++ program that uses the POSIX API to write a file opened with O_DIRECT. Concurrently, another thread is reading back from the same file via a different file descriptor. I've noticed that occasionally the data read back from the file contains all zeroes, rather than the actual data I wrote. Why is this?
Here's an MCVE in C++17. Compile with g++ -std=c++17 -Wall -otest test.cpp or equivalent. Sorry I couldn't seem to make it any shorter. All it does is write 100 MiB of constant bytes (0x5A) to a file in one thread and read them back in another, printing a message if any of the read-back bytes are not equal to 0x5A.
WARNING, this MCVE will delete and rewrite any file in the current working directory named foo.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <thread>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
constexpr size_t CHUNK_SIZE = 1024 * 1024;
constexpr size_t TOTAL_SIZE = 100 * CHUNK_SIZE;
int main(int argc, char *argv[])
{
::unlink("foo");
std::thread write_thread([]()
{
int fd = ::open("foo", O_WRONLY | O_CREAT | O_DIRECT, 0777);
if (fd < 0) std::exit(-1);
uint8_t *buffer = static_cast<uint8_t *>(
std::aligned_alloc(4096, CHUNK_SIZE));
std::fill(buffer, buffer + CHUNK_SIZE, 0x5A);
size_t written = 0;
while (written < TOTAL_SIZE)
{
ssize_t rv = ::write(fd, buffer,
std::min(TOTAL_SIZE - written, CHUNK_SIZE));
if (rv < 0) { std::cerr << "write error" << std::endl; std::exit(-1); }
written += rv;
}
});
std::thread read_thread([]()
{
int fd = ::open("foo", O_RDONLY, 0);
if (fd < 0) std::exit(-1);
uint8_t *buffer = new uint8_t[CHUNK_SIZE];
size_t checked = 0;
while (checked < TOTAL_SIZE)
{
ssize_t rv = ::read(fd, buffer, CHUNK_SIZE);
if (rv < 0) { std::cerr << "write error" << std::endl; std::exit(-1); }
for (ssize_t i = 0; i < rv; ++i)
if (buffer[i] != 0x5A)
std::cerr << "readback mismatch at offset " << checked + i << std::endl;
checked += rv;
}
});
write_thread.join();
read_thread.join();
}
(Details such as proper error checking and resource management are omitted here for the sake of the MCVE. This is not my actual program but it shows the same behavior.)
I'm testing on Linux 4.15.0 with an SSD. About 1/3 of the time I run the program, the "readback mismatch" message prints. Sometimes it doesn't. In all cases, if I examine foo after the fact I find that it does contain the correct data.
If you remove O_DIRECT from the ::open() flags in the write thread, the problem goes away and the "readback mismatch" message never prints.
I could understand why my ::read() might return 0 or something to indicate that I've already read everything that has been flushed to disk so far. But I can't understand why it would perform what appears to be a successful read, but with data other than what I wrote. Clearly I'm missing something, but what is it?
So, O_DIRECT has some additional constraints that might not make it what you're looking for:
Applications should avoid mixing O_DIRECT and normal I/O to the same file, and especially to overlapping byte regions in the same file. Even when the filesystem correctly handles the coherency issues in this situation, overall I/O throughput is likely to be slower than using either mode alone.
Instead, I think O_SYNC might be better, since it does provide the expected guarantees:
O_SYNC provides synchronized I/O file integrity completion, meaning write operations will flush data and all associated metadata to the underlying hardware. O_DSYNC provides synchronized I/O data integrity completion, meaning write operations will flush data to the underlying hardware, but will only flush metadata updates that are required to allow a subsequent read operation to complete successfully. Data integrity completion can reduce the number of disk operations that are required for applications that don't need the guarantees of file integrity completion.
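Under that reading, the only change to the MCVE's writer thread would be the open() flags; a sketch (not a tested fix for the original program):
#include <fcntl.h>

// Writer-side open: O_SYNC instead of O_DIRECT, so each write() returns only
// after the data has reached the underlying hardware, while the page cache
// still keeps the written pages coherent with the concurrent reader.
int open_writer_sync()
{
    return ::open("foo", O_WRONLY | O_CREAT | O_SYNC, 0777);   // flags otherwise as in the MCVE
}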
I am writing a program to check whether a file is a PE file or not. For that, I need to read only the file headers (which I guess do not occupy more than the first 1024 bytes of a file).
I tried using the CreateFile() + ReadFile() combination, which turns out to be slow because I am iterating through all the files on the system drive. It is taking 15-20 minutes just to iterate through them.
Can you please suggest an alternate approach to opening and reading the files to make it faster?
Note: please note that I do NOT need to read the whole file. I just need to read the initial part of the file -- the DOS header, PE header, etc., which I guess do not occupy more than the first 512 bytes of the file.
Here is my code :
bool IsPEFile(const String filePath)
{
HANDLE hFile = CreateFile(filePath.c_str(),
GENERIC_READ,
FILE_SHARE_READ | FILE_SHARE_WRITE,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL);
DWORD dwBytesRead = 0;
const DWORD CHUNK_SIZE = 2048;
BYTE szBuffer[CHUNK_SIZE] = {0};
LONGLONG size;
LARGE_INTEGER li = {0};
if (hFile != INVALID_HANDLE_VALUE)
{
if(GetFileSizeEx(hFile, &li) && li.QuadPart > 0)
{
size = li.QuadPart;
ReadFile(hFile, szBuffer, CHUNK_SIZE, &dwBytesRead, NULL);
if(dwBytesRead > 0 && (WORDPTR(szBuffer[0]) == ('M' << 8) + 'Z' || WORDPTR(szBuffer[0]) == ('Z' << 8) + 'M'))
{
LONGLONG ne_pe_header = DWORDPTR(szBuffer[0x3c]);
WORD signature = 0;
if(ne_pe_header <= dwBytesRead-2)
{
signature = WORDPTR(szBuffer[ne_pe_header]);
}
else if (ne_pe_header < size )
{
SetFilePointer(hFile, ne_pe_header, NULL, FILE_BEGIN);
ReadFile(hFile, &signature, sizeof(signature), &dwBytesRead, NULL);
if (dwBytesRead != sizeof(signature))
{
return false;
}
}
if(signature == 0x4550) // PE file
{
return true;
}
}
}
CloseHandle(hFile);
}
return false;
}
Thanks in advance.
I think you're hitting the inherent limitations of mechanical hard disk drives. You didn't mention whether you're using a HDD or a solid-state disk, but I assume a HDD given that your file accesses are slow.
HDDs can read data at about 100 MB/s sequentially, but seek time is a bit over 10 ms. This means that if you seek to a certain location (10 ms), you might as well read a megabyte of data (another 10 ms). It also means that you can access fewer than 100 files per second.
So, in your case it doesn't matter much whether you're reading the first 512 bytes of a file or the first hundred kilobytes of a file.
Hardware is cheap, programmer time is expensive. Your best bet is to purchase a solid-state disk drive if your file accesses are too slow. I predict that eventually all computers will have solid-state disk drives.
Note: if the bottleneck is the HDD, there is nothing you can do about it other than to replace the HDD with better technology. Practically all file access mechanisms are equally slow. The only thing you can do about it is to read only the initial part of a file if the file is really really large such as multiple megabytes. But based on your code example you're already doing that.
For faster file I/O, you need to use the Win32 CreateFile and ReadFile APIs.
If you want to speed things up further, you can use file buffering and make the I/O non-blocking by using overlapped I/O or IOCP.
See this example for help: https://msdn.microsoft.com/en-us/library/windows/desktop/bb540534%28v=vs.85%29.aspx
And I don't think that C's FILE or C++'s fstream will be faster than the Win32 APIs.
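As a rough illustration of the overlapped route (a hedged sketch; the function name and the wide-string path parameter are mine, and real code would overlap the wait with other work or use a completion port):
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>

// Read the first `size` bytes of a file with an asynchronous (overlapped) ReadFile.
bool ReadHeaderOverlapped(const wchar_t* path, BYTE* buffer, DWORD size, DWORD* bytesRead)
{
    HANDLE hFile = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return false;

    OVERLAPPED ov = {};                                      // offset 0: DOS/PE headers live at the start
    ov.hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);       // manual-reset event for completion

    bool ok = ReadFile(hFile, buffer, size, NULL, &ov) != FALSE;
    if (!ok && GetLastError() == ERROR_IO_PENDING)
        ok = GetOverlappedResult(hFile, &ov, bytesRead, TRUE) != FALSE;   // wait for completion
    else if (ok)
        ok = GetOverlappedResult(hFile, &ov, bytesRead, FALSE) != FALSE;  // completed synchronously

    CloseHandle(ov.hEvent);
    CloseHandle(hFile);
    return ok;
}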
In my program I want to read several text files (more than ~800 files), each with 256 lines and filenames running from 1.txt to n.txt, and store them in a database after several processing steps. My problem is the data's reading speed. I could roughly double the program's speed by using OpenMP multithreading for the reading loop. Is there a way to speed it up a bit more? My actual code is
std::string CCD_Folder = CCDFolder; //CCDFolder is a pointer to a char array
int b = 0;
int PosCounter = 0;
int WAVENUMBER, WAVELUT;
std::vector<std::string> tempstr;
std::string inputline;
//Input
omp_set_num_threads(YValue);
#pragma omp parallel for private(WAVENUMBER) private(WAVELUT) private(PosCounter) private(tempstr) private(inputline)
for(int i = 1; i < (CCD_Filenumbers+1); i++)
{
//std::cout << omp_get_thread_num() << ' ' << i << '\n';
//Convert the index, build the file name and open the input stream
std::string CCD_Filenumber = boost::lexical_cast<string>(i);
std::string CCD_Filename = CCD_Folder + '\\' + CCD_Filenumber + ".txt";
std::ifstream datain(CCD_Filename, std::ifstream::in);
while(!datain.eof())
{
std::getline(datain, inputline);
//Processing
};
};
All variables which are not defined here are defined somewhere else in my code, and it is working. So is there a possibility to speed this code a bit more up?
Thank you very much!
Some experiment:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <Windows.h>
void generateFiles(int n) {
char fileName[32];
char fileStr[1032];
for (int i=0;i<n;i++) {
sprintf( fileName, "c:\\t\\%i.txt", i );
FILE * f = fopen( fileName, "w" );
for (int j=0;j<256;j++) {
int lineLen = rand() % 1024;
memset(fileStr, 'X', lineLen );
fileStr[lineLen] = 0x0D;
fileStr[lineLen+1] = 0x0A;
fileStr[lineLen+2] = 0x00;
fwrite( fileStr, 1, lineLen+2, f );
}
fclose(f);
}
}
void readFiles(int n) {
char fileName[32];
for (int i=0;i<n;i++) {
sprintf( fileName, "c:\\t\\%i.txt", i );
FILE * f = fopen( fileName, "r" );
fseek(f, 0L, SEEK_END);
int size = ftell(f);
fseek(f, 0L, SEEK_SET);
char * data = (char*)malloc(size);
fread(data, size, 1, f);
free(data);
fclose(f);
}
}
DWORD WINAPI readInThread( LPVOID lpParam )
{
int * number = (int *)lpParam;
char fileName[32];
sprintf( fileName, "c:\\t\\%i.txt", *number );
FILE * f = fopen( fileName, "r" );
fseek(f, 0L, SEEK_END);
int size = ftell(f);
fseek(f, 0L, SEEK_SET);
char * data = (char*)malloc(size);
fread(data, size, 1, f);
free(data);
fclose(f);
return 0;
}
int main(int argc, char ** argv) {
long t1 = GetTickCount();
generateFiles(256);
printf("Write: %li ms\n", GetTickCount() - t1 );
t1 = GetTickCount();
readFiles(256);
printf("Read: %li ms\n", GetTickCount() - t1 );
t1 = GetTickCount();
const int MAX_THREADS = 256;
int pDataArray[MAX_THREADS];
DWORD dwThreadIdArray[MAX_THREADS];
HANDLE hThreadArray[MAX_THREADS];
for( int i=0; i<MAX_THREADS; i++ )
{
// pDataArray is a plain array of int on the stack, so no HeapAlloc is needed here
pDataArray[i] = i;
hThreadArray[i] = CreateThread(
NULL,
0,
readInThread,
&pDataArray[i],
0,
&dwThreadIdArray[i]);
}
WaitForMultipleObjects(MAX_THREADS, hThreadArray, TRUE, INFINITE);
printf("Read (threaded): %li ms\n", GetTickCount() - t1 );
}
The first function is just an ugly way to generate a test dataset (I know it can be done much better, but I honestly had no time).
1st experiment - sequential read
2nd experiment - read all in parallel
results:
256 files:
Write: 250 ms
Read: 140 ms
Read (threaded): 78 ms
1024 files:
Write: 1250 ms
Read: 547 ms
Read (threaded): 843 ms
I think the second attempt clearly shows that, in the long run, 'dumb' thread creation just makes things worse. Of course it needs improvements (preallocated workers, some kind of thread pool, etc.), but I think that with an operation as fast as reading 100-200 KB from disk there is no real benefit in moving this work into a thread. I have no time to write a 'cleverer' solution, but I doubt it would be much faster, because you would have to add system calls for mutexes etc...
Going to extremes, you could think about preallocating memory pools and so on, but as mentioned before, the code you posted is just wrong... it's a matter of milliseconds, but certainly not seconds.
800 files (20 chars per line, 256 lines)
Write: 250 ms
Read: 63 ms
Read (threaded): 500 ms
Conclusion:
THE ANSWER IS: your reading code is wrong. You read the files so slowly that there is a significant increase in speed when you make the tasks run in parallel. In the code above, reading is actually faster than the expense of spawning a thread.
Your primary bottleneck is physically reading from the hard disk.
Unless you have the files on separate drives, the drive can only read data from one file at a time. Your best bet is to read each file as a whole rather read a portion of one file, tell the drive to locate to another file, read from there, and repeat. Repositioning the drive head to other locations, especially other files, is usually more expensive than letting the drive finish reading the single file.
The next bottleneck is the data channel between the processor and the hard drive. If your hard drives share any kind of communications channel, you will see a bottleneck, as data from each drive must come through the communications channel to your processor. Your processor sends commands to the drive(s) through this communications channel (PATA, SATA, USB, etc.).
The objective of the next steps is to reduce the overhead of the "middle men" between your program's memory and the hard drive communications interface. The most efficient approach is to access the controller directly; less efficient are the OS functions, then the C functions (fread and family), and least efficient are C++ streams. With increased efficiency comes tighter coupling with the platform and reduced safety (and simplicity).
I suggest the following:
Create multiple buffers in memory, large enough to save time, small enough to prevent the OS from paging the memory to the hard drive.
Create a thread that reads the files into memory, as necessary. Search the web for "double buffering". As long as there is space in the buffer, this thread will read data.
Create multiple "outgoing" buffers.
Create a second thread that removes data from memory, "processes" it, and inserts it into the "outgoing" buffers (see the sketch after this list).
Create a third thread that takes the data in the "outgoing" buffers and sends it to the databases.
Adjust the size of the buffers for the best efficiency within the limitations of memory.
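A minimal sketch of the reader/processor split (the file names come from the question; the queue size and structure are illustrative, and the database thread would follow the same pattern on a second queue):
#include <condition_variable>
#include <fstream>
#include <iterator>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct BoundedQueue
{
    std::queue<std::vector<char>> items;
    std::mutex m;
    std::condition_variable notFull, notEmpty;
    size_t maxItems = 8;       // cap memory use: at most 8 files buffered at once
    bool done = false;

    void push(std::vector<char> v)
    {
        std::unique_lock<std::mutex> lock(m);
        notFull.wait(lock, [&] { return items.size() < maxItems; });
        items.push(std::move(v));
        notEmpty.notify_one();
    }

    bool pop(std::vector<char>& out)
    {
        std::unique_lock<std::mutex> lock(m);
        notEmpty.wait(lock, [&] { return !items.empty() || done; });
        if (items.empty()) return false;         // producer finished, nothing left
        out = std::move(items.front());
        items.pop();
        notFull.notify_one();
        return true;
    }

    void finish()
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
        notEmpty.notify_all();
    }
};

int main()
{
    BoundedQueue q;

    std::thread reader([&] {
        for (int i = 1; i <= 800; ++i)           // 1.txt .. 800.txt, as in the question
        {
            std::ifstream in(std::to_string(i) + ".txt", std::ios::binary);
            std::vector<char> data((std::istreambuf_iterator<char>(in)),
                                   std::istreambuf_iterator<char>());
            q.push(std::move(data));
        }
        q.finish();
    });

    std::thread processor([&] {
        std::vector<char> data;
        while (q.pop(data))
        {
            // split into lines / fill database rows here
        }
    });

    reader.join();
    processor.join();
}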
If you can access the DMA channels, use them to read from the hard drive into the "read buffers".
Next, you can optimize your code to use the processor's data cache efficiently. For example, set up your "processing" so the data structures do not exceed a cache line. Also, optimize your code to use registers (either specify the register keyword or use statement blocks so that the compiler knows when variables can be reused).
Other optimizations that may help:
Align data to the processor's native word size, padding if necessary. For example, prefer using 32 bytes instead of 13 or 24.
Fetch data in quantities of the processor's word size. For example, access 4 octets (bytes) at a time on a 32-bit processor rather than making 4 accesses of 1 byte.
Unroll loops - put more instructions inside the loop, as branch instructions slow down processing.
You are probably hitting the read limit of your disks, which means your options are somewhat limited. If this is a constant problem you could consider a different RAID structure, which will give you greater read throughput because more than one read head can access data at the same time.
To see if disk access really is the bottleneck, run your program with the time command:
>> /usr/bin/time -v <my program>
In the output you'll see how much CPU time you were utilizing compared to the amount of time required for things like disk access.
I would try going with C code for reading the file. I suspect that it'll be faster.
FILE* f = ::fopen( CCD_Filename.c_str(), "rb" );
if( f == NULL )
{
return;
}
::fseek( f, 0, SEEK_END );
const long lFileBytes = ::ftell( f );
::fseek( f, 0, SEEK_SET );
char* fileContents = new char[lFileBytes + 1];
const size_t numObjectsRead = ::fread( fileContents, lFileBytes, 1, f );
::fclose( f );
if( numObjectsRead < 1 )
{
delete [] fileContents;
return;
}
fileContents[lFileBytes] = '\0';
// assign char buffer of file contents here
delete [] fileContents;
Recently I decided to optimize some file reading I was doing, because as everyone says, reading a large chunk of data to a buffer and then working with it is faster than using lots of small reads. And my code certainly is much faster now, but after doing some profiling it appears memcpy is taking up a lot of time.
The gist of my code is...
ifstream file("some huge file");
char buffer[0x1000000];
for (yada yada) {
int size = some arbitrary size usually around a megabyte;
file.read(buffer, size);
//Do stuff with buffer
}
I'm using Visual Studio 11 and after profiling my code it says ifstream::read() eventually calls xsgetn() which copies from the internal buffer to my buffer. This operation takes up over 80% of the time! In second place comes uflow() which takes up 10% of the time.
Is there any way I can get around this copying? Can I somehow tell the ifstream to buffer the size I need directly into my buffer? Does the C-style FILE* also use such an internal buffer?
UPDATE: Due to people telling me to use cstdio... I have done a benchmark.
EDIT: Unfortunately the old code was full of fail (it wasn't even reading the entire file!). You can see it here: http://pastebin.com/4dGEQ6S7
Here's my new benchmark:
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
using namespace std;

const int MAX = 0x10000;
char buf[MAX];
string fpath = "largefile";
int main() {
{
clock_t start = clock();
ifstream file(fpath, ios::binary);
while (!file.eof()) {
file.read(buf, MAX);
}
clock_t end = clock();
cout << end-start << endl;
}
{
clock_t start = clock();
FILE* file = fopen(fpath.c_str(), "rb");
setvbuf(file, NULL, _IOFBF, 1024);
while (!feof(file)) {
fread(buf, 0x1, MAX, file);
}
fclose(file);
clock_t end = clock();
cout << end-start << endl;
}
{
clock_t start = clock();
HANDLE file = CreateFile(fpath.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_ALWAYS, NULL, NULL);
while (true) {
DWORD used;
ReadFile(file, buf, MAX, &used, NULL);
if (used < MAX) break;
}
CloseHandle(file);
clock_t end = clock();
cout << end-start << endl;
}
system("PAUSE");
}
Times are:
185
80
78
Well... it looks like using the C-style fread is faster than ifstream::read. Also, using the Windows ReadFile gives only a slight, negligible advantage (I looked at the code and fread basically is a wrapper around ReadFile). Looks like I'll be switching to fread after all.
Man it is confusing to write a benchmark which actually tests this stuff correctly.
CONCLUSION: Using <cstdio> is faster than <fstream>. The reason fstream is slower is that C++ streams have their own internal buffer. This results in extra copying whenever you read/write, and this copying accounts for the entire extra time taken by fstream. Even more shocking is that the extra time taken is longer than the time taken to actually read the file.
Can I somehow tell the ifstream to buffer the size I need directly into my buffer?
Yes, this is what pubsetbuf() is for.
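A sketch of the call (buffer size and file name are illustrative; note that pubsetbuf() has to be called before the file is opened, and the exact buffering effect is implementation-defined):
#include <fstream>
#include <vector>

int main()
{
    std::vector<char> myBuf(0x1000000);          // the buffer we want the stream to use
    std::ifstream file;
    // Must be done before open(); calling it afterwards has unspecified behavior.
    file.rdbuf()->pubsetbuf(myBuf.data(), static_cast<std::streamsize>(myBuf.size()));
    file.open("largefile", std::ios::binary);
    // file.read(...) now works out of the streambuf backed by myBuf
    return 0;
}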
But if you're that concerned with copying while reading a file, consider memory mapping as well; Boost has a portable implementation.
If you want to speed up file I/O I suggest you to use the good ol' <cstdio> because it can outperform the C++ one by a large margin.
It has been proven several times that the fastest way of reading data on Linux systems is mmap(). I don't know about Windows, but mmap() certainly does without this extra buffering.
fopen(), fread(), fwrite() (FILE*) are somewhat higher-level and may introduce a buffer, while open(), read(), write() are low-level and the only buffering you may have there comes from the OS kernel.
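For completeness, the mmap() route on Linux looks roughly like this (a sketch; "largefile" is the file name used in the benchmark above):
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    int fd = ::open("largefile", O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0) { ::close(fd); return 1; }

    // Map the whole file read-only; the kernel pages it in on demand.
    void* p = ::mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { ::close(fd); return 1; }

    const char* data = static_cast<const char*>(p);
    // data[0] .. data[st.st_size - 1] are now readable without any read()/fread() calls
    volatile char first = data[0];
    (void)first;

    ::munmap(p, st.st_size);
    ::close(fd);
    return 0;
}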