Reading a set of files with no buffering (skipping the file cache) by using the FILE_FLAG_NO_BUFFERING flag should be faster than normal reading (without the flag). The reason it should be faster is that the 'no buffering' mechanism skips the system file cache and reads directly into the application's buffer.
The application is run in a cold environment (after disk defragmentation and a machine restart) so that the concerned files are not already in the system file cache before the run.
This is according to the MSDN documentation on these APIs and flags.
However, I see totally different performance behavior. I read a set of files synchronously, one after the other, after creating the file handles with the FILE_FLAG_NO_BUFFERING flag. The time it takes to read the set of files is 29 secs. Whereas, if I read normally without this flag (again in a cold run of the application, when the file cache does not hold the concerned files), it takes around 24 secs.
Details:
Total number of files: 1939
Total file size(sum of all): 57 MB
With FILE_FLAG_NO_BUFFERING: 29 secs (time taken to read)
Without FILE_FLAG_NO_BUFFERING: 24 secs (time taken to read)
Here is the code that implements the read:
DWORD ReadFiles(std::vector<std::string> &filePathNameVectorRef)
{
    DWORD totalBytesRead = 0;
    for(size_t i = 0; i < filePathNameVectorRef.size(); ++i)
        totalBytesRead += Read_Synchronous(filePathNameVectorRef[i].c_str());
    return totalBytesRead;
}
DWORD Read_Synchronous(const char * filePathName)
{
    DWORD accessMode = GENERIC_READ;
    DWORD shareMode = 0;
    DWORD createDisposition = OPEN_EXISTING;
    DWORD flags = FILE_FLAG_NO_BUFFERING;
    HANDLE handle = INVALID_HANDLE_VALUE;
    DWORD fileSize;
    DWORD bytesRead = 0;
    DWORD bytesToRead = 0;
    LARGE_INTEGER li;
    char * buffer = NULL;
    BOOL success = false;
    handle = CreateFile(filePathName, accessMode, shareMode, NULL, createDisposition, flags, NULL);
    if(handle == INVALID_HANDLE_VALUE)
        return 0;
    GetFileSizeEx(handle, &li);
    fileSize = (DWORD)li.QuadPart;
    // FILE_FLAG_NO_BUFFERING requires the read size to be a multiple of the
    // sector size; round up so the tail of the file is not silently skipped.
    bytesToRead = ((fileSize + g_bytesPerPhysicalSector - 1)/g_bytesPerPhysicalSector)*g_bytesPerPhysicalSector;
    // VirtualAlloc returns page-aligned memory, which satisfies the buffer
    // alignment requirement for no-buffering reads.
    buffer = static_cast<char *>(VirtualAlloc(0, bytesToRead, MEM_COMMIT, PAGE_READWRITE));
    if(buffer == NULL)
        goto RETURN;
    success = ReadFile(handle, buffer, bytesToRead, &bytesRead, NULL);
    if(!success){
        fprintf(stdout, "\n Error occurred: %lu", GetLastError());
        bytesRead = 0;
    }
    VirtualFree(buffer, 0, MEM_RELEASE); // memory from VirtualAlloc must be released with VirtualFree, not free()
RETURN:
    CloseHandle(handle);
    return bytesRead;
}
Please share your thoughts on why you think this code runs slower than when FILE_FLAG_NO_BUFFERING is not used. Thanks.
I expect that what you are measuring is the time to open and close the files. There are rather a lot of files. You should be able to read 57MB from a disk in around one second, so the overhead would appear to be the file opening rather than the reading. You should try again with fewer, but larger, files: create, say, 20 100MB files and read those. It looks like, on your system at least, it is slower to open files with FILE_FLAG_NO_BUFFERING than without.
In any case, don't expect FILE_FLAG_NO_BUFFERING to speed things up. The time spent copying from the file handle's buffer to your buffer is trivial in comparison to pulling the data off the disk.
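To verify where the time goes on your own data, a rough sketch along these lines (the function name and the 1 MB scratch buffer are illustrative, and it assumes flags = 0, i.e. ordinary buffered reads) times CreateFile separately from ReadFile:
#include <windows.h>
#include <cstdio>
#include <string>
#include <vector>

static double Seconds(LARGE_INTEGER a, LARGE_INTEGER b, LARGE_INTEGER freq)
{
    return double(b.QuadPart - a.QuadPart) / double(freq.QuadPart);
}

void TimeOpenVsRead(const std::vector<std::string> &files, DWORD flags)
{
    LARGE_INTEGER freq, t0, t1, t2, t3;
    QueryPerformanceFrequency(&freq);
    double openSecs = 0.0, readSecs = 0.0;
    std::vector<char> buffer(1 << 20); // 1 MB scratch buffer

    for (size_t i = 0; i < files.size(); ++i)
    {
        QueryPerformanceCounter(&t0);
        HANDLE h = CreateFileA(files[i].c_str(), GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, flags, NULL);
        QueryPerformanceCounter(&t1);
        openSecs += Seconds(t0, t1, freq);
        if (h == INVALID_HANDLE_VALUE)
            continue;

        DWORD bytesRead = 0;
        QueryPerformanceCounter(&t2);
        // Note: with FILE_FLAG_NO_BUFFERING the buffer and read size would also
        // have to be sector aligned; this sketch only illustrates the open/read split.
        ReadFile(h, &buffer[0], (DWORD)buffer.size(), &bytesRead, NULL);
        QueryPerformanceCounter(&t3);
        readSecs += Seconds(t2, t3, freq);

        CloseHandle(h);
    }
    printf("open: %.2f s, read: %.2f s\n", openSecs, readSecs);
}
If the open column dominates, the point above applies: the per-file overhead, not the data transfer itself, is what you are measuring.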
Related
I am writing a program to check whether a file is a PE file or not. For that, I need to read only the file headers (which I guess do not occupy more than the first 1024 bytes of a file).
I tried the CreateFile() + ReadFile() combination, which turns out to be slow because I am iterating through all the files on the system drive. It is taking 15-20 minutes just to iterate through them.
Can you please suggest an alternate approach to open and read the files faster?
Note: please note that I do NOT need to read the file as a whole. I just need to read the initial part of the file -- the DOS header, PE header, etc., which I guess do not occupy more than the first 512 bytes of the file.
Here is my code :
bool IsPEFile(const String filePath)
{
    HANDLE hFile = CreateFile(filePath.c_str(),
                              GENERIC_READ,
                              FILE_SHARE_READ | FILE_SHARE_WRITE,
                              NULL,
                              OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL,
                              NULL);
    DWORD dwBytesRead = 0;
    const DWORD CHUNK_SIZE = 2048;
    BYTE szBuffer[CHUNK_SIZE] = {0};
    LONGLONG size;
    LARGE_INTEGER li = {0};
    bool isPE = false; // single result so the handle is always closed below
    if (hFile != INVALID_HANDLE_VALUE)
    {
        if(GetFileSizeEx(hFile, &li) && li.QuadPart > 0)
        {
            size = li.QuadPart;
            ReadFile(hFile, szBuffer, CHUNK_SIZE, &dwBytesRead, NULL);
            // 'MZ' (or the historical 'ZM') marks a DOS/PE executable
            if(dwBytesRead > 0 && (WORDPTR(szBuffer[0]) == ('M' << 8) + 'Z' || WORDPTR(szBuffer[0]) == ('Z' << 8) + 'M'))
            {
                // e_lfanew at offset 0x3c points to the NE/PE header
                LONGLONG ne_pe_header = DWORDPTR(szBuffer[0x3c]);
                WORD signature = 0;
                if(ne_pe_header <= dwBytesRead-2)
                {
                    signature = WORDPTR(szBuffer[ne_pe_header]);
                }
                else if (ne_pe_header < size)
                {
                    SetFilePointer(hFile, (LONG)ne_pe_header, NULL, FILE_BEGIN);
                    if (!ReadFile(hFile, &signature, sizeof(signature), &dwBytesRead, NULL) ||
                        dwBytesRead != sizeof(signature))
                    {
                        signature = 0; // read failed; fall through and report "not PE"
                    }
                }
                if(signature == 0x4550) // 'PE\0\0'
                {
                    isPE = true;
                }
            }
        }
        CloseHandle(hFile); // always close the handle before returning
    }
    return isPE;
}
Thanks in advance.
I think you're hitting the inherent limitations of mechanical hard disk drives. You didn't mention whether you're using a HDD or a solid-state disk, but I assume a HDD given that your file accesses are slow.
HDDs can read data at about 100 MB/s sequentially, but seek time is a bit over 10 ms. This means that if you seek to a certain location (10 ms), you might as well read a megabyte of data (another 10 ms). It also means that you can access fewer than 100 files per second.
So, in your case it doesn't matter much whether you're reading the first 512 bytes of a file or the first hundred kilobytes of a file.
Hardware is cheap, programmer time is expensive. Your best bet is to purchase a solid-state disk drive if your file accesses are too slow. I predict that eventually all computers will have solid-state disk drives.
Note: if the bottleneck is the HDD, there is nothing you can do about it other than to replace the HDD with better technology. Practically all file access mechanisms are equally slow. The only thing you can do about it is to read only the initial part of a file if the file is really really large such as multiple megabytes. But based on your code example you're already doing that.
For faster file I/O, use the Win32 CreateFile and ReadFile APIs directly.
If you want to speed things up, you can control file buffering yourself and make the I/O non-blocking by using overlapped I/O or an I/O completion port (IOCP).
See this example for help: https://msdn.microsoft.com/en-us/library/windows/desktop/bb540534%28v=vs.85%29.aspx
Also, C's FILE and C++'s fstream are generally not faster than the Win32 APIs.
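As a rough illustration (not the code from the linked MSDN sample), here is a hedged sketch that opens a small batch of files with FILE_FLAG_OVERLAPPED, issues one 512-byte header read per file, and waits for the whole batch at once; the struct, function name and batch handling are assumptions made for the example:
#include <windows.h>
#include <string>
#include <vector>

struct HeaderRead
{
    HANDLE     file;
    OVERLAPPED ov;
    BYTE       header[512];
};

// The batch must stay at or below MAXIMUM_WAIT_OBJECTS (64) for the single
// WaitForMultipleObjects call used here.
void ReadHeaderBatch(const std::vector<std::wstring> &paths, std::vector<HeaderRead> &out)
{
    out.assign(paths.size(), HeaderRead()); // zero-initialized slots
    std::vector<HANDLE> events;

    for (size_t i = 0; i < paths.size(); ++i)
    {
        HeaderRead &r = out[i];
        r.file = CreateFileW(paths[i].c_str(), GENERIC_READ,
                             FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                             OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
        r.ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
        events.push_back(r.ov.hEvent);

        BOOL issued = FALSE;
        if (r.file != INVALID_HANDLE_VALUE)
            issued = ReadFile(r.file, r.header, sizeof(r.header), NULL, &r.ov);
        if (r.file == INVALID_HANDLE_VALUE ||
            (!issued && GetLastError() != ERROR_IO_PENDING))
            SetEvent(r.ov.hEvent); // nothing in flight for this slot; don't stall the wait
    }
    if (events.empty())
        return;

    WaitForMultipleObjects((DWORD)events.size(), &events[0], TRUE, INFINITE);

    for (size_t i = 0; i < out.size(); ++i)
    {
        DWORD bytesRead = 0;
        if (out[i].file != INVALID_HANDLE_VALUE)
        {
            GetOverlappedResult(out[i].file, &out[i].ov, &bytesRead, FALSE);
            // out[i].header now holds up to 512 bytes; run the 'MZ'/PE check on it.
            CloseHandle(out[i].file);
        }
        CloseHandle(out[i].ov.hEvent);
    }
}
Whether this actually helps depends on where the time goes; if most of it is spent opening the files rather than reading them, overlapping the reads alone will not change much.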
My laptop has a SSD disk that has 512 byte physical disk sector size and 4,096 byte logical disk sector size. I'm working on an ACID database system that has to bypass all OS caches, so I write directly from allocated internal memory (RAM) to the SSD disk. I also extend the files before I run the tests and don't resize it during the tests.
Now here is my problem: according to SSD benchmarks, random read and write throughput should be in the range of 30 MB/s to 90 MB/s. But here is my (rather horrible) telemetry from my numerous performance tests:
1.2 MB/s when reading random 512 byte blocks (physical sector size)
512 KB/s when writing random 512 byte blocks (physical sector size)
8.5 MB/s when reading random 4,096 byte blocks (logical sector size)
4.9 MB/s when writing random 4,096 byte blocks (logical sector size)
In addition to using asynchronous I/O, I also set the FILE_SHARE_READ and FILE_SHARE_WRITE flags to disable all OS buffering - because our database is ACID I must do this. I also tried FlushFileBuffers(), but that gave me even worse performance. I also wait for each async I/O operation to complete, as is required by some of our code.
Here is my code; is there a problem with it, or am I stuck with this bad I/O performance?
HANDLE OpenFile(const wchar_t *fileName)
{
// Set access method
DWORD desiredAccess = GENERIC_READ | GENERIC_WRITE ;
// Set file flags
DWORD fileFlags = FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING /*| FILE_FLAG_RANDOM_ACCESS*/;
//File or device is being opened or created for asynchronous I/O
fileFlags |= FILE_FLAG_OVERLAPPED ;
// Exclusive use (no share mode)
DWORD shareMode = 0;
HANDLE hOutputFile = CreateFile(
// File name
fileName,
// Requested access to the file
desiredAccess,
// Share mode. 0 equals exclusive lock by the process
shareMode,
// Pointer to a security attribute structure
NULL,
// Action to take on file
CREATE_NEW,
// File attributes and flags
fileFlags,
// Template file
NULL
);
if (hOutputFile == INVALID_HANDLE_VALUE)
{
DWORD lastError = GetLastError();
std::wcerr << L"Unable to create the file '" << fileName << L"'. [CreateFile] error #" << lastError << L"." << std::endl;
}
return hOutputFile;
}
DWORD ReadFromFile(HANDLE hFile, void *outData, _UINT64 bytesToRead, _UINT64 location, OVERLAPPED *overlappedPtr,
asyncIoCompletionRoutine_t completionRoutine)
{
DWORD bytesRead = 0;
if (overlappedPtr)
{
// Windows requires the 64-bit file byte location to be split into low & high 32-bit parts
overlappedPtr->Offset = (DWORD)_UINT64LO(location);
overlappedPtr->OffsetHigh = (DWORD)_UINT64HI(location);
// Should we use a callback function or a manual event
if (!completionRoutine && !overlappedPtr->hEvent)
{
// No manual event supplied, so create one. The caller must reset and close it themselves
overlappedPtr->hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
if (!overlappedPtr->hEvent)
{
DWORD errNumber = GetLastError();
std::wcerr << L"Could not create a new event. [CreateEvent] error #" << errNumber << L".";
}
}
}
BOOL result = completionRoutine ?
ReadFileEx(hFile, outData, (DWORD)(bytesToRead), overlappedPtr, completionRoutine) :
ReadFile(hFile, outData, (DWORD)(bytesToRead), &bytesRead, overlappedPtr);
if (result == FALSE)
{
DWORD errorCode = GetLastError();
if (errorCode != ERROR_IO_PENDING)
{
std::wcerr << L"Can't read sectors from file. [ReadFile] error #" << errorCode << L".";
}
}
return bytesRead;
}
Random I/O performance is not measured well in MB/sec; it is measured in IOPS. "1.2 MB/s when reading random 512 byte blocks" => roughly 2,400 IOPS. Not bad. Double the block size and you'll get about 199% of the MB/sec and 99% of the IOPS, because it takes almost the same time to read 512 bytes as it does to read 1024 bytes (almost no extra time at all). SSDs are not free of seeking costs, as is sometimes mistakenly assumed.
So the numbers are not actually bad at all.
SSDs benefit from high queue depth. Try issuing multiple IOs at once and keep that number outstanding at all times. The optimal concurrency will be somewhere in the range of 1-32.
Because SSDs have hardware concurrency you can expect a small multiple of the single-threaded performance. My SSD has 4 parallel "banks" for example.
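For example, here is a hedged sketch of keeping a fixed number of overlapped reads outstanding against a handle opened as in OpenFile above (the queue depth, block size and random-offset choice are illustrative, not tuned values):
#include <windows.h>
#include <cstdlib>

static const DWORD BLOCK_SIZE  = 4096; // logical sector size
static const int   QUEUE_DEPTH = 8;

static void IssueRandomRead(HANDLE hFile, OVERLAPPED *ov, char *buffer,
                            unsigned __int64 totalBlocks)
{
    // Pick a random sector-aligned offset (rand() keeps the sketch short; a real
    // benchmark would use a better generator to cover the whole file).
    unsigned __int64 offset = ((unsigned __int64)rand() % totalBlocks) * BLOCK_SIZE;
    ov->Offset = (DWORD)(offset & 0xFFFFFFFF);
    ov->OffsetHigh = (DWORD)(offset >> 32);
    ReadFile(hFile, buffer, BLOCK_SIZE, NULL, ov); // completes later via ov->hEvent
}

void RandomReadWithQueueDepth(HANDLE hFile, unsigned __int64 fileSize, int totalReads)
{
    OVERLAPPED ov[QUEUE_DEPTH] = {};
    HANDLE events[QUEUE_DEPTH];
    // VirtualAlloc returns page-aligned memory, which satisfies the buffer
    // alignment required by FILE_FLAG_NO_BUFFERING.
    char *buffers = (char *)VirtualAlloc(NULL, QUEUE_DEPTH * BLOCK_SIZE,
                                         MEM_COMMIT, PAGE_READWRITE);
    unsigned __int64 totalBlocks = fileSize / BLOCK_SIZE;

    for (int i = 0; i < QUEUE_DEPTH; ++i)
    {
        events[i] = CreateEvent(NULL, TRUE, FALSE, NULL);
        ov[i].hEvent = events[i];
        IssueRandomRead(hFile, &ov[i], buffers + i * BLOCK_SIZE, totalBlocks);
    }
    for (int done = 0; done < totalReads; ++done)
    {
        // Wait for any one outstanding read, then immediately refill that slot
        // so the device always sees QUEUE_DEPTH requests.
        DWORD slot = WaitForMultipleObjects(QUEUE_DEPTH, events, FALSE, INFINITE) - WAIT_OBJECT_0;
        DWORD bytesRead = 0;
        GetOverlappedResult(hFile, &ov[slot], &bytesRead, FALSE);
        IssueRandomRead(hFile, &ov[slot], buffers + slot * BLOCK_SIZE, totalBlocks);
    }
    for (int i = 0; i < QUEUE_DEPTH; ++i)
    {
        DWORD bytesRead = 0;
        GetOverlappedResult(hFile, &ov[i], &bytesRead, TRUE); // drain what is still in flight
        CloseHandle(events[i]);
    }
    VirtualFree(buffers, 0, MEM_RELEASE);
}
The idea is simply that the device always sees several requests in its queue, so its internal parallelism actually gets used.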
Using FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING is all that is needed to achieve direct writes to hardware. If these flags do not work your hardware does not respect these flags and you can't do anything about it. All server hardware respects these flags and I have not seen a consumer disk that doesn't.
The sharing flags are not meaningful in this context.
The code is fine although I don't see why you use async IO and later wait on an event to wait for completion. That makes no sense. Either use synchronous IO (which will perform about the same as async IO) or use async IO with completion ports and without waiting.
Use hdparm -I /dev/sdx to check your logical and physical block size. Most modern SSDs have a 4,096 byte physical block size but also support 512-byte blocks for backward compatibility with older drives & OS software. This is done by "512-byte emulation", a.k.a. 512e. If your drive is one of the ones that does 512-byte emulation, your 512-byte accesses are actually read-modify-write operations. The SSD will try to turn sequential accesses into 4K block writes.
If you can switch to 4K block writes you will (probably) see much better numbers for IOPS as well as bandwidth, since this makes for much less work on the SSD. Random 512-byte block writes also have a big impact on long-term performance due to increased write amplification.
I'm trying to get the filesize of a large file (12gb+) and I don't want to open the file to do so as I assume this would eat a lot of resources. Is there any good API to do so with? I'm in a Windows environment.
You should call GetFileSizeEx which is easier to use than the older GetFileSize. You will need to open the file by calling CreateFile but that's a cheap operation. Your assumption that opening a file is expensive, even a 12GB file, is false.
You could use the following function to get the job done:
__int64 FileSize(const wchar_t* name)
{
HANDLE hFile = CreateFile(name, GENERIC_READ,
FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL, NULL);
if (hFile==INVALID_HANDLE_VALUE)
return -1; // error condition, could call GetLastError to find out more
LARGE_INTEGER size;
if (!GetFileSizeEx(hFile, &size))
{
CloseHandle(hFile);
return -1; // error condition, could call GetLastError to find out more
}
CloseHandle(hFile);
return size.QuadPart;
}
There are other API calls that will return you the file size without forcing you to create a file handle, notably GetFileAttributesEx. However, it's perfectly plausible that this function will just open the file behind the scenes.
__int64 FileSize(const wchar_t* name)
{
WIN32_FILE_ATTRIBUTE_DATA fad;
if (!GetFileAttributesEx(name, GetFileExInfoStandard, &fad))
return -1; // error condition, could call GetLastError to find out more
LARGE_INTEGER size;
size.HighPart = fad.nFileSizeHigh;
size.LowPart = fad.nFileSizeLow;
return size.QuadPart;
}
If you are compiling with Visual Studio and want to avoid calling Win32 APIs then you can use _wstat64.
Here is a _wstat64 based version of the function:
__int64 FileSize(const wchar_t* name)
{
__stat64 buf;
if (_wstat64(name, &buf) != 0)
return -1; // error, could use errno to find out more
return buf.st_size;
}
If performance ever became an issue for you then you should time the various options on all the platforms that you target in order to reach a decision. Don't assume that the APIs that don't require you to call CreateFile will be faster. They might be but you won't know until you have timed it.
I've also lived with the fear of the price paid for opening a file and closing it just to get its size, so I decided to ask the performance counter and see how expensive the operations really are.
This is the number of cycles it took to execute one file-size query on the same file with the three methods. Tested on two files: 150 MB and 1.5 GB. Got +/- 10% fluctuations, so the numbers don't seem to be affected by the actual file size. (Obviously this depends on the CPU, but it gives you a good vantage point.)
190 cycles - CreateFile, GetFileSizeEx, CloseHandle
40 cycles - GetFileAttributesEx
150 cycles - FindFirstFile, FindClose
The GIST with the code used is available here.
As we can see from this highly scientific :) test, the slowest is actually the file opener, the 2nd slowest is the file finder, and the winner is the attributes reader. Now, in terms of reliability, CreateFile should be preferred over the other two. But I still don't like the concept of opening a file just to read its size... Unless I'm doing size-critical stuff, I'll go for the attributes.
PS: When I have time I'll try to read the sizes of files that are open and being written to. But not right now...
Another option is to use the FindFirstFile function:
#include "stdafx.h"
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
int _tmain(int argc, _TCHAR* argv[])
{
WIN32_FIND_DATA FindFileData;
HANDLE hFind;
LPCTSTR lpFileName = L"C:\\Foo\\Bar.ext";
hFind = FindFirstFile(lpFileName , &FindFileData);
if (hFind == INVALID_HANDLE_VALUE)
{
printf ("File not found (%d)\n", GetLastError());
return -1;
}
else
{
ULONGLONG FileSize = FindFileData.nFileSizeHigh;
FileSize <<= sizeof( FindFileData.nFileSizeHigh ) * 8;
FileSize |= FindFileData.nFileSizeLow;
_tprintf (TEXT("file size is %llu\n"), FileSize);
FindClose(hFind);
}
return 0;
}
As of C++17, there is std::filesystem::file_size as part of the standard library. (Then the implementer gets to decide how to do it efficiently!)
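If your compiler supports C++17, a minimal version in the same shape as the functions above might look like this (the error_code overload is used so the function returns -1 instead of throwing):
#include <filesystem>
#include <system_error>

__int64 FileSize(const wchar_t* name)
{
    // C++17: std::filesystem::file_size with the non-throwing overload
    std::error_code ec;
    unsigned long long size = std::filesystem::file_size(name, ec);
    if (ec)
        return -1; // error condition, ec.message() has the details
    return (__int64)size;
}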
What about GetFileSize function?
I prefer not to use any of the XML parser libraries out there, so can you suggest a good write function to use for writing data to an XML file? I will make a lot of calls to the write function, so it should be able to keep track of the last write position and it should not take too many resources. I have two different writes below, but I can't keep track of the last write position unless I read the file to the end.
case#1
FILE *pfile = _tfopen(GetFileNameXML(), _T("w"));
if(pfile)
{
_fputts(TEXT(""), pfile);
}
if(pfile)
{
fclose(pfile);
pfile = NULL;
}
case#2
HANDLE hFile = CreateFile(GetFileNameXML(), GENERIC_READ|GENERIC_WRITE,
FILE_SHARE_WRITE|FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if(hFile != INVALID_HANDLE_VALUE)
{
WriteFile(hFile,,,,,);
}
CloseHandle(hFile);
thanks.
If all you need is to write some text files, use C++'s standard library file facilities. The samples here will be helpful: http://www.cplusplus.com/doc/tutorial/files/
First, what's your aversion to using a standard XML processing library?
Next, if you decide to roll your own, definitely don't go directly at the Win32 APIs - at least not unless you're going to write out the generated XML in large chunks, or you're going to implement your own buffering layer.
It's not going to matter for dealing with tiny files, but you specifically mention good performance and many calls to the write function. WriteFile has a fair amount of overhead: it does a lot of work and involves user->kernel->user mode transitions, which are expensive. If you're dealing with "normally sized" XML files you probably won't be able to see much of a difference, but if you're generating monstrously sized dumps it's definitely something to keep in mind.
You mention tracking the last write position - first off, it should be easy: with FILE buffers you have ftell, and with the raw Win32 API you have SetFilePointerEx - call it with liDistanceToMove=0 and dwMoveMethod=FILE_CURRENT, and you get the current file position after a write. But why do you need this? If you're streaming out an XML file, you should generally keep on streaming until you're done writing - are you closing and re-opening the file? Or are you writing a valid XML file which you want to insert more data into later?
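For reference, a small sketch of that SetFilePointerEx call (the function name is just for illustration):
#include <windows.h>

__int64 CurrentFilePosition(HANDLE hFile)
{
    // Moving by zero bytes relative to FILE_CURRENT leaves the pointer where it
    // is and reports the current position back through the third argument.
    LARGE_INTEGER zero = {};
    LARGE_INTEGER position = {};
    if (!SetFilePointerEx(hFile, zero, &position, FILE_CURRENT))
        return -1; // error condition, GetLastError() has the details
    return position.QuadPart;
}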
As for the overhead of the Win32 file functions, it may or may not be relevant in your case (depending on the size of the files you're dealing with), but with larger files it matters a lot - included below is a micro-benchmark that simply reads a file into memory with ReadFile, letting you specify different buffer sizes on the command line. It's interesting to look at, say, Process Explorer's IO tab while running the tool. Here are some statistics from my measly laptop (Win7-SP1 x64, core2duo P7350@2.0GHz, 4GB ram, 120GB Intel-320 SSD).
Take it for what it is, a micro-benchmark. The performance might or might not matter in your particular situation, but I do believe the numbers demonstrate that there's considerable overhead to the Win32 file APIs, and that doing a little buffering of your own helps.
With a fully cached 2GB file:
BlkSz Speed
32 14.4MB/s
64 28.6MB/s
128 56MB/s
256 107MB/s
512 205MB/s
1024 350MB/s
4096 800MB/s
32768 ~2GB/s
With a "so big there will only be cache misses" 4GB file:
BlkSz Speed CPU
32 13MB/s 49%
64 26MB/s 49%
128 52MB/s 49%
256 99MB/s 49%
512 180MB/s 49%
1024 200MB/s 32%
4096 185MB/s 22%
32768 205MB/s 13%
Keep in mind that 49% CPU usage means that one CPU core is pretty much fully pegged - a single thread can't really push the machine much harder. Notice the pathological behavior of the 4KB buffer in the second table - it was reproducible, and I don't have an explanation for it.
Crappy micro-benchmark code goes here:
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
#include <string>
#include <assert.h>
unsigned getDuration(FILETIME& timeStart, FILETIME& timeEnd)
{
// duration is in 100-nanoseconds, we want milliseconds
// 1 millisecond = 1000 microseconds = 1000000 nanoseconds
LARGE_INTEGER ts, te, res;
ts.HighPart = timeStart.dwHighDateTime; ts.LowPart = timeStart.dwLowDateTime;
te.HighPart = timeEnd.dwHighDateTime; te.LowPart = timeEnd.dwLowDateTime;
res.QuadPart = ((te.QuadPart - ts.QuadPart) / 10000);
assert(res.QuadPart < UINT_MAX);
return res.QuadPart;
}
int main(int argc, char* argv[])
{
if(argc < 3) {
puts("Syntax: ReadFile [filename] [blocksize]");
return 0;
}
char *filename= argv[1];
int blockSize = atoi(argv[2]);
if(blockSize < 1) {
puts("Please specify a blocksize larger than 0");
return 1;
}
HANDLE hFile = CreateFile(filename, GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0);
if(INVALID_HANDLE_VALUE == hFile) {
puts("error opening input file");
return 1;
}
std::vector<char> buffer(blockSize);
LARGE_INTEGER fileSize;
if(!GetFileSizeEx(hFile, &fileSize)) {
puts("Failed getting file size.");
return 1;
}
std::cout << "File size " << fileSize.QuadPart << ", that's " << (fileSize.QuadPart / blockSize) <<
" blocks of " << blockSize << " bytes - reading..." << std::endl;
FILETIME dummy, kernelStart, userStart;
GetProcessTimes(GetCurrentProcess(), &dummy, &dummy, &kernelStart, &userStart);
DWORD ticks = GetTickCount();
DWORD bytesRead = 0;
do {
if(!ReadFile(hFile, &buffer[0], blockSize, &bytesRead, 0)) {
puts("Error calling ReadFile");
return 1;
}
} while(bytesRead == blockSize);
ticks = GetTickCount() - ticks;
FILETIME kernelEnd, userEnd;
GetProcessTimes(GetCurrentProcess(), &dummy, &dummy, &kernelEnd, &userEnd);
CloseHandle(hFile);
std::cout << "Reading with " << blockSize << " sized blocks took " << ticks << "ms, spending " <<
getDuration(kernelStart, kernelEnd) << "ms in kernel and " <<
getDuration(userStart, userEnd) << "ms in user mode. Hit enter to continue." << std::endl;
std::string dummyString;
std::cin >> dummyString;
return 0;
}
I want to read a file from the hard disk that can be up to ~4-5GB in size, not whole at once but in parts of ~100MB, sequentially. I want to make it as simple and fast as possible, but now I see that the standard C++ methods will not work for files bigger than 2GB.
I use Visual Studio 2008, C++/CLI. Any suggestions? I tried to use CreateFile and ReadFile, but they cause me more problems than they solve, or I am using them wrong for reading a big file in parts.
EDIT: Sample code:
Creating handle
hFile = CreateFile(result,
GENERIC_READ,
FILE_SHARE_READ,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL
|FILE_FLAG_NO_BUFFERING
| FILE_FLAG_OVERLAPPED,
0);
Reading
lpOverlapped = new OVERLAPPED;
lpOverlapped->hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
lpOverlapped->Offset=10;
lpOverlapped->OffsetHigh=0;
DWORD howMuchWasRead;
BOOLEAN error = false;
do {
this->lastError = NO_ERROR;
BOOL bRet = ReadFile(this->hFile,this->fileBuffer,this->currentBufferSize,&howMuchWasRead,lpOverlapped);
this->lastError = GetLastError();
if (this->lastError == ERROR_IO_PENDING){
while(!HasOverlappedIoCompleted(this->lpOverlapped)){}
error = true;
} else {
error = false;
}
} while (error == true);
This version now returns ERROR_INVALID_PARAMETER 87 (0x57) for a 4GB .iso file; the buffer size is 100MB.
You can map parts of the file into the address space of your process using CreateFile, CreateFileMapping and MapViewOfFile.
You can read the file sequentially without any problems.
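For illustration, a hedged sketch of walking a large file through ~100 MB views (the ProcessChunk callback and function name are made up for the example; offsets passed to MapViewOfFile must be multiples of the system allocation granularity, 64 KB on typical systems, which 100 MB satisfies):
#include <windows.h>

void ReadInMappedChunks(const wchar_t *path, void (*ProcessChunk)(const void *, SIZE_T))
{
    HANDLE hFile = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return;

    LARGE_INTEGER fileSize;
    GetFileSizeEx(hFile, &fileSize);

    // A read-only mapping of the whole file; views are mapped piece by piece.
    HANDLE hMapping = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hMapping)
    {
        const unsigned __int64 chunk = 100ull * 1024 * 1024; // multiple of 64 KB
        for (unsigned __int64 offset = 0;
             offset < (unsigned __int64)fileSize.QuadPart; offset += chunk)
        {
            unsigned __int64 remaining = (unsigned __int64)fileSize.QuadPart - offset;
            SIZE_T viewSize = (SIZE_T)(remaining < chunk ? remaining : chunk);
            void *view = MapViewOfFile(hMapping, FILE_MAP_READ,
                                       (DWORD)(offset >> 32),
                                       (DWORD)(offset & 0xFFFFFFFF), viewSize);
            if (!view)
                break;
            ProcessChunk(view, viewSize); // hypothetical consumer of the mapped bytes
            UnmapViewOfFile(view);
        }
        CloseHandle(hMapping);
    }
    CloseHandle(hFile);
}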
The limitation is that fseek uses a long parameter for the offset when you want to seek. If you don't reposition in the file, or the offset is always less than 2GB, there is no problem.
ReadFile will handle files larger than 2GB, maybe you can rephrase your question so we can help you figure out the problems you are having with that.
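As a starting point, here is a hedged sketch of reading a large file in 100 MB pieces with plain synchronous ReadFile (no FILE_FLAG_NO_BUFFERING, so there are no alignment requirements on the buffer, size or offset; the ProcessChunk callback is illustrative):
#include <windows.h>
#include <vector>

bool ReadLargeFileInChunks(const wchar_t *path, void (*ProcessChunk)(const char *, DWORD))
{
    HANDLE hFile = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return false;

    const DWORD chunkSize = 100 * 1024 * 1024; // 100 MB per ReadFile call
    std::vector<char> buffer(chunkSize);

    for (;;)
    {
        DWORD bytesRead = 0;
        if (!ReadFile(hFile, &buffer[0], chunkSize, &bytesRead, NULL))
        {
            CloseHandle(hFile);
            return false;
        }
        if (bytesRead == 0) // end of file reached
            break;
        ProcessChunk(&buffer[0], bytesRead);
    }
    CloseHandle(hFile);
    return true;
}
Because the file pointer advances with each synchronous read, no explicit offsets are needed, and file size is never forced through a 32-bit value.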