Testing write/read speed on NTFS, FAT, EXT4 - c++

I have to write a program in C (or C++) on Linux that tests write and read speed on different file systems. I need to be sure that all data is actually written to the disk (not just to the cache).
So my first question: what function should I use to open a new file? I previously used the open function with the O_DIRECT and O_SYNC flags, and everything was fine except one thing: writing small files like 1 KB was extremely slow, something like 0.01 MB/s.
I tried using fopen instead of open, with fflush to be sure all data goes directly to the disk, and I tested it first on a FAT32 file system. 1000 files of 1 KB were written to disk (here an SD card) in 5 s, something like 0.18 MB/s, which I think is correct.
Now the problem occurs when testing EXT4 and NTFS file systems. On EXT4, 1 KB files were written at something like 12 MB/s (wrong), and when testing 100 KB files the transfer was 180 MB/s (terribly wrong; my SD card has a transfer rate of only 20 MB/s).
My actual code for writing files looks like this:
clock_gettime(CLOCK_REALTIME, &ts);
for (int i = 0; i < amount; ++i)
{
    p = fopen(buffer2, "w+");
    fwrite(buff, size * 1024, 1, p);
    if (fflush(p) != 0) { cout << "fflush error"; return 0; }
    fclose(p);
}
clock_gettime(CLOCK_REALTIME, &ts2);
time2 = diff2(ts, ts2);
This works well only on the FAT32 file system. The second version (used before) looks like this:
for (int i = 0; i < amount; ++i)
{
    int fd = open(buffer2, O_WRONLY | O_CREAT, 0777);
    if (error(fd, "open")) return false;
    if ((write(fd, buff, size * 1024)) < 0) { perror("write error"); return 0; }
    if ((fsync(fd)) == -1) { perror("fsync"); return 0; }
    close(fd);
}
This works on all file systems, but small files write extremely slowly.
Maybe I should use different code for different file systems? Any ideas?
EDIT:
I have found out why writing small files is slow: it is the fsync function, which takes a different amount of time on different file systems. I am calling fsync after every write, so that is the problem.
Is there any way to call it once at the end, when all the files have been written? Or maybe every few seconds? Do I have to use a different thread?

See How do I ensure data is written to disk before closing fstream?, but I don't think you can ensure that the data is actually on the disk rather than in a cache in the disk controller or even in the drive's onboard cache.

Related

mmap with MAP_ANON for saving a large amount of data

I am working on a project which requires generating a huge amount of data (i.e. hundreds of gigabytes) and processing it later. Currently I am using C++ and I'd like to use mmap() to create some virtual memory to hold the data. The obvious way to do it is to create a file and use mmap() to map, say, 100 GB to the file; then we should be able to read/write data through the mapped memory. Something like this:
const size_t SIZE = 100ULL << 30;   // 100 GB
int file = open("disk_cache", O_RDWR | O_CREAT, 0644);
lseek(file, SIZE - 1, SEEK_SET);
write(file, "", 1);
auto* mapPtr = (unsigned*)mmap(nullptr, SIZE, PROT_WRITE | PROT_READ, MAP_SHARED, file, 0);
if (mapPtr == MAP_FAILED) {
    perror("mmap");
    return;
}
Generate_data(mapPtr); // write/generate data here
Process_data(mapPtr);  // read/access data here
if (munmap(mapPtr, SIZE) == -1) {
    perror("munmap:");
    return;
}
close(file);
if (remove("disk_cache") != 0) {
    perror("remove the file:");
}
The above code works fine, but it seems a bit redundant that I have to create a file and remove it later. I was wondering whether I can use the MAP_ANON flag to map the data, so I tried something like this:
auto* mapPtr = (unsigned*)mmap(nullptr, 100ULL << 30 /* 100 GB */, PROT_WRITE | PROT_READ, MAP_SHARED | MAP_ANON | MAP_NORESERVE, -1, 0);
if (mapPtr == MAP_FAILED) {
    perror("mmap");
    return;
}
It works fine until the data exceeds 64 GB (since I am running this on a server which has 64 GB of RAM). My questions are as follows:
(1) It seems like MAP_ANON keeps all the data in memory, not on disk. I wonder why this happens; is there any way to avoid it? I thought MAP_ANON would keep all the data in /dev/zero, and we could safely store/access data from there.
(2) Also, I am using multiple threads (i.e. 32 cores) to generate/access the data. With mmap(), randomly accessing data is very efficient, but storing/writing data to the memory seems slower. I did some quick experiments, and it seems like ofstream is faster than mmap(), especially when the data is larger than RAM. I feel like maybe I did something wrong with mmap(). Is there anything I need to take care of when using mmap() to write/store data larger than RAM? What's the best way to do it?
Any help would be appreciated; thanks in advance.

What is the fastest way to read a file on disk in C++?

I am writing a program to check whether a file is a PE file or not. For that, I need to read only the file headers (which I guess do not occupy more than the first 1024 bytes of a file).
I tried using a CreateFile() + ReadFile() combination, which turns out to be slow because I am iterating through all the files on the system drive. It is taking 15-20 minutes just to iterate through them.
Can you please suggest an alternate approach to open and read the files to make it faster?
Note: please note that I do NOT need to read the file in whole. I just need to read the initial part of the file -- the DOS header, PE header, etc. -- which I guess do not occupy more than the first 512 bytes of the file.
Here is my code:
bool IsPEFile(const String filePath)
{
    HANDLE hFile = CreateFile(filePath.c_str(),
                              GENERIC_READ,
                              FILE_SHARE_READ | FILE_SHARE_WRITE,
                              NULL,
                              OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL,
                              NULL);
    DWORD dwBytesRead = 0;
    const DWORD CHUNK_SIZE = 2048;
    BYTE szBuffer[CHUNK_SIZE] = {0};
    LONGLONG size;
    LARGE_INTEGER li = {0};
    bool isPE = false;                   // track the result so the handle is always closed
    if (hFile != INVALID_HANDLE_VALUE)
    {
        if (GetFileSizeEx(hFile, &li) && li.QuadPart > 0)
        {
            size = li.QuadPart;
            ReadFile(hFile, szBuffer, CHUNK_SIZE, &dwBytesRead, NULL);
            if (dwBytesRead > 0 && (WORDPTR(szBuffer[0]) == ('M' << 8) + 'Z' || WORDPTR(szBuffer[0]) == ('Z' << 8) + 'M'))
            {
                LONGLONG ne_pe_header = DWORDPTR(szBuffer[0x3c]);
                WORD signature = 0;
                if (ne_pe_header <= dwBytesRead - 2)
                {
                    signature = WORDPTR(szBuffer[ne_pe_header]);
                }
                else if (ne_pe_header < size)
                {
                    SetFilePointer(hFile, ne_pe_header, NULL, FILE_BEGIN);
                    ReadFile(hFile, &signature, sizeof(signature), &dwBytesRead, NULL);
                    if (dwBytesRead != sizeof(signature))
                    {
                        signature = 0;   // read failed; fall through so the handle gets closed
                    }
                }
                if (signature == 0x4550) // PE file ("PE\0\0" little-endian)
                {
                    isPE = true;
                }
            }
        }
        CloseHandle(hFile);              // the original early returns leaked this handle
    }
    return isPE;
}
Thanks in advance.
I think you're hitting the inherent limitations of mechanical hard disk drives. You didn't mention whether you're using an HDD or a solid-state drive, but I assume an HDD given that your file accesses are slow.
HDDs can read data at about 100 MB/s sequentially, but seek time is a bit over 10 ms. This means that if you seek to a certain location (10 ms), you might as well read a megabyte of data (another 10 ms). It also means that you can access fewer than 100 files per second.
So, in your case it doesn't matter much whether you're reading the first 512 bytes of a file or the first hundred kilobytes.
Hardware is cheap, programmer time is expensive. Your best bet is to purchase a solid-state drive if your file accesses are too slow. I predict that eventually all computers will have solid-state drives.
Note: if the bottleneck is the HDD, there is nothing you can do about it other than replace the HDD with better technology. Practically all file access mechanisms are equally slow. The only thing you can do is read only the initial part of a file if the file is really large, such as multiple megabytes. But based on your code example you're already doing that.
For faster file IO, you need to use the CreateFile and ReadFile APIs of Win32.
If you want to speed things up, you can use file buffering and make the file IO non-blocking by using overlapped IO or an IOCP.
See this example for help: https://msdn.microsoft.com/en-us/library/windows/desktop/bb540534%28v=vs.85%29.aspx
And I think that C's FILE and C++'s fstream are not faster than Win32.

How to optimize C++ binary file reading?

I have a complex interpreter reading commands from (sometimes) multiple files (the exact details are out of scope), and it requires iterating over these multiple files (some could be GBs in size, preventing nice buffering) multiple times.
I am looking to increase the speed of reading each command from a file.
I have used the RDTSC (time stamp counter) register to micro-benchmark the code enough to know that >80% of the time is spent reading from the files.
Here is the thing: the program that generates the input file is literally faster than my small interpreter is at reading the file back in. I.e., instead of outputting the file I could (in theory) just link the generator of the data to the interpreter and skip the file, but that shouldn't be faster, right?
What am I doing wrong? Or is writing supposed to be 2x to 3x (at least) faster than reading from a file?
I have considered mmap, but some of the results at http://lemire.me/blog/archives/2012/06/26/which-is-fastest-read-fread-ifstream-or-mmap/ appear to indicate it is no faster than ifstream. Or would mmap help in this case?
Details:
I have (so far) tried adding a buffer, tweaking parameters, and removing the ifstream buffer (that slowed it down by 6x in my test case); I am currently at a loss for ideas after searching around.
The important section of the code is below. It does the following:
if data is left in the buffer, copy from the buffer to memblock (where it is then used)
if no data is left in the buffer, check how much data is left in the file; if more than the size of the buffer, read in a buffer-sized chunk
if less than a full buffer remains in the file, read in just that remaining part
//if data in buffer
if (leftInBuffer[activefile] > 0)
{
    //cout << bufferloc[activefile] << "\n";
    memcpy(memblock, (buffer[activefile]) + bufferloc[activefile], 16);
    bufferloc[activefile] += 16;
    leftInBuffer[activefile] -= 16;
}
else //buffers blank
{
    //read in block
    long blockleft = (cfilemax - cfileplace) / 16;
    int read = 0;
    /* slow block starts here */
    if (blockleft >= MAXBUFELEMENTS)
    {
        currentFile->read((char *)(&(buffer[activefile][0])), 16 * MAXBUFELEMENTS);
        leftInBuffer[activefile] = 16 * MAXBUFELEMENTS;
        bufferloc[activefile] = 0;
        read = 16 * MAXBUFELEMENTS;
    }
    else //read in part of the block
    {
        currentFile->read((char *)(&(buffer[activefile][0])), 16 * blockleft);
        leftInBuffer[activefile] = 16 * blockleft;
        bufferloc[activefile] = 0;
        read = 16 * blockleft;
    }
    /* slow block ends here */
    memcpy(memblock, (buffer[activefile]) + bufferloc[activefile], 16);
    bufferloc[activefile] += 16;
    leftInBuffer[activefile] -= 16;
}
edit: this is on a Mac, OS X 10.9.5, with an i7 and an SSD
Solution:
As was suggested below, mmap was able to increase the speed by about 10x.
(For anyone else who searches for this.)
Specifically, open with:
uint8_t * openMMap(string name, long & size)
{
    int m_fd;
    struct stat statbuf;
    uint8_t * m_ptr_begin;
    if ((m_fd = open(name.c_str(), O_RDONLY)) < 0)
    {
        perror("can't open file for reading");
    }
    if (fstat(m_fd, &statbuf) < 0)
    {
        perror("fstat in openMMap failed");
    }
    if ((m_ptr_begin = (uint8_t *)mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, m_fd, 0)) == MAP_FAILED)
    {
        perror("mmap in openMMap failed");
    }
    uint8_t * m_ptr = m_ptr_begin;
    size = statbuf.st_size;
    return m_ptr;
}
read with:
uint8_t * mmfile = openMMap("my_file", length);
uint32_t * memblockmm;
memblockmm = (uint32_t *)mmfile; // cast file to a uint32 array
uint32_t data = memblockmm[0];   // take an int
mmfile += 4; // increment by 4, as I read a 32-bit entry and each entry in mmfile is 8 bits
This should be a comment, but I don't have 50 reputation to make a comment.
What is the value of MAXBUFELEMENTS? In my experience, many small reads are far slower than one read of a larger size. I suggest reading in the entire file if possible; some files could be GBs in size, but even reading in 100 MB at once would perform better than reading 1 MB 100 times.
If that's still not good enough, the next thing you can try is to compress (zlib) the input files (you may have to break them into chunks due to size) and decompress them in memory. This method is usually faster than reading in uncompressed files.
As #Tony Jiang said, try experimenting with the buffer size to see if that helps.
Try mmap to see if that helps.
I assume that currentFile is a std::ifstream? There's going to be some overhead from using iostreams (for example, an istream does its own buffering, adding an extra layer to what you're doing); although I wouldn't expect the overhead to be huge, you can test by using open(2) and read(2) directly.
You should be able to run your code through dtruss -e to verify how long the read system calls take. If they take the bulk of your time, then you're hitting OS and hardware limits, and you can address that by piping, mmap'ing, or adjusting your buffer size. If they take less time than you expect, then look for problems in your application logic (unnecessary work on each iteration, etc.).

C/C++ best way to send a number of bytes to stdout

Profiling my program, the function print() is taking a lot of time. How can I send "raw" byte output directly to stdout instead of using fwrite, and make it faster (I need to send all 9 bytes in print() to stdout at the same time)?
void print() {
    unsigned char temp[9];
    temp[0] = matrix[0][0];
    temp[1] = matrix[0][1];
    temp[2] = matrix[0][2];
    temp[3] = matrix[1][0];
    temp[4] = matrix[1][1];
    temp[5] = matrix[1][2];
    temp[6] = matrix[2][0];
    temp[7] = matrix[2][1];
    temp[8] = matrix[2][2];
    fwrite(temp, 1, 9, stdout);
}
Matrix is defined globally as unsigned char matrix[3][3];
IO is not an inexpensive operation. It is, in fact, a blocking operation, meaning that the OS can preempt your process when you call write, allowing more CPU-bound processes to run before the IO device you're writing to completes the operation.
The only lower-level function you can use (if you're developing on a *nix machine) is the raw write function, and even then your performance will not be much faster than it is now. Simply put: IO is expensive.
The top rated answer claims that IO is slow.
Here's a quick benchmark with a sufficiently large buffer to take the OS out of the critical performance path, but only if you're willing to receive your output in giant blurps. If latency to the first byte is your problem, you need to run in "dribs" mode.
Write 10 million records from a nine byte array
Mint 12 AMD64 on 3GHz CoreDuo under gcc 4.6.1
340ms to /dev/null
710ms to 90MB output file
15254ms to 90MB output file in "dribs" mode
FreeBSD 9 AMD64 on 2.4GHz CoreDuo under clang 3.0
450ms to /dev/null
550ms to 90MB output file on ZFS triple mirror
1150ms to 90MB output file on FFS system drive
22154ms to 90MB output file in "dribs" mode
There's nothing slow about IO if you can afford to buffer properly.
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char* argv[])
{
    int dribs = argc > 1 && 0 == strcmp(argv[1], "dribs");
    int err;
    int i;
    enum { BigBuf = 4*1024*1024 };
    char* outbuf = malloc(BigBuf);
    assert(outbuf != NULL);
    err = setvbuf(stdout, outbuf, _IOFBF, BigBuf); // full buffering
    assert(err == 0);
    enum { ArraySize = 9 };
    char temp[ArraySize]; // contents don't matter for the benchmark
    enum { Count = 10*1000*1000 };
    for (i = 0; i < Count; ++i) {
        fwrite(temp, 1, ArraySize, stdout);
        if (dribs) fflush(stdout);
    }
    fflush(stdout); // seems to be needed after setting our own buffer
    fclose(stdout);
    if (outbuf) { free(outbuf); outbuf = NULL; }
}
The rawest form of output you can do is probably the write system call, like this:
write(1, matrix, 9);
1 is the file descriptor for standard out (0 is standard in, and 2 is standard error). Your standard out will only write as fast as whatever is reading it at the other end (i.e. the terminal, or the program you're piping into), which might be rather slow.
I'm not 100% sure, but you could try setting non-blocking IO on fd 1 (using fcntl) and hope the OS will buffer it for you until it can be consumed by the other end. It's been a while, but I think it works like this:
fcntl(1, F_SETFL, O_NONBLOCK);
YMMV though. Please correct me if I'm wrong on the syntax; as I said, it's been a while.
Perhaps your problem is not that fwrite() is slow, but that it is buffered.
Try calling fflush(stdout) after the fwrite().
This all really depends on your definition of slow in this context.
All printing is fairly slow, although iostreams are really slow for printing.
Your best bet would be to use printf, something along the lines of:
printf("%c%c%c%c%c%c%c%c%c\n", matrix[0][0], matrix[0][1], matrix[0][2], matrix[1][0],
matrix[1][1], matrix[1][2], matrix[2][0], matrix[2][1], matrix[2][2]);
As everyone has pointed out, IO in a tight inner loop is expensive. I have normally ended up doing a conditional cout of the matrix, based on some criteria, when required to debug it.
If your app is a console app, then try redirecting its output to a file; it will be a lot faster than doing console refreshes, e.g. app.exe > matrixDump.txt
What's wrong with:
fwrite(matrix,1,9,stdout);
Both the one- and two-dimensional arrays take up the same memory.
Try running the program twice, once with output and once without. You will notice that, overall, the one without the IO is fastest. Also, you could fork the process (or create a thread), one writing to a file (stdout), and one doing the operations.
So first, don't print on every entry. Basically, what I am saying is, do not do this:
for (int i = 0; i < 100; i++) {
    printf("Your stuff");
}
Instead, allocate a buffer either on the stack or on the heap, store your information there, and then write the whole buffer to stdout in one call, like this:
char *buffer = malloc(100);   // note: malloc(sizeof(100)) would only allocate sizeof(int) bytes
for (int i = 0; i < 100; i++) {
    buffer[i] = 1;            // your byte value goes here
}
// once you are done, print it to the console with
write(1, buffer, 100);
But in your case, just use write(1, temp, 9);
I am pretty sure you can increase output performance by increasing the buffer size, so that you make fewer fwrite calls. write might be faster, but I am not sure. Just try this:
❯ yes | dd of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 2.18338 s, 234 MB/s
vs
❯ yes | dd of=/dev/null count=100000 bs=50KB iflag=fullblock
100000+0 records in
100000+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 2.63986 s, 1.9 GB/s
The same applies to your code. Some tests during the last few days suggest that good buffer sizes are around 1 << 12 (= 4096) and 1 << 16 (= 65536) bytes.
You can simply use:
std::cout.write(reinterpret_cast<const char*>(temp), 9);
(operator<< on a char pointer would stop at the first zero byte, and temp is not null-terminated; write() sends exactly 9 bytes.) printf is more C-style.
Yet IO operations are costly, so use them wisely.

Fastest way to create a large file in C++?
Create a flat text file in C++, around 50-100 MB, where the content 'Added first line' should be inserted into the file 4 million times.
Using old-style file IO:
fopen the file for write.
fseek to the desired file size - 1.
fwrite a single byte.
fclose the file.
The fastest way to create a file of a certain size is to simply create a zero-length file using creat() or open() and then change the size using chsize(). This will simply allocate blocks on disk for the file; the contents will be whatever happened to be in those blocks. It's very fast since no buffer writing needs to take place.
Not sure I understand the question. Do you want to ensure that every character in the file is a printable ASCII character? If so, what about this? It fills the file with "abcdefghabc....".
#include <stdio.h>
int main ()
{
    const int FILE_SIZE = 50000; // size in KB
    const int BUFFER_SIZE = 1024;
    char buffer [BUFFER_SIZE + 1];
    int i;
    for (i = 0; i < BUFFER_SIZE; i++)
        buffer[i] = (char)(i % 8 + 'a');
    buffer[BUFFER_SIZE] = '\0';
    FILE *pFile = fopen ("somefile.txt", "w");
    for (i = 0; i < FILE_SIZE; i++)
        fputs(buffer, pFile); // not fprintf(pFile, buffer): a stray '%' in the data would be treated as a format specifier
    fclose(pFile);
    return 0;
}
You haven't mentioned the OS but I'll assume creat/open/close/write are available.
For truly efficient writing and assuming, say, a 4k page and disk block size and a repeated string:
open the file.
allocate 4k * number of chars in your repeated string, ideally aligned to a page boundary.
print repeated string into the memory 4k times, filling the blocks precisely.
Use write() to write out the blocks to disk as many times as necessary. You may wish to write a partial piece for the last block to get the size to come out right.
close the file.
This bypasses the buffering of fopen() and friends, which is both good and bad: their buffering means they're nice and fast, but they're still not going to be as efficient as this approach, which has no overhead of working with the buffer.
This can easily be written in C++ or C, but does assume that you're going to use POSIX calls rather than iostream or stdio for efficiency's sake, so it's outside the core library specification.
I faced the same problem: creating a ~500 MB file on Windows very fast.
The larger the buffer you pass to fwrite(), the faster you'll be.
int i;
FILE *fp;
fp = fopen(fname, "wb");
if (fp != NULL) {
    // create the big block's data
    uint8_t b[278528]; // some big chunk size
    for (i = 0; i < sizeof(b); i++) // custom initialization if != 0x00
    {
        b[i] = 0xFF;
    }
    // write all blocks to the file
    for (i = 0; i < TOT_BLOCKS; i++)
        fwrite(&b, sizeof(b), 1, fp);
    fclose(fp);
}
Now, at least on my Win7 with MinGW, this creates the file almost instantly.
Compared to that, fwrite() of 1 byte at a time completes in 10 s.
Passing a 4k buffer completes in 2 s.
Fastest way to create a large file in C++?
OK. I assume "fastest way" means the one that takes the smallest run time.
Create a flat text file in C++, around 50-100 MB, where the content 'Added first line' should be inserted into the file 4 million times.
preallocate the file using old style file io
fopen the file for write.
fseek to the desired file size - 1.
fwrite a single byte
fclose the file
create a string containing "Added first line\n" a thousand times.
find its length.
preallocate the file using old style file io
fopen the file for write.
fseek to the string length * 4000
fwrite a single byte
fclose the file
open the file for read/write
loop 4000 times,
writing the string to the file.
close the file.
That's my best guess.
I'm sure there are a lot of ways to do it.