I am working on a project which requires generating a huge amount of data (hundreds of gigabytes) and processing it later on. Currently I am using C++ and I'd like to use mmap() to create some virtual memory to hold the data. The obvious way to do it is to create a file and use mmap() to map, say, 100GB to the file; then we should be able to read/write data through the mapped memory. Something like this:
const off_t kMapSize = 100LL * 1024 * 1024 * 1024;  // 100GB

int file = open("disk_cache", O_RDWR | O_CREAT, 0644);
// extend the file to the mapping size by writing one byte at the end
lseek(file, kMapSize - 1, SEEK_SET);
write(file, "", 1);
auto* mapPtr = (unsigned*)mmap(nullptr, kMapSize, PROT_WRITE | PROT_READ, MAP_SHARED, file, 0);
if (mapPtr == MAP_FAILED) {
    perror("mmap");
    return;
}
Generate_data(mapPtr);  // write/generate data here
Process_data(mapPtr);   // read/access data here
if (munmap(mapPtr, kMapSize) == -1) {
    perror("munmap");
    return;
}
close(file);
if (remove("disk_cache") != 0) {
    perror("remove the file");
}
The above code works fine, but it just seems a bit redundant that I have to create a file and remove it later. I am wondering whether I can use the MAP_ANON flag to map the data instead, so I tried something like this:
auto* mapPtr = (unsigned*)mmap(nullptr, kMapSize, PROT_WRITE | PROT_READ, MAP_SHARED | MAP_ANON | MAP_NORESERVE, -1, 0);
if (mapPtr == MAP_FAILED) {
    perror("mmap");
    return;
}
This works fine until the data exceeds 64GB (the server I am running this on has 64GB of memory). My questions are as follows:
(1) It seems like MAP_ANON keeps all the data in memory, not on disk. Why does this happen, and is there any way to avoid it? I thought MAP_ANON was backed by /dev/zero, so we could safely store/access the data there.
(2) Also, I am using multiple threads (32 cores) to generate/access the data. With mmap(), random reads are very efficient, but storing/writing data through the mapping seems slower. In some quick experiments, ofstream appeared faster than mmap(), especially once the data grows larger than memory. I feel like maybe I am doing something wrong with mmap(). Is there anything I need to take care of when using mmap() to write/store data that is larger than memory? What is the best way to do it?
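For reference, the kind of kernel hinting I have seen suggested for this pattern looks roughly like the sketch below (plain POSIX madvise/msync on the file-backed mapping from the first snippet; the chunk size is arbitrary and I have not verified whether this actually helps at 100GB scale):

// Sketch only: hint the kernel about the access pattern and kick off
// write-back per chunk so dirty pages don't pile up behind the generator.
#include <sys/mman.h>
#include <algorithm>
#include <cstddef>

void generate_with_hints(unsigned* mapPtr, size_t mapBytes)
{
    const size_t chunkBytes = 256 * 1024 * 1024;        // 256MB, just an example
    madvise(mapPtr, mapBytes, MADV_SEQUENTIAL);          // writes will be sequential

    char* base = reinterpret_cast<char*>(mapPtr);
    for (size_t off = 0; off < mapBytes; off += chunkBytes)
    {
        size_t len = std::min(chunkBytes, mapBytes - off);
        // ... generate data into base[off .. off + len) ...

        // ask the kernel to start writing this chunk back now, asynchronously
        msync(base + off, len, MS_ASYNC);
    }
}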
Any help would be appreciated, thanks in advance.
Up to this point, I have decrypted files (located on a USB stick) with AES as follows:
FILE * fp = fopen(filePath, "r");
vector<char> encryptedChars;
if (fp == NULL) {
//Could not open file
continue;
}
while(true) {
int nextEncryptedChar = fgetc(fp);
if (nextEncryptedChar == EOF) {
break;
}
encryptedChars.push_back(nextEncryptedChar);
}
fclose(fp);
char encryptedFileArray[encryptedChars.size()];
int encryptedByteCount = encryptedChars.size();
for (int x = 0; x < encryptedByteCount; x++) {
encryptedFileArray[x] = encryptedChars[x];
}
encryptedChars.clear();
AES aes;
//Decrypt the message in-place
aes.setup(key, AES::KEY_128, AES::MODE_CBC, iv);
aes.decrypt(encryptedFileArray, sizeof(encryptedFileArray));
aes.clear();
This works perfectly for small files. At this point, I am opening a file from a USB stick and storing all characters into a vector and copying the vector to an array. I know that &encryptedChars[0] can be used as an array pointer as well and will save some memory.
Now I want to decrypt a file of 256 KB (as opposed to 1 KB). Copying the data into a source array would require at least 256 KB of RAM. However, I only have 100 KB at my disposal and therefore cannot create a source array containing all the encrypted data.
So I tried to use the FILE * that fopen gives me as the source pointer, and created a new file on the same USB stick as the destination. I was hoping that the decryption rounds would use the memory of the USB stick instead of the available heap memory.
FILE * fp = fopen(encryptedFilePath, "r");
FILE * fpDecrypt = fopen(decryptedFilePath, "w+");
if (fp == NULL || fpDecrypt == NULL) {
//Could not open file!?
return;
}
AES aes;
//Decrypt the message in-place
aes.setup(key, AES::KEY_128, AES::MODE_CBC, iv);
aes.decrypt((const char*)fp, fpDecrypt, firmwareSize);
aes.clear();
Unfortunately, the system locks up (no idea why).
Does anybody know if I can pass a FILE * to a function that expects a const char * as source and a void * as a destination?
I am using the following library: https://os.mbed.com/users/neilt6/code/AES/docs/tip/AES_8h_source.html
Thanks!
A lot of crypto libraries provide "incremental" APIs that allow a stream of data to be en/decrypted piece by piece, without having to load the stream into memory. Unfortunately, it appears that the library you're using doesn't (or, at least, does not explicitly document it).
However, if you know how CBC mode encryption works, it's possible to roll your own. Basically, all you need to do is take the last AES block (i.e. the last 16 bytes) of the previous chunk of ciphertext and use it as the IV when decrypting (or encrypting) the next chunk, something like this:
char buffer[1024]; // this needs to be a multiple of 16 bytes!
char ivTemp[16];
while(true) {
int bytesRead = fread(buffer, 1, sizeof(buffer), inputFile);
// save last 16 bytes of ciphertext as IV for next block
if (bytesRead == sizeof(buffer)) memcpy(ivTemp, buffer + bytesRead - 16, 16);
// decrypt the message in-place
AES aes;
aes.setup(key, AES::KEY_128, AES::MODE_CBC, iv);
aes.decrypt(buffer, bytesRead);
aes.clear();
// write out decrypted data (todo: check for write errors!)
fwrite(buffer, 1, bytesRead, outputFile);
// use the saved last 16 bytes of ciphertext as IV for next block
if (bytesRead == sizeof(buffer)) memcpy(iv, ivTemp, 16);
if (bytesRead < sizeof(buffer)) break; // end of file (or read error)
}
Note that this code will overwrite the iv array. That should be OK, though, since you should never use the same IV twice anyway. (In fact, with CBC mode, the IV should be chosen by the encryptor at random, using a cryptographically secure RNG, and sent alongside the message. The usual way to do that is to simply prepend the IV to the message file.)
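For example, if the IV is prepended like that, the decryptor can simply read it off the front of the file before entering the loop above (a sketch under that assumption; adjust it if your file format stores the IV elsewhere):

// Sketch: assumes the first 16 bytes of inputFile are the IV that the
// encryptor prepended. The chunked decryption loop above then starts
// reading ciphertext from the current file position.
char iv[16];
if (fread(iv, 1, sizeof(iv), inputFile) != sizeof(iv)) {
    // file too short to even contain an IV; handle the error here
    return;
}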
Also, the chunked decryption code above is somewhat less efficient than it needs to be, since it calls aes.setup() and thus re-runs the whole AES key expansion for each chunk. Unfortunately, I couldn't find any documented way to tell your crypto library to change the IV without re-running the setup.
However, looking at the implementation of your library, as linked by Sister Fister in the comments below, it looks like it's already replacing its internal copy of the IV with the last ciphertext block. Thus, it looks like all you really need to do is call aes.decrypt() for each block without a setup call in between, something like this:
char buffer[1024]; // this needs to be a multiple of 16 bytes!
AES aes;
aes.setup(key, AES::KEY_128, AES::MODE_CBC, iv);
while(true) {
int bytesRead = fread(buffer, 1, sizeof(buffer), inputFile);
// decrypt the chunk of data in-place (continuing from previous chunk)
aes.decrypt(buffer, bytesRead);
// write out decrypted data (todo: check for write errors!)
fwrite(buffer, 1, bytesRead, outputFile);
if (bytesRead < sizeof(buffer)) break; // end of file (or read error)
}
aes.clear();
Note that this code is relying on a feature of the crypto library that does not seem to be explicitly documented, namely that calling aes.decrypt() multiple times will cause the decryptions to be chained correctly. (That's actually a pretty reasonable thing to do, for CBC mode, but you can never be sure without reading the code or finding explicit documentation saying so.) You should make sure to have a comprehensive test suite for this, and to re-run the tests whenever you upgrade the library.
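As a starting point for such a test, here is a rough sanity-check sketch (assuming the setup()/decrypt() signatures used in the snippets above, and a small sample ciphertext whose length is a multiple of 16) that compares a one-shot decryption against the chunked, no-setup-in-between approach:

// Sketch: decrypt the same sample in one call and again in 16-byte chunks
// without re-running setup(), then check the outputs match. This only
// verifies the chaining assumption on a small buffer; it is not a
// substitute for a real test suite.
#include <cstring>
#include <vector>

bool chainingBehavesLikeCBC(const char* key, const char* iv,
                            const char* ciphertext, size_t len)  // len % 16 == 0
{
    std::vector<char> oneShot(ciphertext, ciphertext + len);
    AES aesA;
    aesA.setup(key, AES::KEY_128, AES::MODE_CBC, iv);
    aesA.decrypt(oneShot.data(), oneShot.size());
    aesA.clear();

    std::vector<char> chunked(ciphertext, ciphertext + len);
    AES aesB;
    aesB.setup(key, AES::KEY_128, AES::MODE_CBC, iv);
    for (size_t off = 0; off < len; off += 16)
        aesB.decrypt(chunked.data() + off, 16);   // no setup() between chunks
    aesB.clear();

    return std::memcmp(oneShot.data(), chunked.data(), len) == 0;
}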
Also note that I haven't tested either of these examples, so there obviously could be bugs or typos. Also, the docs for your crypto library are somewhat sparse, so it's possible that it might not work exactly like I'm assuming it does. Please test anything based on this code thoroughly before using it!
In general, if something doesn't fit in memory, you can resort to:
Random accessing files. Use fseek to find the position and read or write what you need. Memory requirement minimal.
Processing in batches that will fit in to memory. Memory requirement is adjustable, but the algorithm must be suitable for this.
System virtual memory, which allows you to reserve blocks as big as your system can address, as long as you have the free disk space and your system settings allow it. This is usually transparent, depending on your system.
Other paged memory mechanisms.
Since AES encryption works on blocks of 128 bits and you're short of memory, you should probably use random access on your file.
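For example, a minimal sketch of that kind of random access (the block index and names are illustrative only):

// Sketch: read an arbitrary 16-byte AES block from the middle of a file
// with fseek/fread, without loading the rest of the file into memory.
#include <cstdio>

bool readBlockAt(FILE* fp, long blockIndex, char out[16])
{
    if (fseek(fp, blockIndex * 16L, SEEK_SET) != 0)
        return false;
    return fread(out, 1, 16, fp) == 16;
}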
I am writing a program to check whether a file is a PE file or not. For that, I only need to read the file headers (which I guess do not occupy more than the first 1024 bytes of a file).
I tried using the CreateFile() + ReadFile() combination, which turns out to be slow because I am iterating through all the files on the system drive. It takes 15-20 minutes just to iterate through them.
Can you suggest an alternative approach to opening and reading the files that would make this faster?
Note: I do NOT need to read the whole file. I just need to read the initial part of the file -- the DOS header, PE header etc., which I guess do not occupy more than the first 512 bytes of the file.
Here is my code:
bool IsPEFile(const String filePath)
{
HANDLE hFile = CreateFile(filePath.c_str(),
GENERIC_READ,
FILE_SHARE_READ | FILE_SHARE_WRITE,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL);
DWORD dwBytesRead = 0;
const DWORD CHUNK_SIZE = 2048;
BYTE szBuffer[CHUNK_SIZE] = {0};
LONGLONG size;
LARGE_INTEGER li = {0};
if (hFile != INVALID_HANDLE_VALUE)
{
if(GetFileSizeEx(hFile, &li) && li.QuadPart > 0)
{
size = li.QuadPart;
ReadFile(hFile, szBuffer, CHUNK_SIZE, &dwBytesRead, NULL);
if(dwBytesRead > 0 && (WORDPTR(szBuffer[0]) == ('M' << 8) + 'Z' || WORDPTR(szBuffer[0]) == ('Z' << 8) + 'M'))
{
LONGLONG ne_pe_header = DWORDPTR(szBuffer[0x3c]);
WORD signature = 0;
if(ne_pe_header <= dwBytesRead-2)
{
signature = WORDPTR(szBuffer[ne_pe_header]);
}
else if (ne_pe_header < size )
{
SetFilePointer(hFile, ne_pe_header, NULL, FILE_BEGIN);
ReadFile(hFile, &signature, sizeof(signature), &dwBytesRead, NULL);
if (dwBytesRead != sizeof(signature))
{
CloseHandle(hFile);
return false;
}
}
if(signature == 0x4550) // PE file
{
CloseHandle(hFile);
return true;
}
}
}
CloseHandle(hFile);
}
return false;
}
Thanks in advance.
I think you're hitting the inherent limitations of mechanical hard disk drives. You didn't mention whether you're using a HDD or a solid-state disk, but I assume a HDD given that your file accesses are slow.
HDDs can read data at about 100 MB/s sequentially, but seek time is a bit over 10 ms. This means that if you seek to a certain location (10 ms), you might as well read a megabyte of data (another 10 ms). It also means that you can access fewer than 100 files per second.
So, in your case it doesn't matter much whether you're reading the first 512 bytes of a file or the first hundred kilobytes of a file.
Hardware is cheap, programmer time is expensive. Your best bet is to purchase a solid-state disk drive if your file accesses are too slow. I predict that eventually all computers will have solid-state disk drives.
Note: if the bottleneck is the HDD, there is nothing you can do about it other than to replace the HDD with better technology. Practically all file access mechanisms are equally slow. The only thing you can do about it is to read only the initial part of a file if the file is really really large such as multiple megabytes. But based on your code example you're already doing that.
For faster file IO, you need to use CreateFile and ReadFile APIs of Win32.
If you want to speed things up, you can use file buffering and make the file handle non-blocking by using overlapped I/O or IOCP.
See this example for help: https://msdn.microsoft.com/en-us/library/windows/desktop/bb540534%28v=vs.85%29.aspx
And I think that C's FILE and C++'s fstream are not faster than the Win32 API.
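For illustration, a minimal sketch of an overlapped read of just the header bytes (function and variable names are illustrative, error handling is trimmed, and whether this beats plain synchronous ReadFile for your workload is something you would have to measure):

// Sketch: open with FILE_FLAG_OVERLAPPED and issue an asynchronous read
// of the first `toRead` bytes. A single request like this gains little;
// the win comes from queueing many files at once (e.g. with IOCP).
#include <windows.h>

bool ReadHeaderAsync(const wchar_t* path, BYTE* buffer, DWORD toRead, DWORD* got)
{
    HANDLE hFile = CreateFileW(path, GENERIC_READ,
                               FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                               OPEN_EXISTING,
                               FILE_FLAG_OVERLAPPED | FILE_FLAG_SEQUENTIAL_SCAN,
                               NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return false;

    OVERLAPPED ov = {0};                       // offset 0: start of the file
    BOOL ok = ReadFile(hFile, buffer, toRead, NULL, &ov);
    if (!ok && GetLastError() != ERROR_IO_PENDING)
    {
        CloseHandle(hFile);
        return false;
    }
    // Wait for this one request; with IOCP you would instead queue many
    // files and collect completions as they arrive.
    ok = GetOverlappedResult(hFile, &ov, got, TRUE);
    CloseHandle(hFile);
    return ok != FALSE;
}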
I have a complex interpreter reading in commands from (sometimes) multiple files (the exact details are out of scope), but it requires iterating over these multiple files (some could be GBs in size, preventing nice buffering) multiple times.
I am looking to increase the speed of reading in each command from a file.
I have used the RDTSC (time-stamp counter) register to micro-benchmark the code enough to know that >80% of the time is spent reading from the files.
Here is the thing: the program that generates the input file literally runs faster than my small interpreter can read the file back in. I.e., instead of writing out the file I could (in theory) just link the data generator into the interpreter and skip the file, but that shouldn't be faster, right?
What am I doing wrong? Or is writing supposed to be 2x to 3x (at least) faster than reading from a file?
I have considered mmap, but some of the results on http://lemire.me/blog/archives/2012/06/26/which-is-fastest-read-fread-ifstream-or-mmap/ appear to indicate it is no faster than ifstream. Or would mmap help in this case?
details:
I have (so far) tried adding a buffer, tweaking parameters, and removing the ifstream buffer (that slowed it down by 6x in my test case). I am currently at a loss for ideas after searching around.
The important section of the code is below. It does the following:
if data is left in the buffer, copy from the buffer to memblock (where it is then used)
if no data is left in the buffer, check how much data is left in the file; if more than the size of the buffer, copy a buffer-sized chunk
if less than a buffer's worth is left, copy only the remaining part of the file
//if data in buffer
if(leftInBuffer[activefile] > 0)
{
//cout <<bufferloc[activefile] <<"\n";
memcpy(memblock,(buffer[activefile])+bufferloc[activefile],16);
bufferloc[activefile]+=16;
leftInBuffer[activefile]-=16;
}
else //buffers blank
{
//read in block
long blockleft = (cfilemax -cfileplace) / 16 ;
int read=0;
/* slow block starts here */
if(blockleft >= MAXBUFELEMENTS)
{
currentFile->read((char *)(&(buffer[activefile][0])),16*MAXBUFELEMENTS);
leftInBuffer[activefile] = 16*MAXBUFELEMENTS;
bufferloc[activefile]=0;
read =16*MAXBUFELEMENTS;
}
else //read in part of the block
{
currentFile->read((char *)(&(buffer[activefile][0])),16*(blockleft));
leftInBuffer[activefile] = 16*blockleft;
bufferloc[activefile]=0;
read =16*blockleft;
}
/* slow block ends here */
memcpy(memblock,(buffer[activefile])+bufferloc[activefile],16);
bufferloc[activefile]+=16;
leftInBuffer[activefile]-=16;
}
Edit: this is on a Mac, OS X 10.9.5, with an i7 and an SSD.
Solution:
As was suggested below, mmap was able to increase the speed by about 10x (for anyone else who searches for this).
Specifically, open with:
uint8_t * openMMap(string name, long & size)
{
int m_fd;
struct stat statbuf;
uint8_t * m_ptr_begin;
if ((m_fd = open(name.c_str(), O_RDONLY)) < 0)
{
perror("can't open file for reading");
}
if (fstat(m_fd, &statbuf) < 0)
{
perror("fstat in openMMap failed");
}
if ((m_ptr_begin = (uint8_t *)mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, m_fd, 0)) == MAP_FAILED)
{
perror("mmap in openMMap failed");
}
uint8_t * m_ptr = m_ptr_begin;
size = statbuf.st_size;
return m_ptr;
}
read by:
uint8_t * mmfile = openMMap("my_file", length);
uint32_t * memblockmm;
memblockmm = (uint32_t *)mmfile; //cast file to uint32 array
uint32_t data = memblockmm[0]; //take int
mmfile +=4; //increment by 4 as I read a 32 bit entry and each entry in mmfile is 8 bits.
This should be a comment, but I don't have 50 reputation to make a comment.
What is the value of MAXBUFELEMENTS? From my experience, many smaller reads are far slower than one larger read. I suggest reading in the entire file if possible; some files could be GBs, but even reading 100MB at once will perform better than reading 1MB 100 times.
If that's still not good enough, the next thing you can try is to compress (zlib) the input files (you may have to break them into chunks due to size) and decompress them in memory. This method is usually faster than reading uncompressed files.
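A minimal sketch of that idea using zlib's gzFile interface (assuming the inputs are gzip-compressed; the function name and buffer handling are illustrative):

// Sketch: read a gzip-compressed file into a caller-provided buffer,
// trading some CPU for less disk I/O.
#include <zlib.h>

long readCompressed(const char* path, char* dst, long dstSize)
{
    gzFile gz = gzopen(path, "rb");
    if (!gz)
        return -1;
    long total = 0;
    while (total < dstSize)
    {
        int n = gzread(gz, dst + total, (unsigned)(dstSize - total));
        if (n <= 0)
            break;              // 0 = end of stream, <0 = error
        total += n;
    }
    gzclose(gz);
    return total;
}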
As #Tony Jiang said, try experimenting with the buffer size to see if that helps.
Try mmap to see if that helps.
I assume that currentFile is a std::ifstream? There's going to be some overhead for using iostreams (for example, an istream will do its own buffering, adding an extra layer to what you're doing); although I wouldn't expect the overhead to be huge, you can test by using open(2) and read(2) directly.
You should be able to run your code through dtruss -e to verify how long the read system calls take. If those take the bulk of your time, then you're hitting OS and hardware limits, so you can address that by piping, mmap'ing, or adjusting your buffer size. If those take less time than you expect, then look for problems in your application logic (unnecessary work on each iteration, etc.).
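As a rough baseline for that comparison, here is a sketch of the same buffer refill done with open(2)/read(2) directly (names mirror the question's variables where possible; the file name is hypothetical and this is untested):

// Sketch: fill a buffer with raw read(2) calls, retrying on short reads,
// so the dtruss timings can be compared against the ifstream version.
#include <fcntl.h>
#include <unistd.h>

ssize_t fillBuffer(int fd, char* buf, size_t bufSize)
{
    ssize_t total = 0;
    while ((size_t)total < bufSize)
    {
        ssize_t n = read(fd, buf + total, bufSize - total);
        if (n < 0)  return -1;   // error
        if (n == 0) break;       // end of file
        total += n;
    }
    return total;
}

// usage: int fd = open("commands.bin", O_RDONLY);  // hypothetical file name
//        ssize_t got = fillBuffer(fd, (char*)&buffer[activefile][0], 16*MAXBUFELEMENTS);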
I have a program that basically does this:
Opens some binary file
Reads the file backwards (by backwards, I mean it starts near EOF, and ends reading at beginning of file, i.e. reads the file "right-to-left"), using 4MB chunks
Closes the file
My question is: why does memory consumption grow as shown below, even though there are no obvious memory leaks in my attached code?
Here's the source of the program that was run to obtain the above image:
#include <stdio.h>
#include <string.h>
int main(void)
{
//allocate stuff
const int bufferSize = 4*1024*1024;
FILE *fileHandle = fopen("./input.txt", "rb");
if (!fileHandle)
{
fprintf(stderr, "No file for you\n");
return 1;
}
unsigned char *buffer = new unsigned char[bufferSize];
if (!buffer)
{
fprintf(stderr, "No buffer for you\n");
return 1;
}
//get file size. file can be BIG, hence the fseeko() and ftello()
//instead of fseek() and ftell().
fseeko(fileHandle, 0, SEEK_END);
off_t totalSize = ftello(fileHandle);
fseeko(fileHandle, 0, SEEK_SET);
//read the file... in reverse order. This is important.
for (off_t pos = totalSize - bufferSize, j = 0;
pos >= 0;
pos -= bufferSize, j ++)
{
if (j % 10 == 0)
{
fprintf(stderr,
"reading like crazy: %lld / %lld\n",
pos, totalSize);
}
/*
* below is the heart of the problem. see notes below
*/
//seek to desired position
fseeko(fileHandle, pos, SEEK_SET);
//read the chunk
fread(buffer, sizeof(unsigned char), bufferSize, fileHandle);
}
fclose(fileHandle);
delete []buffer;
}
I also have the following observations:
Even though RAM usage jumps by 1GB, the whole program uses only 5MB throughout the whole execution.
Commenting out the call to fread() makes the memory leak go away. This is weird, since I don't allocate anything anywhere near it that could trigger a memory leak...
Also, reading the file normally instead of backwards (i.e. commenting out the call to fseeko()) makes the memory leak go away as well. This is the ultra-weird part.
Further information...
The following doesn't help:
Checking the results of fread() - it yields nothing out of the ordinary.
Switching to normal, 32-bit fseek and ftell.
Doing stuff like setbuf(fileHandle, NULL).
Doing stuff like setvbuf(fileHandle, NULL, _IONBF, *any integer*).
Compiled with g++ 4.5.3 on Windows 7 via cygwin and mingw, without any optimizations, just g++ test.cpp -o test. Both show this behaviour.
The file used in tests was 4GB long, full of zeros.
The weird pause in the middle of the chart could be explained by some kind of temporary I/O hangup, unrelated to this question.
Finally, if I wrap the reading in an infinite loop, the memory usage stops increasing after the first iteration.
I think it has to do with some kind of internal cache building up until it's filled with the whole file. How does it really work behind the scenes? How can I prevent that in a portable way?
I think this is more an OS issue (or even an OS resource-use reporting issue) than an issue with your program. Of course it only uses 5 MB of memory: 1 MB for itself (libs, stack etc.) and 4 MB for the buffer. Whenever you do an fread(), the OS seems to "bind" part of the file to your process, and does not release it at the same speed. As memory use on your machine is low, this is not a problem: the OS just keeps the already-read data "hanging around" longer than necessary, probably assuming that your application might read it again soon, so it doesn't have to do that binding again.
If memory pressure were higher, the OS would very likely unbind the memory faster, so the jump in your memory usage history would be smaller.
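If you really want to keep the cache from building up, one hint you can try (POSIX only; it exists under cygwin but may be a no-op there, and it is not available on plain mingw) is posix_fadvise, sketched below:

// Sketch: after a chunk has been consumed, hint that its pages can be
// dropped from the page cache. This is only a hint and may be ignored.
#include <fcntl.h>
#include <stdio.h>

static void dropFromCache(FILE* fp, off_t pos, off_t len)
{
#ifdef POSIX_FADV_DONTNEED
    posix_fadvise(fileno(fp), pos, len, POSIX_FADV_DONTNEED);
#else
    (void)fp; (void)pos; (void)len;   // not available on this platform
#endif
}

// In the read loop, after the fread() of the chunk at `pos`:
//     dropFromCache(fileHandle, pos, bufferSize);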
I had the exact same problem, although in Java, but that doesn't matter in this context. I solved it by reading much bigger chunks at a time. I was also reading 4MB chunks, but when I increased the size to 100-200MB the problem went away. Perhaps it will for you as well. I'm on Windows 7.
I have to write a program in C (or C++) on Linux that will test write and read speed on different file systems. I have to be sure that all data is written to the disk (not left in a cache).
So my first question: what function should I use to open a new file? I previously used the open function with the O_DIRECT and O_SYNC flags, and everything was fine except one thing: writing small files like 1KB was extremely slow, something like 0.01MB/s.
I tried using the fopen function instead of open, and the fflush function to make sure all data goes directly to the disk, and I tested it first on a FAT32 file system. 1000 files of 1KB were written to disk (here an SD card) in 5 seconds, something like 0.18MB/s, which I think is correct.
Now the problem occurs when testing the EXT4 and NTFS file systems. On EXT4, 1KB files were written at something like 12MB/s (wrong), and when testing 100KB files the transfer rate was 180MB/s (terribly wrong, since my SD card's transfer rate is only 20MB/s).
My actual code for writing files looks like this:
clock_gettime(CLOCK_REALTIME, &ts);
for ( int i = 0; i < amount; ++i)
{
p = fopen(buffer2, "w+");
fwrite(buff, size*1024, 1, p);
if ( fflush(p) != 0 ) { cout << "fflush error"; return 0; }
fclose(p);
}
clock_gettime(CLOCK_REALTIME, &ts2);
time2 = diff2(ts,ts2);
This works well only for the FAT32 file system. The second code (used before) looks like this:
for ( int i = 0; i < amount; ++i)
{
int fd = open(buffer2, O_WRONLY | O_CREAT, 0777);
if ( error(fd, "open") ) return false;
if ( (write(fd, buff, size*1024)) < 0 ) { perror("write error"); return 0; }
if ( (fsync(fd)) == -1 ) { perror("fsync"); return 0; }
close(fd);
}
This works for all file systems, but small files are written extremely slowly.
Maybe I should use different code for different file systems? Any ideas?
EDIT:
I have found out why writing small files is slow. It is because of the fsync function, which takes a different amount of time on different file systems. I am calling fsync on every write, so that is the problem.
Is there any way to call it at the end, when all files are written? Or maybe every few seconds? Do I have to use a different thread?
See "How do I ensure data is written to disk before closing fstream?", but I don't think you can ensure that the data is actually on the disk rather than in a cache in the disk controller or in the drive's onboard cache.
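For the narrower question of batching the flush, the usual pattern is to write everything first and fsync in a second pass (or use Linux-specific syncfs, or plain sync). A rough sketch, with the same caveat that this only pushes data as far as the device:

// Sketch: write all files without per-file syncing, then flush each one
// in a second pass. fsync flushes the file's dirty data, not just data
// written through this particular descriptor. File names are illustrative.
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <string>
#include <vector>

void writeThenSync(const std::vector<std::string>& paths,
                   const char* buff, size_t bytes)
{
    // first pass: write everything, no fsync yet
    for (const std::string& p : paths)
    {
        int fd = open(p.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0777);
        if (fd < 0) { perror("open"); continue; }
        if (write(fd, buff, bytes) < 0) perror("write");
        close(fd);
    }
    // second pass: one flush per file at the end
    for (const std::string& p : paths)
    {
        int fd = open(p.c_str(), O_WRONLY);
        if (fd < 0) { perror("open for fsync"); continue; }
        if (fsync(fd) == -1) perror("fsync");
        close(fd);
    }
    // alternatively, a single sync() here flushes all pending writes
    // system-wide; syncfs(fd) (Linux-specific) limits it to one filesystem.
}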