I have to deal with a huge amount of data that usually doesn't fit into main memory. The way I access this data has high locality, so caching parts of it in memory looks like a good option. Is it feasible to just malloc() a huge array, and let the operating system figure out which bits to page out and which bits to keep?
Assuming the data comes from a file, you're better off memory mapping that file. Otherwise, what you end up doing is allocating your array, and then copying the data from your file into the array -- and since your array is mapped to the page file, you're basically just copying the original file to the page file, and in the process polluting the "cache" (i.e., physical memory) so other data that's currently active has a much better chance of being evicted. Then, when you're done you (typically) write the data back from the array to the original file, which (in this case) means copying from the page file back to the original file.
Memory mapping the file instead just creates some address space and maps it directly to the original file instead. This avoids copying data from the original file to the page file (and back again when you're done) as well as temporarily moving data into physical memory on the way from the original file to the page file. The biggest win, of course, is when/if there are substantial pieces of the original file that you never really use at all (in which case they may never be read into physical memory at all, assuming the unused chunk is at least a page in size).
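For reference, a minimal POSIX sketch of the mapping itself (the file name is a placeholder and error handling is omitted):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("data.bin", O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    const char *data = static_cast<const char *>(
        mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
    // The kernel faults pages in lazily, only as they are touched:
    char first = data[0];
    char last  = data[st.st_size - 1];
    (void)first; (void)last;
    munmap(const_cast<char *>(data), st.st_size);
    close(fd);
}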
If the data are in a large file, look into using mmap to read it. Modern computers have so much RAM that you might not have enough swap space available.
When programming in C and/or C++, how does one set up a byte-buffer in-memory structure such that it can automatically resize as the situation warrants?
Often, I will want to write some unknown quantity of bytes to a buffer, without knowing how much space is needed. I feel like this is a fundamental I/O programming concern – and I don’t know how to approach the problem, let alone solve it.
Specifically, I’m doing this I/O to process image data – the sizes can vary from a few kilobytes on up to hundreds of megabytes, depending on compression settings and (many!) other factors.
My current workaround, in many cases, is to:
1. open() a write-mode descriptor on a temporary file, and write() my indeterminate quantity of bytes to this file;
2. then call fsync() and subsequently close() the descriptor;
3. use stat() to get the size of the file;
4. re-open() the temporary file in read mode;
5. and then finally read() the entire file back into a newly allocated, properly-sized buffer.
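In code, the workaround looks roughly like this (a sketch only, assuming POSIX; the temp-file path is hypothetical and error checks are omitted):

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdlib>

int main() {
    // 1. write an indeterminate number of bytes to a temporary file
    int fd = open("/tmp/scratch", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    write(fd, "some bytes", 10);            // repeated as data arrives
    // 2. flush and close
    fsync(fd);
    close(fd);
    // 3. get the final size
    struct stat st;
    stat("/tmp/scratch", &st);
    // 4. re-open in read mode
    fd = open("/tmp/scratch", O_RDONLY);
    // 5. read everything back into a properly-sized buffer
    char *buf = static_cast<char *>(std::malloc(st.st_size));
    read(fd, buf, st.st_size);
    close(fd);
    std::free(buf);
}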
My question, therefore, is a two-parter: one, how problematic is my workaround? and two: how can I accomplish this task using only in-memory structures?
There's nothing wrong with your approach, as long as you can make sure the file doesn't change size between steps 3 and 5. In fact, it is probably the solution with the best performance.
In case you realize while reading the file (by counting bytes read vs. buffer size) that there is more to read but you have run out of buffer space, you can always use realloc to increase the buffer by an arbitrary amount. How large that "arbitrary amount" should be depends on the nature of your application and your expected memory situation. If memory is plentiful, you might want to over-allocate by a factor of 1.5 and realloc to the actual size once you have read the complete file.
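A sketch of that growth pattern, assuming input arrives on stdin and ignoring allocation failures:

#include <cstdio>
#include <cstdlib>

int main() {
    size_t cap = 4096, len = 0;
    char *buf = static_cast<char *>(std::malloc(cap));
    size_t n;
    while ((n = std::fread(buf + len, 1, cap - len, stdin)) > 0) {
        len += n;
        if (len == cap) {                     // out of space: grow by ~1.5x
            cap += cap / 2;
            buf = static_cast<char *>(std::realloc(buf, cap));
        }
    }
    buf = static_cast<char *>(std::realloc(buf, len));  // trim to actual size
    // ... use buf[0..len) ...
    std::free(buf);
}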
Dynamically re-allocating the buffer has, however, a bit of a speed penalty and might not always be possible when you are working with huge buffers and are already tight on memory (most implementations of realloc will temporarily need to hold both the too-small and the re-sized buffer in memory).
Depending on the buffer sizes, your program might also suffer a performance penalty when resizing the buffer - after all, the contents you have already read need to be copied over to the new, re-sized buffer.
In C++, you would probably use a std::vector to do the same thing, and you may run into the very same problems.
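For comparison, the std::vector version, which manages the growth policy internally:

#include <cstdio>
#include <vector>

int main() {
    std::vector<char> buf;
    char chunk[4096];
    size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, stdin)) > 0)
        buf.insert(buf.end(), chunk, chunk + n);   // reallocates as needed
}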
One last method for loading large files is memory mapping - but this, too, requires that you know up front how much space you need.
I'm using a binary file to recover an object using boost::archive::binary_iarchive, but it is too heavy (18 GB) and that object loads the entire file into memory. Is there a way to read the file in parts (a lazy load) to avoid the memory use?
What I have:
std::ifstream ifs(filename);
boost::archive::binary_iarchive ia(ifs);
MyObject obj;
ia >> obj;
Upgrading my comment to an answer:
@cmaster got really close to an approach that can work, but he accidentally put the problem upside down.
The raw file was never the issue (it was streaming all along).
The problem is that deserialization tries to put all the data in memory (the vector, e.g.). So the only real solutions would be to either:
put this data into a (shared?) memory map instead. You can use the allocators from Boost Interprocess to help you achieve this. It is a lot of effort, but relatively straightforward, conceptually.
or modify the deserialization code to convert to a different on-disk format on the fly (instead of inserting into, e.g., that vector), which would then allow mmap as cmaster suggested.
In other words, you'd "cannibalize" the boost serialization implementation to migrate the data away from boost serialization towards a raw binary format that can be used directly in mapped memory.
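As a rough sketch of the first option, this is roughly what a vector living inside a Boost Interprocess managed mapped file looks like (file name and size are illustrative):

#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>

namespace bip = boost::interprocess;
typedef bip::allocator<double, bip::managed_mapped_file::segment_manager> shm_alloc;
typedef bip::vector<double, shm_alloc> mapped_vector;

int main() {
    // The vector's storage lives inside the mapped file, not on the heap,
    // so the kernel pages it in and out on demand.
    bip::managed_mapped_file file(bip::open_or_create, "big_data.bin", 1ull << 30);
    mapped_vector *v =
        file.find_or_construct<mapped_vector>("data")(file.get_segment_manager());
    v->push_back(42.0);
}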
You can use mmap() to map the file into your address space. With that, it doesn't matter that the file is too large because the kernel knows that any data in the mapped region is just a copy of the file on the hard disk. Consequently, it does not even need to swap the data out when it needs the memory for something else. The kernel will just lazily load the parts of the file that you need as you touch them, which is especially good if you don't need everything in the file.
The nice thing about mmap() is that you have the entire file contents accessible as a huge char array, which is quite convenient for many use cases. The only precondition that must be met is that your process runs as a 64 bit process, otherwise your virtual address space will be too small to fit the file into it.
The problem is the following: I have a certain number of words (let's say 20M), each containing some bits used as flags, all stored in a single contiguous binary file.
What I would like to do is to access those words in container-like style, so that container_instance[i] gives me the i-th word. To make things more complicated, I cannot keep all words in memory at one time; they have to be written back to the file and the memory freed for those not used for a long period. To simplify things, the whole sequence is partitioned into 1K-word fragments, so we need to free and allocate such 1K blocks. Memory should be freed after some time or after the container has been accessed a certain number of times.
Thread safety is nice to have, but I can protect externally.
The implementation I currently have only allocates blocks on demand (empty, or read from the file if they are available; the file is not sparse, so everything after the last byte in the file is allocated empty), and it is not nicely done. It never frees anything, so unused blocks remain in memory forever.
I started to think about a nice-looking solution, and I would like to know whether any components from the STL or Boost can help me build such a container, rather than carving it out step by step from scratch.
I am not expecting full solutions, rather pointers of the form "you can use X for Y".
You can use the mmap system call to map your file into memory. You can then use pointer arithmetic on that buffer, so access by index is not a problem.
Mapped pages are virtual and managed by the kernel, which can evict unused memory blocks and load/flush them transparently to you. Also, using madvise can probably enable some optimisations.
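To sketch the idea (file name and sizes are hypothetical, error handling is omitted, and the exact effect of the madvise hints varies by system):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>

int main() {
    int fd = open("words.bin", O_RDWR);
    struct stat st;
    fstat(fd, &st);
    uint32_t *words = static_cast<uint32_t *>(
        mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    uint32_t w = words[12345];          // i-th word, paged in on demand
    words[12345] = w | 0x1;             // set a flag bit; the kernel writes it back
    // Hint that the first 1K-word block won't be needed soon, so the
    // kernel may drop those pages and re-read them from the file later:
    madvise(words, 1024 * sizeof(uint32_t), MADV_DONTNEED);
    munmap(words, st.st_size);
    close(fd);
}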
Suppose I have a file which has x records. One 'block' holds m records, so the total number of blocks in the file is n = x/m. If I know the size of one record, say b bytes (so the size of one block is b*m), I can read a complete block at once using the system call read() (is there any other method?). Now, how do I read each record from this block and put each one as a separate element into a vector?
The reason I want to do this in the first place is to reduce disk I/O operations, since disk I/O is much more expensive, according to what I have learned.
Or will it take the same amount of time as reading record by record from the file and putting each directly into the vector, instead of reading block by block? Reading block by block, I will have only n disk I/Os, whereas record by record I will have x.
Thanks.
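For illustration, a sketch of the block-at-a-time read (the record layout and sizes here are hypothetical):

#include <vector>
#include <fcntl.h>
#include <unistd.h>

struct Record { char bytes[64]; };      // hypothetical record, b = 64

int main() {
    const size_t m = 1024;              // records per block
    int fd = open("records.dat", O_RDONLY);
    std::vector<Record> block(m);
    // One read() pulls in the whole block; the vector then holds
    // each record as a separate element.
    ssize_t got = read(fd, block.data(), m * sizeof(Record));
    block.resize(got > 0 ? got / sizeof(Record) : 0);  // drop any partial tail
    close(fd);
}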
You should consider using mmap() instead of reading your files using read().
What's nice about mmap is that the file contents are simply mapped into your process's address space, as if you already had a pointer to them. By simply inspecting memory and treating it as an array, or by copying data using memcpy(), you will implicitly perform read operations, but only as necessary - the operating system's virtual memory subsystem is smart enough to do this very efficiently.
The only possible reason to avoid mmap is if you are running on a 32-bit OS and the file size exceeds 2 gigabytes (or slightly less than that). In this case, the OS may have trouble allocating address space for your mmap-ed memory. But on a 64-bit OS, using mmap should never be a problem.
Also, mmap can be cumbersome if you are writing a lot of data and the size of the data is not known upfront. Other than that, it is generally better and faster than read().
Actually, most modern operating systems rely on mmap extensively. For example, on Linux, to execute some binary, your executable is simply mmap-ed and executed from memory as if it had been copied there by read, without actually reading it up front.
Reading a block at a time won't necessarily reduce the number of I/O operations at all. The standard library already does buffering as it reads data from a file, so you do not (normally) expect to see an actual disk input operation every time you attempt to read from a stream (or anything close).
It's still possible reading a block at a time would reduce the number of I/O operations. If your block is larger than the buffer the stream uses by default, then you'd expect to see fewer I/O operations used to read the data. On the other hand, you can accomplish the same by simply adjusting the size of buffer used by the stream (which is probably a lot easier).
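For instance, with a std::ifstream you can hand the stream a bigger buffer before opening the file; whether pubsetbuf honors this is implementation-defined, but common standard libraries do when it is called before open():

#include <fstream>

int main() {
    static char bigbuf[1 << 20];                   // 1 MiB stream buffer
    std::ifstream in;
    in.rdbuf()->pubsetbuf(bigbuf, sizeof bigbuf);  // must precede open()
    in.open("records.dat", std::ios::binary);
    // reads now hit the disk in larger, less frequent chunks
}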
memcpy/memmove duplicate (copy) the data from source to destination. Does anything exist to move pages from one virtual address to another without doing an actual byte-by-byte copy of the source data? It seems perfectly possible to me, but does any operating system actually allow it? It seems odd that dynamic arrays are such a widespread and popular concept, yet growing them by physically copying is such a wasteful operation. It just doesn't scale when you start talking about array sizes in the gigabytes (e.g., imagine growing a 100GB array into a 200GB array; that's entirely possible on servers in the < $10K range now).
// Hypothetical sketch; SIZE_2GB/SIZE_4GB stand for the respective byte counts.
void* very_large_buffer = VirtualAlloc(NULL, SIZE_2GB, MEM_COMMIT, PAGE_READWRITE);
// Populate very_large_buffer, run out of space.
// Allocate a buffer twice as large, but don't actually allocate
// physical memory, just reserve the address space.
void* even_bigger_buffer = VirtualAlloc(NULL, SIZE_4GB, MEM_RESERVE, PAGE_READWRITE);
// Remap the physical memory from very_large_buffer to even_bigger_buffer without copying
// (i.e. don't copy 2GB of data, just move the mapping of virtual pages to physical pages).
// Does any OS provide support for an operation like this?
MoveMemory(even_bigger_buffer, very_large_buffer, SIZE_2GB);
// Now very_large_buffer no longer has any physical memory pages associated with it.
VirtualFree(very_large_buffer, 0, MEM_RELEASE);
To some extent, you can do that with mremap on Linux.
That call plays with the process's page tables to do a zero-copy reallocation if it can. It is not possible in all cases (address-space fragmentation, and simply the presence of other existing mappings, can be an issue).
The man page actually says this:
mremap() changes the mapping between virtual addresses and memory pages. This can be used to implement a very efficient realloc(3).
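A minimal sketch (Linux-only; mremap needs _GNU_SOURCE, which g++ defines by default):

#include <sys/mman.h>

int main() {
    const size_t old_size = 1ull << 31, new_size = 1ull << 32;  // 2 GB -> 4 GB
    void *p = mmap(nullptr, old_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    // ... fill p ...
    // Zero-copy grow: the kernel rewires the page tables. Only if it
    // must relocate the mapping does the virtual address change; the
    // physical pages are never copied.
    void *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
    munmap(q, new_size);
}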
Yes, it's a common use of memory-mapped files to 'move' or copy memory between processes by mapping different views of the same file.
Every POSIX system is able to do this. If you use mmap with a file descriptor (obtained from open or shm_open) and not anonymously, you can unmap the mapping, truncate the file (shrink or grow it with ftruncate), and then map it again. You may, and often will, get a different virtual address for the same pages.
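A sketch of that unmap/truncate/remap dance, using a hypothetical shared-memory object name:

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    size_t old_size = 1 << 20, new_size = 1 << 21;
    int fd = shm_open("/mybuf", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, old_size);
    char *p = static_cast<char *>(
        mmap(nullptr, old_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    // ... use p ...
    munmap(p, old_size);
    ftruncate(fd, new_size);                 // grow the underlying object
    p = static_cast<char *>(                 // remap; the address may differ,
        mmap(nullptr, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    // but these are the same pages and contents; no bytes were copied
    munmap(p, new_size);
    close(fd);
    shm_unlink("/mybuf");
}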
You'd never be able to absolutely guarantee that there would be no active mappings in that next 100GB of address space, so you might not be able to grow the array contiguously.
On the other hand, you could use a ragged array (an array of arrays) where the arrays do not have to be next to each other (or even the same size). Many of the advantages of dynamic arrays may not scale to the 100GB realm.
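A minimal sketch of such a ragged array: element i is found in two steps, and growing touches only the small pointer table:

#include <cstdlib>
#include <cstdint>

// A "ragged" dynamic array: grow by appending blocks. Existing
// blocks never move, so no bulk copy is ever needed.
const size_t BLOCK = 1 << 20;               // elements per block

int main() {
    size_t nblocks = 2;
    uint8_t **blocks =
        static_cast<uint8_t **>(std::malloc(nblocks * sizeof *blocks));
    for (size_t b = 0; b < nblocks; ++b)
        blocks[b] = static_cast<uint8_t *>(std::malloc(BLOCK));
    // element i lives at blocks[i / BLOCK][i % BLOCK]
    blocks[3 / BLOCK][3 % BLOCK] = 42;
    // Growing: realloc only the pointer table, then add one more block.
    nblocks += 1;
    blocks = static_cast<uint8_t **>(std::realloc(blocks, nblocks * sizeof *blocks));
    blocks[nblocks - 1] = static_cast<uint8_t *>(std::malloc(BLOCK));
    for (size_t b = 0; b < nblocks; ++b) std::free(blocks[b]);
    std::free(blocks);
}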