Access boost::mapped_region data directly? - c++

In the below code:
file_mapping fm(FilePath, read_only);
mapped_region region(fm,read_only);
char* const data = static_cast<char *>(region.get_address());
for(size_t n=0; n<region.get_size(); ++n){
cout << data[n];
}
is there any way to access characters from the mapped memory without needing to create the data array?
EDIT code refers to using namespace boost:interprocess;

The data "array" is not actually created as an expensive allocation or copy - it's merely a pointer to the virtual memory space the OS uses to represent the file contents in memory. So that's a bit of bookkeeping but no actual significant work.
When you first access it (i.e. data[0]), the OS pages in the first block of file using optimised routines more efficient than C++ streams or C's (f)read. Good OS'es would also preload the second & subsequent blocks and silently drop old used blocks, so managing physical memory efficiently whilst being faster than you'd expect. Just make sure your file fits in your free virtual memory space - usually only a problem for 1+ GB files in 32-bit code.
So no, there's no other way - wanted or known - of accessing the contents. (I'm discounting re-opening the file using standard I/O routines!)

Related

I don't understand about memory issue of appending string

Runtime error: pointer index expression with base 0x000000000000 overflowed to 0xffffffffffffffff for frequency sort
In first answer of that link, it says that appending char to string can cause memory issue.
string s = "";
char c = 'a';
int max = INT_MAX;
for(int j=0;j<max;j++)
s = s + c;
The answer explains [s=s+c in above code copies the same string again and again so it will cause memory issue.] But I don't understand why that code copies the same string again and again.
Is there someone who is likely to make me understand that part :)?
I don't understand why that code copies the same string again and
again.
Okay, let's look at the what happens each time the loop is iterated:
s = s + c;
There are three things the program has to do in order to execute that line of code:
Compute the temporary value s + c -- to do that, the program has to create a temporary, anonymous std::string object, and allocate for it (from the heap) an internal byte-buffer that is at least one byte larger than the number of chars currently in s (so that it can hold all of s's old contents, plus the additional char provided by c)
Set s equal to the temporary-string. In C++03 and earlier, this would be done by reallocating s's internal byte-buffer to be larger, then copying all of the bytes from the temporary-string into s's new/larger buffer. C++11 optimizes this a bit via the new move-assignment operator, so that all the bytes don't have to be copied; rather, s can simply take ownership of the temporary-string's byte-buffer.
Free the temporary string's resources, now that we're done using it. In practice, this takes the form of the std::string class's destructor calling delete[] on the old (no-longer-large-enough) byte-buffer.
Given that the above is going to be performed at least 2 billion times in a loop, it's already quite inefficient.
However, what I think the answer you referred to was particularly concerned about was heap fragmentation. Keep in mind that heap allocation doesn't work by magic; when you (or the std::string class, or anyone) asks to allocate N bytes of memory from the heap, the heap implementation's job is to find N bytes of contiguous memory and return it. And since there is no provision in C++ for moving blocks of memory around (as doing so would invalidate any pointers that the program might have pointing into those blocks of memory), the heap can't create an N-byte contiguous-memory-chunk out of smaller chunks; instead, there has to be a range of contiguous-memory-space already available. For example, it does the heap no good to have a total of 1GB of memory available, if that 1GB of memory is made up of thousands of nonconsecutive 1KB chunks and the caller is asking for a 2KB allocation.
Therefore, the heap's job is to efficiently allocate chunks of memory of the sizes the program requests, and when they are freed again, it will try to glue them back together into larger chunks again if it can, but it may not always be able to. Certain patterns of allocating and freeing memory may result in heap fragmentation, which is simply a large number of discontinuous memory allocations that render the small regions of free memory between them unusable for large allocations.
Whether or not this particular allocate/free pattern would cause that, I'm not sure; given that only one or two buffers are being allocated at a time, the heap may be able to reabsorb them back into adjacent free-memory chunks as they get freed again -- it probably depends on the particular heap algorithm the system is using, as well as on whether any other threads are allocating/freeing heap memory while this is going on. But I wouldn't be too surprised if there are systems out there where it would cause problems (particularly on 16-bit or 32-bit systems where virtual address space is limited, or embedded systems that don't use virtual memory)

C++ allocates more bytes than asked?

int main(int argc, const char* argv[])
{
for(unsigned int i = 0; i < 10000000; i++)
char *c = new char;
cin.get();
}
In the above code, why does my program use 471MB memory instead of 10MB as one would expect?
Allocation of RAM comes from a combined effort of the runtime library and the operating system. In order to identify the one byte your example code requests, there is some structure which identifies this memory to the runtime. It is sometimes a double linked list, but it's defined by the operating system and runtime implementation.
You can analogize it this way: If you have a linked list container, what you're interested in is simply what you've placed inside each link, but the container must have pointers to the other links in the containers in order to maintain the linked list.
If you use a debugger, or some other debugging tool to track memory, these structures can be even larger, making each allocation more costly.
RAM isn't typically allocated out of an array, but it is possible to overload the new operator to change allocation behavior. It could be possible specifically allocate from an array (a large one in your example) so that allocations behaved as you seem to have expected, and in some applications this is a specific strategy to control memory and improve performance (though the details are usually more complex than that simple illustration).
The allocation not only contains the allocated memory itself, but at least one word telling delete how much memory it has to release; moreover that is a number that has to be correctly aligned, so there will be a certain padding after the allocated char to ensure that the next block is correctly aligned. On a 64 bit machine, that means at least 16 bytes per allocation (8 bytes to hold the size, 1 byte to hold the character, and 7 bytes padding to ensure correct alignment).
However most probably that's not the only data stored; to help the memory allocator to find free memory, additional data is likely stored; if one assumes that data to consist of three pointers, one gets to a total 40 bytes per allocation, which matches your data quite well.
Note also that the allocator will also request a bit more memory from the operating system than needed for the actual allocation, so that it won't need to do an expensive OS call for every little allocation. That is, the run time library allocates larger chunks of memory from the operating system, and then cuts those in smaller pieces for your program's allocations. Thus generally there will be some memory allocated from the operating system (and thus showing up in the task manager), but not yet allocated to a certain object in your program.

How to read blocks of data from a file and then read from that block into a vector?

Suppose I have a file which has x records. One 'block' holds m records. Total number of blocks in file n=x/m. If I know the size of one record, say b bytes (size of one block = b*m), I can read the complete block at once using system command read() (is there any other method?). Now, how do I read each record from this block and put each record as a separate element into a vector.
The reason why I want to do this in the first place is to reduce the disk i/o operations. As the disk i/o operations are much more expensive according to what I have learned.
Or will it take the same amount of time as when I read record by record from file and directly put it into vectors instead of reading block by block? On reading block by block, I will have only n disk I/O's whereas x I/O's if I read record by record.
Thanks.
You should consider using mmap() instead of reading your files using read().
What's nice about mmap is that you can treat file contents as simply mapped into your process space as if you already had a pointer into the file contents. By simply inspecting memory contents and treating it as an array, or by copying data using memcpy() you will implicitly perform read operations, but only as necessary - operating system virtual memory subsystem is smart enough to do it very efficiently.
The only possible reason to avoid mmap maybe if you are running on 32-bit OS and file size exceeds 2 gigabytes (or slightly less than that). In this case OS may have trouble allocating address space to your mmap-ed memory. But on 64-bit OS using mmap should never be a problem.
Also, mmap can be cumbersome if you are writing a lot of data, and size of the data is not known upfront. Other than that, it is always better and faster to use it over the read.
Actually, most modern operating systems rely on mmap extensively. For example, in Linux, to execute some binary, your executable is simply mmap-ed and executed from memory as if it was copied there by read, without actually reading it.
Reading a block at a time won't necessarily reduce the number of I/O operations at all. The standard library already does buffering as it reads data from a file, so you do not (normally) expect to see an actual disk input operation every time you attempt to read from a stream (or anything close).
It's still possible reading a block at a time would reduce the number of I/O operations. If your block is larger than the buffer the stream uses by default, then you'd expect to see fewer I/O operations used to read the data. On the other hand, you can accomplish the same by simply adjusting the size of buffer used by the stream (which is probably a lot easier).

Does any OS allow moving memory from one address to another without physically copying it?

memcpy/memmove duplicate (copy the data) from source to destination. Does anything exist to move pages from one virtual address to another without doing an actual byte by byte copy of the source data? It seems to be perfectly possible to me, but does any operating system actually allow this? It seems odd to me that dynamic arrays are such a widespread and popular concept but that growing them by physically copying is such a wasteful operation. It just doesn't scale when you start talking about array sizes in the gigabytes (e.g. imagine growing a 100GB array into a 200GB array. That's a problem that's entirely possible on servers in the < $10K range now.
void* very_large_buffer = VirtualAlloc(NULL, 2GB, MEM_COMMIT);
// Populate very_large_buffer, run out of space.
// Allocate buffer twice as large, but don't actually allocate
// physical memory, just reserve the address space.
void* even_bigger_buffer = VirtualAlloc(NULL, 4GB, MEM_RESERVE);
// Remap the physical memory from very_large_buffer to even_bigger_buffer without copying
// (i.e. don't copy 2GB of data, just copy the mapping of virtual pages to physical pages)
// Does any OS provide support for an operation like this?
MoveMemory(very_large_buffer, even_bigger_buffer, 2GB)
// Now very_large_buffer no longer has any physical memory pages associated with it
VirtualFree(very_large_buffer)
To some extent, you can do that with mremap on Linux.
That call plays with the process's page table to do a zero-copy reallocation if it can. It is not possible in all cases (address space fragmentation, and simply the presence of other existing mappings are an issue).
The man page actually says this:
mremap() changes the mapping between virtual addresses and memory pages. This can be used to implement a very efficient realloc(3).
Yes it's a common use of memory mapped files to 'move' or copy memory between process by mapping different views of the file
Every POSIX system is able to do this. If you use mmap with a file descriptor (obtained by open or shm_open) and not anonymously you can unmap it, then truncate (shrink or grow) and then map it again. You may and often will get a different virtual address for the same pages.
I mean, you'd never be able to absolutely guarantee that there would be no active memory in that next 100GB, so you might not be able to make it contiguous.
On the other hand, you could use a ragged array (an array of arrays) where the arrays do not have to be next to each other (or even the same size). Many of the advantages of dynamic arrays may not scale to the 100GB realm.

Memory allocators

I want to make a virtual allocator using c++ on windows,, which allocate data on a file on the hard disk, to reduce physical memory usage when allocate large objects !..
I don't want to use system virtual memory with virtualAlloc.. . I want to create a file on the disk and use it to allocate the whole objects and then move the part or the object that I need to the RAM .
I tried to use Memory mapped file , but I faced some problems: I used the mapped file to allocate vector elements, but when I bake to delete any of them, the address of the element changed, also I can't find a method to map the object only when needed "in my test I mapped the whole file"!
Any resources or open source projects can help ???
Google can help here. I implemented a custom STL allocator a number of years ago that used a shared memory store. The same techniques can be used to implement a disk-backed allocator. I would start by looking at this SourceForge project for inspiration.
You may find inspiration from Boost.Interprocess, which provides support for memory mapped files, as well as allocators and containers over that memory.
More information about the allocator design can also be found at http://www.boost.org/doc/libs/1_37_0/doc/html/interprocess/architecture.html
Sorry, but you fail to understand how (virtual) memory works. One the one hand you state that "I want to make "custom memory allocator" but without take a large space from the memory" but on the other hand you're surprised that "the address of the element changed".
This is pretty much to be expected. To make sure that the address of a (logical) object doesn't change, you have to keep the memory represented by that address committed to the object. If you free the memory, it becomes available for reuse, and so does the address. And if the address is reused, you can't page back the object to that address.
Ultimately, the problem here it that addresses and memory are very, very deeply connected. Recycling memory means recycling addresses.
From http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=213
The POSIX header includes memory mapping syscalls and data structures. Because this interface is more intuitive and simpler than that of Windows, I base my memory mapping example on the POSIX library.
The mmap() system call:
caddr_t mmap(caddress_t map_addr,
size_t length,
int protection,
int flags,
int fd,
off_t offset);
Let's examine what each parameter means.
In the following example, the program maps the first 4 KB of a file passed in command line into its memory and then reads int value from it:
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
int main(int argc, char *argv[])
{
int fd;
void * pregion;
if (fd= open(argv[1], O_RDONLY) <0)
{
perror("failed on open");
return –1;
}
/*map first 4 kilobytes of fd*/
pregion=mmap(NULL, 4096, PROT_READ,MAP_SHARED,fd,0);
if (pregion==(caddr_t)-1)
{
perror("mmap failed")
return –1;
}
close(fd); //close the physical file because we don't need it
//access mapped memory; read the first int in the mapped file
int val= *((int*) pregion);
}
To unmap a mapped region, use the munmap() function:
int munmap(caddr_t addr, int length);
addr is the address of the region being unmapped. length specifies how much of the memory should be unmapped (you may unmap a portion of a previously-mapped region). The following example unmaps the first kilobyte of the previously-mapped file. The remaining three KB still remain mapped to the process's RAM after this call:
munmap(pregion, 1024);
Probably the best way to solve this is not to return regular pointers to large objects. Simply return small proxies. These proxy objects implement the full interface of the larger object. However, these proxy objects can deal with the raw data being either in RAM or on disk. The proxies implement a LRU mechanism amongst themselves to optimize RAM use. The caller never sees the address of these proxies change, nor does it get any pointers to raw data.