I need to resize a very large mmapped file without copying it, while still allowing concurrent access from reader threads.
The simple way is to use two MAP_SHARED mappings (grow the file, then create a second mapping that includes the grown region) in the same process over the same file, and then unmap the old mapping once all readers that could access it are finished. However, I am curious whether the scheme below could work and, if so, whether there is any advantage to it.
1. mmap a file with MAP_PRIVATE
2. do read-only access to this memory in multiple threads
3. either acquire a mutex for the file and write to the memory (assume this is done in a way that the readers, which may be reading that memory, are not messed up by it)
4. or acquire the mutex, increase the size of the file, and use mremap to move the mapping to a new address (resizing it without copying or unnecessary file I/O)
The crazy part comes in at (4). If you move the memory, the old addresses become invalid, and the readers, which are still reading it, may suddenly get an access violation. What if we modify the readers to trap this access violation and then restart the operation (i.e., not re-read the bad address, but recalculate it from the offset and the new base address returned by mremap)? Yes, I know that's evil, but to my mind the readers can only either successfully read the data at the old address or fail with an access violation and retry. If sufficient care is taken, that should be safe. Since resizing would not happen often, the readers would eventually succeed and not get stuck in a retry loop.
A problem could occur if that old address space is re-used while a reader still has a pointer into it. Then there will be no access violation, but the data will be incorrect and the program enters the unicorn-and-candy-filled land of undefined behavior (wherein there are usually neither unicorns nor candy).
But if you controlled allocations completely and could make certain that any allocations that happen during this period do not ever re-use that old address space, then this shouldn't be a problem and the behavior shouldn't be undefined.
Am I right? Could this work? Is there any advantage to this over using two MAP_SHARED mappings?
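For reference, here is roughly what I mean by the two-MAP_SHARED-mappings approach I am comparing against (just a sketch; fd, old_size and new_size are placeholders and error handling is omitted):

#include <sys/mman.h>
#include <unistd.h>

// Grow the file, then map the grown file a second time. Readers keep using
// old_map until they are known to be done with it; only then is it unmapped.
// Nothing is copied and the old view stays valid throughout.
void *grow_mapping(int fd, void *old_map, size_t old_size, size_t new_size)
{
    if (ftruncate(fd, new_size) != 0)              // 1. grow the file
        return MAP_FAILED;

    void *new_map = mmap(nullptr, new_size,        // 2. second, larger view
                         PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (new_map == MAP_FAILED)
        return MAP_FAILED;

    // 3. publish new_map to readers; once every reader that could still be
    //    using old_map has finished (the hand-off is the interesting part):
    // munmap(old_map, old_size);
    return new_map;
}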
It is hard for me to imagine a case where you don't know the upper bound on how large the file can be. Assuming that's true, you could "reserve" the address space for the maximum size of the file by providing that size when the file is first mapped in with mmap(). Of course, any accesses beyond the actual size of the file will cause an access violation, but that's how you want it to work anyway -- you could argue that reserving the extra address space ensures the access violation rather than leaving that address range open to being used by other calls to things like mmap() or malloc().
Anyway, the point is that with my solution you never move the address range; you only change its size, and your locking is now around the data structure that provides the current valid size to each thread.
My solution doesn't work if you have so many files that the maximum mapping for each file runs you out of address space, but this is the age of the 64-bit address space so hopefully your maximum mapping size is no problem.
(Just to make sure I wasn't forgetting something stupid, I did write a small program to convince myself creating the larger-than-file-size mapping gives an access violation when you try to access beyond the file size, and then works fine once you ftruncate() the file to be larger, all with the same address returned from the first mmap() call.)
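A minimal sketch of that experiment (my reconstruction, assuming Linux/POSIX; the file name and sizes are arbitrary and error handling is trimmed):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main()
{
    const size_t max_size  = size_t(1) << 30;  // reserve address space for the largest the file may grow
    const size_t init_size = 4096;             // current (small) file size

    int fd = open("data.bin", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, init_size);

    // Map the maximum size up front; this address never has to move.
    char *base = static_cast<char *>(
        mmap(nullptr, max_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    base[0] = 'x';                  // fine: within the current file size
    // base[init_size] = 'x';       // would fault (SIGBUS): beyond EOF, as intended

    ftruncate(fd, init_size * 2);   // grow the file in place...
    base[init_size] = 'x';          // ...and the same addresses now work

    munmap(base, max_size);
    close(fd);
    return 0;
}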
I'm currently working on a project using the Zynq-7000 SoC. We have a custom DMA IP in PL to provide faster transactions between peripherals and main memory. The peripherals are generally serial devices such as UART. The data received by the serial device is transferred immediately to the main memory by DMA.
What I am trying to do is read the data stored at a pre-determined location in memory. Before reading the data, I invalidate the related cache lines using a function provided by the xil_cache.h library, as below.
Xil_DCacheInvalidateRange(INTPTR adr, u32 len);
The problem here is that this function flushes the related cache lines before invalidating them. Because of the flush, the stored data is overwritten, so I fetch corrupted bytes every time. The process is explained in the library documentation as below.
If the address to be invalidated is not cache-line aligned, the following choices are available:

1. Invalidate the cache line when required and do not bother much about the side effects. Though it sounds good, it can result in hard-to-debug issues. The problem is, if some other variables are allocated in the same cache line and have recently been updated (in cache), the invalidation results in loss of data.

2. Flush the cache line first. This ensures that if any other variable present in the same cache line was updated recently, it is flushed out to memory; then the line can safely be invalidated. Again it sounds good, but this can also result in issues. For example, when the invalidation happens in a typical ISR (after a DMA transfer has updated the memory), flushing the cache line means losing the data that was updated just before the ISR was invoked.
As you can guess, I cannot always allocate a memory region with a cache-line-aligned address. Therefore I take a different approach: I calculate the cache-line-aligned address located in memory right before my buffer and call the invalidation function with that address instead. Note that the Zynq's L2 cache is an 8-way set-associative 512 KB cache with a fixed 32-byte line size, which is why I mask the last 5 bits of the given memory address. (See section 3.4, L2 Cache, in the Zynq documentation.)
INTPTR invalidationStartAddress = INTPTR(uint32_t(dev2memBuffer) - (uint32_t(dev2memBuffer) & 0x1F));
Xil_DCacheInvalidateRange(invalidationStartAddress, BUFFER_LENGTH);
This way I can solve the problem, but I'm not sure whether I'm corrupting any of the resources placed right before the buffer allocated for DMA. (I should add that the buffer in question is allocated on the heap with the dynamic allocation operator new.) Is there a way to overcome this issue, or am I overthinking it? I believe the problem could be solved more cleanly if there were a function that invalidated the related cache lines without flushing them.
EDIT: Invalidating cache lines that extend outside the allocated area endangers the variables placed close to the buffer, so the first workaround is not applicable. My second idea was to allocate a buffer 32 bytes bigger than required and crop its unaligned part. But this can cause the same problem, because the last part (parts = 32-byte blocks) is not guaranteed to lie entirely inside the allocation, so it might corrupt the resources placed right after it. The library documentation states that:
Whenever possible, the addresses must be cache-line aligned. Please note that not just the start address, even the end address must be cache-line aligned. If that is taken care of, this will always work.
SOLUTION: As I stated in the last edit, the only way to overcome the problem was to allocate a memory region whose start address and length are both cache-line aligned. Since I cannot control the start address the allocator returns, I decided to allocate a region two cache blocks bigger than the requested one and crop the unaligned parts; the misalignment can occur in the first or the last block. So that the resource can still be destroyed correctly, I carefully saved the originally allocated address and used the cache-aligned one in all other operations. A sketch of what I mean is below.
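Roughly what I mean, as a sketch (the helper name and the CACHE_LINE constant are mine; the 32-byte value is the Zynq L2 line size mentioned above):

#include <cstddef>
#include <cstdint>

static const std::uintptr_t CACHE_LINE = 32;   // Zynq L2 cache line size

// Allocate bufferLength usable bytes such that the address handed to
// Xil_DCacheInvalidateRange() is cache-line aligned and the invalidated
// range never reaches outside the allocation. The raw pointer is kept so
// it can still be passed to delete[] later.
char *allocateCacheAligned(std::size_t bufferLength, char *&rawAllocation)
{
    // Two extra lines: one to align the start, one so that rounding the
    // length up to whole lines still stays inside the block.
    rawAllocation = new char[bufferLength + 2 * CACHE_LINE];

    std::uintptr_t raw     = reinterpret_cast<std::uintptr_t>(rawAllocation);
    std::uintptr_t aligned = (raw + CACHE_LINE - 1) & ~(CACHE_LINE - 1);
    return reinterpret_cast<char *>(aligned);
}

The length passed to the invalidate call is then rounded up to a multiple of 32 as well, which by construction stays inside the oversized allocation.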
I believe there are better solutions to the problem, so I am keeping the question open.
Your solution is correct. There is no way to flush or invalidate only part of a cache line; cache maintenance operations always act on whole lines.
Normally this behavior is transparent to programs, but it becomes visible in multithreaded code and when sharing memory with hardware accelerators.
Processes in an OS have their own virtual address spaces. Say I allocate some dynamic memory using a malloc() call in a C program and subtract some positive value (say 1000) from the address it returns. Now I try to read what is written at that location, which I assume should be fine, but what about writing to that location?
The virtual address space also has some read-only chunks of memory. How are those protected?
TL;DR No, it's not allowed.
In your case, when you get a valid non-NULL pointer back from malloc(), only the requested amount of memory is allocated to your process, and you are allowed to use (read and/or write) only that much space.
In general, any allocated memory (compile-time or run-time) has an associated size. Either overrunning or underrunning the allocated memory area is invalid memory access, which invokes undefined behavior.
Even if the memory happens to be accessible and inside the process address space, there is nothing stopping the OS / memory manager from handing out that particular address later, so at best either your previous write will be overwritten or you will be overwriting some other value. The worst case, as mentioned earlier, is UB.
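To make that concrete (a hypothetical sketch; the 100 and 1000 are arbitrary):

#include <cstdlib>

int main()
{
    char *p = static_cast<char *>(std::malloc(100));  // only p[0]..p[99] belong to you
    if (!p) return 1;

    char *q = p - 1000;   // already undefined behaviour: the result does not
                          // point into (or one past the end of) the allocation
    // *q = 'x';          // reading or writing through q may crash, may silently
                          // corrupt allocator metadata or another object,
                          // or may appear to "work"
    std::free(p);
    return 0;
}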
Say, I allocate some dynamic memory using malloc() function call in a c program and subtract some positive value(say 1000) from the address returned by it. Now, I try to read what is written on that location which should be fine but what about writing to that location?
Which addresses you can read, write, or execute are determined by the process's current memory map, which is set up by the operating system.
On my linux box, if I run pmap on my current shell, I see something like this:
evaitl#bb /proc/13151 $ pmap 13151
13151: bash
0000000000400000 976K r-x-- bash
00000000006f3000 4K r---- bash
00000000006f4000 36K rw--- bash
00000000006fd000 24K rw--- [ anon ]
0000000001f25000 1840K rw--- [ anon ]
00007ff7cce36000 44K r-x-- libnss_files-2.23.so
00007ff7cce41000 2044K ----- libnss_files-2.23.so
00007ff7cd040000 4K r---- libnss_files-2.23.so
00007ff7cd041000 4K rw--- libnss_files-2.23.so
00007ff7cd042000 24K rw--- [ anon ]
...
[many more lines here...]
Each line has a base address, a size, and the permissions; these are considered memory segments. The last field says what is being mapped in: bash is my shell; anon means allocated memory, perhaps for bss, maybe heap from malloc, or it could be a stack.
Shared libraries are also mapped in; that is where the libnss_files lines come from.
When you malloc some memory, it will come from an anonymous program segment. If there isn't enough space in the current anon segment being used for the heap, the OS will increase its size. The permissions in those segments will almost certainly be rw.
If you try to read/write outside of space you allocated, behavior is undefined. In this case that means that you may get lucky and nothing happens, or you may trip over an unmapped address and get a SIGSEGV signal.
Now, I try to read what is written on that location which should be fine
It is not fine. The memory at that address was never allocated to you, and according to the C++ standard, reading it has undefined behaviour.
but what about writing to that location?
Not fine either. Reading or writing unallocated memory also has undefined behaviour.
Sure, the memory address you ended up at might happen to be allocated; it's possible. But even if it is, the pointer arithmetic outside the bounds of the allocation is already UB.
virtual address space also has some read only chunk of memory. How does it protect that?
This is out of scope for C++ (and C), since neither language defines virtual memory at all. It may differ across operating systems, but one common approach is that when the process requests memory from the OS, it passes flags that specify the desired protection type; see the prot argument in the mmap man page as an example. The OS in turn sets up the page tables accordingly.
Once the protection type is known, the OS can raise an appropriate signal if the protection has been violated, and possibly terminate the process. Just like it does when a process tries to access unmapped memory. The violations are typically detected by the memory management unit of the CPU.
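As a small illustration of the prot idea (a sketch, assuming a Linux/POSIX system): a page mapped read-only faults on the first attempted write.

#include <sys/mman.h>

int main()
{
    // One page of anonymous memory, mapped read-only.
    char *p = static_cast<char *>(
        mmap(nullptr, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    if (p == MAP_FAILED)
        return 1;

    char c = p[0];   // fine: the page is readable (and zero-filled)
    (void)c;
    p[0] = 'x';      // write to a read-only page: the MMU faults and the
                     // kernel delivers SIGSEGV to the process
    return 0;
}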
Processes in OS have their own virtual address spaces. Say, I allocate some dynamic memory using malloc() function call in a c program and subtract some positive value (say 1000) from the address returned by it. Now, I try to read what is written on that location which should be fine but what about writing to that location?
No, it should not be fine: only the memory region allocated by malloc() is guaranteed to be accessible. There is no guarantee that the virtual address space is contiguous, so the memory addresses before and after your region may not even be mapped into the address space.
Of course, no one stops you from trying, but the behaviour is truly undefined. If you access a non-mapped memory address, it generates a page fault, which is a hardware CPU exception. When the operating system handles it, it sends a SIGSEGV signal or an access-violation exception to your application (depending on the OS).
virtual address space also has some read only chunk of memory. How does it protect that?
First, it's important to note that virtual memory mapping is realized partly by a hardware component called the memory management unit (MMU), which may or may not be integrated into the CPU chip. In addition to mapping virtual memory addresses to physical ones, it also supports marking those addresses with various flags, one of which enables and disables write protection.
When the CPU tries to write to a virtual address that is marked read-only, and thus write-protected (for example via a MOV instruction), the MMU raises a page fault exception on the CPU.
The same goes for trying to access non-present virtual memory pages.
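A small illustration (a sketch, assuming a POSIX system): flipping a writable page to read-only with mprotect() makes the very next store fault.

#include <sys/mman.h>
#include <cstring>

int main()
{
    const std::size_t page = 4096;
    char *p = static_cast<char *>(
        mmap(nullptr, page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    if (p == MAP_FAILED)
        return 1;

    std::memset(p, 0, page);        // allowed: the page is writable

    mprotect(p, page, PROT_READ);   // mark the page read-only in the MMU

    p[0] = 'x';                     // this store now triggers a page fault;
                                    // the OS turns it into SIGSEGV
    return 0;
}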
In the C language, doing arithmetic on a pointer to produce another pointer that does not point into (or one past the end of) the same object or array of objects is undefined behavior. From 6.5.6 Additive operators:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

(For the purposes of this clause, a non-array object is treated as an array of length one.)
You could get unlucky: the compiler could still produce a pointer you are able to do things with, and doing things with it will do something, but precisely what is anybody's guess, and it will be unreliable and often difficult to debug.
If you're lucky, the compiler produces a pointer into memory that "does not belong to you" and you get a segmentation fault to alert you to the problem as soon as you try to read or write through it.
How the system behaves when you read or write an unmapped memory address depends on your operating system implementation. When you try to access an address that is unmapped (or mapped to something that is not plain memory, for example a file mapped into memory), the operating system takes control by means of a trap, and what happens then is completely operating-system dependent.

Suppose you have mapped the video framebuffer somewhere in your virtual address space; then writing there makes the screen change. Suppose you have mapped a file; then reading or writing that memory means reading or writing the file. Suppose the running process touches a page that has been swapped out (because of a lack of physical memory the process was partially swapped); then the process is stopped, the memory is brought back from secondary storage, and the instruction is restarted.

Linux, for example, generates a SIGSEGV signal when you try to access memory you have not allocated. But you can install a signal handler to be called upon receiving this signal, and then trying to access unallocated memory means jumping into a piece of code in your own program that deals with that situation.
But bear in mind that trying to access memory that has not been correctly acquired, especially on a modern operating system, normally means your program is behaving incorrectly; it will usually crash, the system takes control, and the process is killed.
NOTE
malloc(3) is not a system call but a library function that manages a variable-size allocation arena in your RAM, so accessing even the first address before the returned pointer, or anything past the last allocated byte, is undefined behaviour. It does not necessarily mean you have accessed unallocated memory: you will probably be reading a perfectly allocated piece of your code or your data (or the stack) without knowing it. malloc(3) tends to ask the operating system for large contiguous chunks of memory that it then manages across many malloc calls, to avoid the cost of asking the operating system for more memory too often. See the sbrk(2) and mmap(2) system call man pages for more on this.
For example, both Linux and BSD Unix reserve page 0 of each process's virtual address space (the page containing the NULL address) precisely so that null-pointer accesses are invalid; if you try to read or write that address (or anything else in that page) you'll get a signal (or your process will be killed). Try this:
int main()
{
    char *p = 0;   /* p is pointing to the null address */
    p[0] = '\n';   /* a '\n' is being written to address 0x0000 */
    p[1] = '\0';   /* a '\0' is being written to address 0x0001 */
}
This program should fail at runtime on all modern operating systems (compile it without optimization so the compiler doesn't eliminate the code in main, since it effectively does nothing), because you are trying to access a page that is deliberately reserved so that any access to it is invalid.
The program on my system (mac OS X, a derivative from BSD unix) just does the following:
$ a.out
Segmentation fault: 11
NOTE 2
Many modern operating systems (mostly Unix-derived) implement a kind of memory access called copy on write. This means you can access that memory and modify it as you like, but the first time you write to it a page fault is generated. Normally this is implemented by handing you a read-only page, letting the fault happen, and then making a private copy of that individual page to hold your modifications. This is very effective for fork(2), which is normally followed by an exec(2) syscall: only the pages the program actually modifies are copied before the process throws them all away, saving a lot of work.
Another case is stack growth. The stack grows automatically as you enter and leave stack frames in your program, so the operating system has to deal with the page faults that happen when you push something onto the stack and that push crosses a page boundary into the unknown. When this happens, the OS automatically allocates a page and turns that region (the page) into valid, normally read-write, memory.
Technically, a process has a logical address space. However, that often gets conflated with the virtual address space.
The number of virtual addresses that can be mapped into that logical address space can be limited by:
Hardware
System resources (notably page file space)
System Parameters (e.g., limiting page table size)
Process quotas
Your logical address space consists of an array of pages that are mapped to physical page frames. Not every page needs to have such a mapping (or even is likely to).
The logical address space is usually divided into two (or more) areas: system (common to all processes) and user (created for each process).
Theoretically, a process begins with nothing in its user space; only the system address space exists.
If the system does not use up its entire range of logical addresses (which is normal), unused addresses cannot be accessed at all.
Now your program starts running. The O/S has mapped some pages into your logical address space, but very little of that address space is likely to be mapped. Your application can map more pages into the unmapped parts of the logical address space.
Say, I allocate some dynamic memory using malloc() function call in a c program and subtract some positive value(say 1000) from the address returned by it. Now, I try to read what is written on that location which should be fine but what about writing to that location?
The processor uses a page table to map logical pages to physical page frames. If you do what you describe, a number of things can happen:
There is no page table entry for the address => Access violation. Your system may not set up a page table that can span the entire logical address space.
There is a page table entry for the address but it is marked invalid => Access Violation.
You are attempting to access a page that is not accessible in your current processor mode (e.g., user mode access to a page that only allows kernel mode access) => Access Violation.
virtual address space also has some read only chunk of memory. How does it protect that?
You are attempting to access a page in a manner not permitted for that page (e.g., write to a read-only page, execute a no-execute page) => Access Violation. The access allowed to a page is defined in the page table.
[Ignoring page faults]
If you make it through those tests, you can access the random memory address.
It does not. It's actually your duty as a programmer to handle this.
I have this function in my MMF class
void Clear() {
    int size = SizeB();
    int iter = size / sysInfo.granB;
    for (int i = 0; i < iter; i++) {
        auto v = (char*)MapViewOfFile(hMmf, FILE_MAP_READ | (write ? FILE_MAP_WRITE : 0), 0, i * sysInfo.granB, sysInfo.granB);
        std::memset(v, 0, sysInfo.granB);
        UnmapViewOfFile(v);
    }
}
So what it does is go through the whole file in the smallest addressable chunks (64 KB in this case): map the view, write 0's, unmap, repeat. It works all right and is very quick, but when I use it there is some phantom memory usage going on.
According to windows task manager, the process itself is using just a few megabytes but the "physical memory usage" leaps up when I use it on larger files. For instance, using this on a 2GB file is enough to put my laptop in a coma for a few minutes, physical memory usage goes to 99%, everything in task manager is frantically reducing memory and everything freezes for a while.
The whole reason I'm trying to do this in 64k chunks is to keep memory usage down but the chunk size doesn't really matter in this case, any size chunks * n to cover the file does the same thing.
A couple of things I've tried:
Flushing the view before unmapping: this makes things terribly slow; doing the 2 GB file in any size chunks takes something like 10 minutes.
Adding a hardcoded delay in the loop: it actually works really well, it still gets done in seconds and the memory usage stays down, but I just really don't like the concept of a hardcoded delay in any loop.
Writing 0's to just the end of the file: I don't actually need to clear the file, only to force it to be ready for use. What I mean is, when I create a new file and just start with my random IO, I get ~1 MB/s at best. If I open an existing file, or force-write 0's into a new file first, I get much better speeds. I'm not exactly sure why that is, but a user in another thread suggested that writing something to the very end of the file after setting the file pointer would have the same effect as clearing it; from testing, this is not true.
So currently I'm trying to solve this from the angle of clearing the file without destroying the computers memory. Does anybody know how to appropriately limit that loop?
So here's the thing. When you MapViewOfFile, it allocates the associated memory range but may mark it as swapped out (e.g., if it hasn't already been read into memory). If that's the case, you then get a page fault when you first access it, which causes the OS to read it in.
Then, when you UnmapViewOfFile, the OS takes ownership of the associated memory range and writes the now-not-accessible-by-userspace data back to disk (assuming, of course, that you've written to it, which marks the page as "dirty"; otherwise it's simply deallocated). To quote the documentation (which I asked you to read in the comments): modified pages are written "lazily" to disk; that is, modifications may be cached in memory and written to disk at a later time.
Unmapping the view of the file is not guaranteed to "un-commit" and write the data to disk. Moreover, even CloseHandle does not provide that guarantee either. It merely closes the handle to it. Because of caching mechanisms, the operating system is entirely allowed to write data back to disk on its own time if you do not call FlushViewOfFile. Even re-opening the same file may simply pull data back from the cache instead of from disk.
Ultimately the problem is
you memory map a file
you write to the memory map
writing to the memory map's address range causes the file's mapping to be read in from disk
you unmap the file
unmapping the file "lazily" writes the data back to disk
the OS may reach memory stress, see that there is some unwritten data it can now write to disk, and force that to happen to recover physical memory for new allocations; by the way, because the OS flushes lazily, your IO is no longer sequential, which drastically increases spindle-disk latency
You see better performance when you're sleeping because you're giving the OS the opportunity to say "hey I'm not doing anything... let's go ahead and flush cache" which coerces disk IO to be roughly sequential.
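One way to act on that (a sketch, untested): flush each view yourself before unmapping it, so the amount of dirty cache the OS is sitting on stays bounded. The question notes that a per-chunk flush was slow, so the usual middle ground is a much larger chunk size together with the flush; the chunk size here is whatever sysInfo.granB already is.

// Same loop as in the question, but with an explicit flush so dirty pages
// are pushed toward disk as we go instead of accumulating in the system
// cache. FlushViewOfFile schedules the write-back; FlushFileBuffers on the
// file handle would additionally wait for it to reach disk.
for (int i = 0; i < iter; i++) {
    auto v = (char*)MapViewOfFile(hMmf, FILE_MAP_READ | FILE_MAP_WRITE,
                                  0, i * sysInfo.granB, sysInfo.granB);
    if (!v) break;                      // mapping failed; handle as needed
    std::memset(v, 0, sysInfo.granB);
    FlushViewOfFile(v, sysInfo.granB);  // push this chunk toward disk now
    UnmapViewOfFile(v);
}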
In my modelling code I use boost memory mapped files, to allocate large-ish arrays on disk.
It works well, but I couldn't find a way to detect the situation in which I allocate an array larger than the free space on the disk drive. For example, the following code will execute happily (assuming I have less than 8E9 bytes of free space on the HDD):
boost::iostreams::mapped_file_params file_params;
file_params.path = path;
file_params.mode = std::ios::in | std::ios::out;
file_params.new_file_size = static_cast<size_t>(8E9); // about 8 GB
file_params.length = static_cast<size_t>(8E9);
boost::iostreams::mapped_file result;
result.open(file_params);
I can even work on result.data() until I write to a part of the memory which is not allocated (because of the missing space on the HDD), and then I get the following error:
memory access violation at address: 0x7e9e2cd1e000: non-existent physical address
Is there any way to detect this error before I get cryptic memory access violation?
I actually tested this: if the file is bigger than the available free space on the partition, the code gets a memory access violation; if it is smaller, the code works (I tested it by changing the free space on the partition, not by editing the code).
Possible solutions
If I std::fill the file contents with zeroes, I still get a memory access violation, but the error is located near the allocation and is easier to debug. I would still prefer some way to raise an exception.
You can use fallocate or posix_fallocate to actually reserve space for the file up front. That way you know you won't ever "over-commit". It has a performance cost upon initial creation, of course.
For security reasons, the OS is likely to zero out the blocks on fallocate.
fallocate lets you create unwritten extents, but they are still zeroed upon first access.
On Windows, SetFileValidData lets you bypass even that.
Note that Linux with O_DIRECT + fallocate() still uses considerable CPU (as opposed to Windows' SetFileValidData), and although IO bandwidth is usually the bottleneck, this could still have a noticeable performance effect if you are doing much CPU work at the same time.
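A hedged sketch of the posix_fallocate route (the helper name is mine; error handling abbreviated):

#include <fcntl.h>
#include <unistd.h>
#include <stdexcept>

// Reserve the blocks for `size` bytes up front, so a full disk shows up
// here as an error code rather than later as a cryptic access violation
// while writing through the mapping.
void reserve_file(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        throw std::runtime_error("open failed");

    int err = posix_fallocate(fd, 0, size);  // 0 on success, an errno value otherwise
    close(fd);
    if (err != 0)
        throw std::runtime_error("could not reserve space for the mapping");
}

Call it with the intended new_file_size before handing the path to boost::iostreams::mapped_file.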
Is there any way to detect this error before I get cryptic memory access violation?
When you just change the size of a file it will be sparse, which means the areas without data do not consume disk space. Space is allocated during writes, and that is when out-of-disk-space errors can occur.
A way to solve the problem would be to write (dummy) data to the file instead of just changing its size. That takes more time, but you would only hit the out-of-disk-space error during this first write cycle, as the file has its final size afterwards.
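For example, a sketch of such a pre-fill pass (the 1 MiB block size is arbitrary); ENOSPC then shows up as a failed stream write instead of an access violation inside the mapping:

#include <algorithm>
#include <fstream>
#include <vector>

// Write `size` zero bytes through normal file IO so that every block is
// really allocated on disk before the file is memory-mapped.
bool prefill_with_zeros(const char *path, std::size_t size)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    std::vector<char> block(1 << 20, 0);                 // 1 MiB of zeros
    for (std::size_t done = 0; done < size && out; done += block.size())
        out.write(block.data(),
                  static_cast<std::streamsize>(std::min(block.size(), size - done)));
    return static_cast<bool>(out.flush());
}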
I got an access violation from my application.
CallStack:
0da0ccfc 77c46fa3 ntdll!KiUserExceptionDispatcher+0xe
0da0d004 4dfeee3a msvcrt!memcpy+0x33
0da0d45c 4dfdbc4b MyLibrary!MyClass::MyFunc+0x8d [MyFile.cpp # 574]
[MyFile.cpp # 574] memcpy( m_pMyPointer, m_pSrcPointer, m_nDataSize );
Here I am sure of the following things: m_pMyPointer is valid, and no other thread will read or write this memory. The size of the buffer behind m_pMyPointer is greater than m_nDataSize. m_pSrcPointer may be accessed from other threads (read or write), and there is only a very small chance that the buffer behind m_pSrcPointer is smaller than m_nDataSize.
My question is: is there any chance of getting an access violation from memcpy(m_pMyPointer, m_pSrcPointer, m_nDataSize) if another thread tries to read/write m_pSrcPointer, given that memcpy() only reads from m_pSrcPointer and does not write to it?
I would exclude that. Concurrent read access to a memory area is by definition thread-safe. When one thread writes to a location of memory which is read by another, you lose thread-safety in the sense that the result is unpredictable, but you still should not get an access violation (on most sane platforms, including x86).
Most likely, the size of the valid memory area pointed to by either m_pMyPointer or m_pSrcPointer is smaller than m_nDataSize.
However, if you suspect that the same piece of memory is read and written by different threads at the same time, it means you are at the very least missing a locking scheme there.
If the concurrent threads change only the data in the buffer, you should not get any AV by copying from/to the buffer.
If the concurrent threads change the pointer to the buffer or the variable containing the size of the buffer (number of bytes or elements), you can get an AV easily from copying to/from the buffer using these pointer and size variables. Here you're entering the land of undefined behavior.
There is a small possibility: if the write to m_pSrcPointer isn't atomic, or if the other thread is writing to one of the other members and you haven't told us about it (m_nDataSize sounds like a likely candidate, for instance).
If the write to m_pSrcPointer isn't atomic then, depending on your architecture, you could temporarily observe an invalid pointer between the two writes to the pointer. If m_nDataSize is updated at "the same time" as m_pSrcPointer, you have plenty of opportunity for bad things to happen.
Note this is highly architecture dependent.
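If that is the concern, one common fix (a sketch, assuming C++11 is available; names are mine) is to publish the pointer and size together through a single atomic, so a reader can never see a half-written pointer or a new pointer paired with an old size:

#include <atomic>
#include <cstddef>

// The pointer and its size travel together in one descriptor.
struct Buffer {
    const char  *data;
    std::size_t  size;
};

std::atomic<Buffer *> g_current{nullptr};

// Writer thread: build a new descriptor, then publish it atomically.
void publish(const char *data, std::size_t size)
{
    Buffer *b = new Buffer{data, size};
    Buffer *old = g_current.exchange(b, std::memory_order_release);
    // NOTE: freeing `old` safely requires knowing no reader still uses it
    // (hazard pointers, RCU, or deferring the delete); omitted here.
    (void)old;
}

// Reader thread: take a consistent snapshot, then memcpy from snap->data
// using snap->size.
Buffer *snapshot()
{
    return g_current.load(std::memory_order_acquire);
}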