Temp file that exists only in RAM? - c++

I'm trying to write an encryption program using the one-time pad (OTP) method. In keeping with the security theory, I need the plaintext documents to be stored only in memory and never, ever written to a physical drive. The tmpnam function appears to be what I need, but from what I can see it creates the file on disk, not in RAM.
Using C++, is there any (platform-independent) method that allows a file to exist only in RAM? I would like to avoid using a RAM-disk method if possible.
Thanks
Edit:
Thanks, it's more just a learning thing for me. I'm new to encryption and just working through different methods; I don't actually plan on using many of them (especially OTP, since it doubles the original file size because of the "pad").
If I'm totally honest, I'm a Linux user, so ditching Windows wouldn't be too bad. I'm looking into using RAM disks for now, as FUSE seems a bit overkill for a "learning" thing.

The simple answer is: no, there is no platform-independent way. Even if you keep the data only in memory, it still risks being swapped out to disk by the virtual memory manager.
On Windows, you can use VirtualLock() to force the memory to stay in RAM. You can also use CryptProtectMemory() to prevent other processes from reading it.
On POSIX systems (e.g. BSD, Linux) you can use mlock() to lock memory in RAM.
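For illustration, a minimal POSIX sketch of the mlock() approach (VirtualLock() plays the same role on Windows; note that mlock can fail if RLIMIT_MEMLOCK is too low):

    #include <sys/mman.h>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    int main() {
        std::vector<char> secret(4096);               // buffer for plaintext/key material
        if (mlock(secret.data(), secret.size()) != 0) {
            std::perror("mlock");                     // e.g. RLIMIT_MEMLOCK too low
            return 1;
        }
        // ... work with the sensitive data here; it cannot be swapped out ...
        std::memset(secret.data(), 0, secret.size()); // scrub before unlocking
        munlock(secret.data(), secret.size());
        return 0;
    }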

Not really, unless you count in-memory streams (like std::stringstream).
And especially and specifically not for security purposes: any piece of data can be swapped to disk on virtual-memory systems.
Generally, if you are concerned about security, you have to use platform-specific methods for controlling access: what good is keeping your data in RAM if everyone can read it?
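As a quick sketch of the in-memory-stream idea: std::stringstream gives you a file-like object whose contents never touch the filesystem (though, per the above, the backing memory can still be swapped out):

    #include <iostream>
    #include <sstream>
    #include <string>

    int main() {
        std::stringstream file;                    // in-memory, file-like stream
        file << "plaintext line\n" << 42 << '\n';  // "write" to it

        std::string line;
        int n = 0;
        std::getline(file, line);                  // "read" it back
        file >> n;
        std::cout << line << ' ' << n << '\n';     // prints: plaintext line 42
        return 0;
    }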

You might want to look at TrueCrypt's source code. Getting code at the file system level might be your best bet.

OTP is an awful encryption method for arbitrary files, unless you have a massive amount of entropy that you can guarantee never repeats itself (that's why it's called "one-time"!)
If you want to create a file-like object that only exists in memory and you don't care about Windows, I'd look at writing a custom FUSE filesystem (http://fuse.sourceforge.net/); this way you guarantee what will and will not get written to disk, and your files are accessible by all programs.
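To give a flavour of what that involves, here is a rough read-only skeleton against the FUSE 2.x C API (build with: g++ memfs.cpp $(pkg-config fuse --cflags --libs); the path and contents are made up, and a real pad store would also implement write):

    #define FUSE_USE_VERSION 26
    #include <fuse.h>
    #include <sys/stat.h>
    #include <cerrno>
    #include <cstring>

    // A single file whose contents live only in this process's RAM.
    static const char *kPath = "/secret.txt";
    static const char *kData = "plaintext kept in RAM\n";

    static int mem_getattr(const char *path, struct stat *st) {
        std::memset(st, 0, sizeof *st);
        if (std::strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0755;
            st->st_nlink = 2;
        } else if (std::strcmp(path, kPath) == 0) {
            st->st_mode = S_IFREG | 0444;
            st->st_nlink = 1;
            st->st_size = (off_t)std::strlen(kData);
        } else {
            return -ENOENT;
        }
        return 0;
    }

    static int mem_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                           off_t, struct fuse_file_info *) {
        if (std::strcmp(path, "/") != 0) return -ENOENT;
        filler(buf, ".", NULL, 0);
        filler(buf, "..", NULL, 0);
        filler(buf, kPath + 1, NULL, 0);    // name without the leading '/'
        return 0;
    }

    static int mem_read(const char *path, char *buf, size_t size, off_t off,
                        struct fuse_file_info *) {
        if (std::strcmp(path, kPath) != 0) return -ENOENT;
        size_t len = std::strlen(kData);
        if ((size_t)off >= len) return 0;
        if (off + size > len) size = len - off;
        std::memcpy(buf, kData + off, size);
        return (int)size;
    }

    int main(int argc, char *argv[]) {
        struct fuse_operations ops = {};    // zero-init, then fill what we support
        ops.getattr = mem_getattr;
        ops.readdir = mem_readdir;
        ops.read    = mem_read;
        return fuse_main(argc, argv, &ops, NULL);
    }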

Using either std::stringstream or fmemopen will get you file-like access to a block of memory. If (for security) you want to avoid it being swapped out, use mlock(), which is probably easier to apply to fmemopen's buffer than to std::stringstream. Combining mlock() with std::stringstream would probably need to be done via a custom allocator (used as a template parameter).
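A minimal sketch of the fmemopen() + mlock() combination (POSIX-only; fmemopen is in POSIX.1-2008/glibc, and mlock may require raising RLIMIT_MEMLOCK):

    #include <sys/mman.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        static char buf[4096];                       // backing store for the "file"
        if (mlock(buf, sizeof buf) != 0)             // pin it in RAM
            std::perror("mlock");

        FILE *f = fmemopen(buf, sizeof buf, "w+");   // stdio stream over the buffer
        if (!f) { std::perror("fmemopen"); return 1; }

        std::fprintf(f, "plaintext that never hits disk\n");
        std::fflush(f);

        std::rewind(f);
        char line[128];
        if (std::fgets(line, sizeof line, f))
            std::printf("read back: %s", line);

        std::fclose(f);
        std::memset(buf, 0, sizeof buf);             // scrub before unlocking
        munlock(buf, sizeof buf);
        return 0;
    }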

Related

How to flush memory-mapped files using Boost's `mapped_file_sink` class?

I'm using the Boost libraries version 1.62.0 and the mapped_file_sink class from Boost.IOStreams.
I want to flush the written data to disk at will, but there is no mapped_file_sink::flush() member function.
My questions are:
How can I flush the written data when using mapped_file_sink?
If the above can't be done, why not, considering that msync() and FlushViewOfFile() are available for a portable implementation?
If you look at the mapped file support for proposed Boost.AFIO v2 at https://ned14.github.io/boost.afio/classboost_1_1afio_1_1v2__xxx_1_1map__handle.html, you'll notice a lack of ability to flush mapped file views as well.
The reason is that it's redundant on modern unified page cache kernels, where the mapped view is identical in every way to the page-cached buffers for that file. msync() is therefore a no-op on such kernels, because dirty pages are already queued for writing out to storage as and when the system decides it is appropriate. You can block your process until the system has finished writing out all the dirty pages for that file using good old fsync().
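To make that concrete, a small POSIX sketch (the file name is made up): dirty a page through a MAP_SHARED view, then block on fsync() until the kernel has written it out:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstring>

    int main() {
        int fd = open("data.bin", O_RDWR | O_CREAT, 0644);   // hypothetical file
        if (fd < 0) return 1;
        if (ftruncate(fd, 4096) != 0) return 1;

        char *p = (char *)mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) return 1;

        std::memcpy(p, "hello", 5);   // dirties the page via the mapping

        // msync(p, 4096, MS_SYNC) adds nothing on unified page cache kernels;
        // fsync() is what actually blocks until dirty pages reach storage.
        fsync(fd);

        munmap(p, 4096);
        close(fd);
        return 0;
    }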
All the above does not apply where (a) your kernel is not a unified page cache design (QNX, NetBSD etc.) or (b) your file resides on a networked file system. If you are in situation (a), it's best to avoid memory-mapped i/o altogether and just do read() and write(); such systems are a small percentage of OSs nowadays, so let them suffer the poor performance. For situation (b), you are strongly advised never to use memory-mapped i/o with networked file systems. There is an argument for read-only maps of immutable files only; otherwise, just don't do it unless you know what you're doing. Fall back to read() and write(); it's safer and less likely to surprise.
Finally, you linked to a secure file deletion program. Those programs don't work reliably any more with recent file systems, because of delayed extent allocation or copy-on-write allocation. In other words, when you rewrite a section of an existing file, the file system doesn't modify the original data on storage; it allocates new storage and points the file's extent list at the new extents. This allows a consistent file system to be recovered easily after unexpected data loss.

To securely delete data on recent file systems you usually need to use special OS APIs, though deleting all the files and then filling the free space with random data may securely delete most of the data in question, most of the time. Note that copy-on-write file systems may not release freed extents back to the free-space pool for days or weeks, until the next time a garbage-collection routine fires or a snapshot is deleted. In that situation, filling free space with randomness will not securely delete the files in question.

If all this is a problem, use FAT32 as your file system: it's very simple, and rewriting data on it really does rewrite the same data on storage (though note that some storage media, e.g. SSDs, are highly likely not to rewrite data in place either; they too write modifications to new storage and garbage-collect freed extents later).

Write Only Memory Mapping in boost?

Why doesn't boost interprocess support write only memory mapping?
Maybe I'm missing something, but wouldn't a write-only mapping be significantly faster than a read/write mapping, since the OS wouldn't have to read the pages in from disk, just flush pages out from memory to disk? It would also have the benefit of being entirely non-blocking (except for flushing and destruction).
Would I benefit by switching from boost to native OS memory mapping?
In fact, if you allocate a new memory-mapped file of size, say, 20 GB, you'll get a sparse file allocation of that size.
When "mapping in" pages of that file, there need not be a read operation (as the OS might be able to tell that the page is not physically present on disk yet), and only when (if) those pages are dirtied need they be written out.
Of course, this is implementation-dependent and I don't think POSIX (can) guarantee this, but it's not unreasonable behaviour if you ask me, and it would be the effective equivalent of a write-only mapping.
Actually, a true write-only memory map would not be faster, as the OS can only keep track of changes / provide those mappings at whole-page granularity.
At least, not without paying the prohibitive cost of simulating all access to such pages in kernel-land (which is not implemented) instead of just mapping a page.
Somehow, I doubt going directly to the OS API instead of through the Boost API could provide any significant speed-up:
The Boost API is a thin wrapper over the OS-specific interface and will be completely inlined, and thus compiled out, by any decent compiler.
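A rough POSIX illustration of the sparse-allocation behaviour described above (the file name is made up): ftruncate() extends the file without allocating blocks, and only the pages you dirty ever need storage:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstddef>

    int main() {
        const size_t size = 1ull << 30;             // 1 GiB logical size
        int fd = open("big.bin", O_RDWR | O_CREAT, 0644);
        if (fd < 0) return 1;
        if (ftruncate(fd, (off_t)size) != 0) return 1;  // sparse: no blocks allocated

        char *p = (char *)mmap(NULL, size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) return 1;

        p[0] = 1;               // never-written pages need no disk read when
        p[size / 2] = 2;        // faulted in, and only dirtied pages get stored

        munmap(p, size);
        close(fd);
        return 0;
    }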

Is memory-mapped memory possible?

I know that it is possible to use memory-mapped files, i.e. real files on disk that are transparently mapped to memory. As far as I understand (I haven't used these yet), the mapping takes place immediately: the file is partly read on the first memory access, while the OS may start "caching" the whole file in the background.
Now: is it possible to somewhat abuse this concept and memory-map another block of memory? Assuming the OS provides such indirection, one could create a kind of compressed_malloc() that returns a mapping from memory to memory. The memory returned to the caller is simply the memory-mapped range, which is transparently compressed in memory and eventually also kept in memory. Thus, for large buffers, it could be possible that only parts get decompressed on the fly (on access) while the remaining blocks are kept compressed.
Is that concept technically possible at the moment or - if already realized (in software) - what are the things to look at?
Update 1: I am more or less looking for something that is technically achievable without modifying the OS kernel itself or which requires a virtualization platform.
Update 2: I am hoping for something which allows me to implement the compression and related logic in my own user-space code. I would just use the facilities of the operating system to create the memory-mapping.
Very much so. The VM (Virtual Memory) system is designed to handle different kinds of objects that can be mapped. There is, in fact, a filesystem called cramfs that does something similar, in the sense that it keeps compressed data in storage but enables transparent, uncompressed access.
You would not be modifying the kernel per se, but you will have to work in the kernel space, implementing VM handlers for this new kind of a memory mapped object.
This is possible, e.g.:
http://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.resourcemanagement.doc_41/managing_memory_resources/c_memory_compression.html
It is not currently implemented in kernel space in Linux, but something like this could be implemented in user space.
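One way to sketch the user-space approach (with Linux-leaning assumptions: 4 KiB pages, and mprotect/memset called from a signal handler, which is not strictly async-signal-safe): reserve the region with PROT_NONE and materialize pages on first touch from a SIGSEGV handler. A real compressed_malloc() would decompress the corresponding block at that point:

    #include <signal.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    static char *region;                  // reserved but inaccessible range
    static const size_t kPage = 4096;     // assume 4 KiB pages (sysconf in real code)
    static const size_t kSize = 1 << 20;

    static void on_fault(int, siginfo_t *si, void *) {
        char *page = (char *)((uintptr_t)si->si_addr & ~(uintptr_t)(kPage - 1));
        if (page < region || page >= region + kSize)
            _exit(1);                     // a fault we don't own: bail out
        mprotect(page, kPage, PROT_READ | PROT_WRITE);
        std::memset(page, 0xAB, kPage);   // stand-in for on-the-fly decompression
    }

    int main() {
        region = (char *)mmap(NULL, kSize, PROT_NONE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (region == MAP_FAILED) return 1;

        struct sigaction sa = {};
        sa.sa_sigaction = on_fault;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        // First touch faults, the handler fills the page, the load retries.
        std::printf("%02x\n", (unsigned char)region[12345]);   // prints: ab
        return 0;
    }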

Store huge amount of data in memory

I am looking for a way to store several GB of data in memory. The data is loaded into a tree structure. I want to be able to access this data through my main function, but I'm not interested in reloading the data into the tree every time I run the program. What is the best way to do this? Should I create a separate program for loading the data and then call it from the main function, or are there better alternatives?
thanks
Mads
I'd say the best alternative would be using a database - which would be then your "separate program for loading the data".
If you are using a POSIX-compliant system, then take a look at mmap.
On Windows, the equivalent for memory-mapping a file is CreateFileMapping()/MapViewOfFile().
You could probably solve this using shared memory: have one long-lived process build the tree and expose its address, and then let the other processes that start up get hold of that same memory for querying. Note that you will need to make sure the tree is safe to read by multiple simultaneous processes in that case. If the reads are really just pure reads, then that should be easy enough.
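A minimal boost::interprocess sketch of that split (the segment and object names are made up; a real tree would also need the library's offset_ptr and interprocess allocators, because the segment can map at a different address in each process):

    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <iostream>

    namespace bip = boost::interprocess;

    int main(int argc, char **) {
        if (argc > 1) {   // "loader" role: build once, then stay alive
            bip::shared_memory_object::remove("tree_segment");
            bip::managed_shared_memory seg(bip::create_only, "tree_segment", 1 << 20);
            seg.construct<int>("root_key")(42);   // stand-in for the real tree
            std::cin.get();                       // data lives while this runs
        } else {          // "query" role: attach and read
            bip::managed_shared_memory seg(bip::open_only, "tree_segment");
            std::cout << *seg.find<int>("root_key").first << '\n';
        }
        return 0;
    }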
You should look into a technique called a Memory mapped file.
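For example, a POSIX sketch that maps a (hypothetical) preprocessed data file; "loading" then costs almost nothing, because pages are faulted in on demand and unused parts are never read:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int fd = open("tree.dat", O_RDONLY);   // hypothetical preprocessed data file
        if (fd < 0) return 1;

        struct stat st;
        if (fstat(fd, &st) != 0) return 1;

        const char *data = (const char *)mmap(NULL, st.st_size, PROT_READ,
                                              MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) return 1;

        std::printf("first byte: %d\n", data[0]);  // page faulted in right here

        munmap((void *)data, st.st_size);
        close(fd);
        return 0;
    }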
I think the best solution is to configure a cache server and put data there.
Look into Ehcache:
"Ehcache is an open source, standards-based cache used to boost performance, offload the database and simplify scalability. Ehcache is robust, proven and full-featured and this has made it the most widely-used Java-based cache."
It's written in Java, but should support any language you choose:
"The Cache Server has two apis: RESTful resource oriented, and SOAP. Both support clients in any programming language."
You must be running a 64-bit system to use more than 4 GB of memory. If you build the tree and store it in a global variable, you can access the tree and data from any function in the program. I suggest you perhaps try an alternative method that requires less memory. If you post what type of program and what type of tree you're using, I can perhaps help you find an alternative method.
Since you don't want to keep reloading the data, file storage and databases are out of the question, but several gigs of memory seem like a hefty price.
Also note that on Windows systems, you can access the memory of another process using ReadProcessMemory(); all you need is a handle to the process and a pointer to the location of the memory.
You may alternatively implement the data loader as an executable program and the main program as a dll loaded and unloaded on demand. That way you can keep the data in the memory and be able to modify the processing code w/o reloading all the data or doing cross-process memory sharing.
Also, if you can operate on the raw data from the disk w/o making any preprocessing of it (e.g. putting it in a tree, manipulating pointers to its internals), you may want to memory-map the data and avoid loading unused portions of it.

Reserving Shared Memory with No File Backing (Linux/Windows) (boost::interprocess)

How can I reserve and allocate shared memory without the backing of a file? I'm trying to reserve a large (many tens of GiBs) chunk of shared memory and use it in multiple processes as a form of IPC. However, most of this chunk won't be touched at all (the access will be really sparse; maybe a few hundred megabytes throughout the life of the processes) and I don't care about the data when the applications end.
So preferably, the method to do this should have the following properties:
1. Doesn't commit the whole range. I will choose which parts to commit (actually use.) (But the pattern is quite unpredictable.)
2. Doesn't need a memory-mapped file or anything like that. I don't need to preserve the data.
3. Lets me access the memory area from multiple processes (I'll handle the locking explicitly.)
4. Works in both Linux and Windows (obviously a 64-bit OS is needed.)
5. Actually uses shared memory. I need the performance.
6. (NEW) The OS or the library doesn't try to initialize the reserved region (to zero or whatever.) This is obviously impractical and unnecessary.
I've been experimenting with boost::interprocess::shared_memory_object, but that causes a large file to be created on the filesystem (with the same size as my mapped memory region.) It does remove the file afterwards, but that hardly helps.
Any help/advice/pointers/reference is appreciated.
P.S. I do know how to do this on Windows using the native API. And POSIX seems to have the same functionality (only with a cleaner interface!) I'm looking for a cross-platform way here.
UPDATE: I did a bit of digging, and it turns out that the support that I thought existed in Windows was only a special case of memory-mapped files, using the system page file as a backing. (I'd never noticed it before because I had used at most a few megabytes of shared memory in the past projects.)
Also, I have a new requirement now (the number 6 above.)
On Windows, all memory has to be backed by the disk one way or another.
The closest I think you can accomplish on Windows would be to memory-map a sparse file. I think this will work in your case, for two reasons:
On Windows, only the shared memory that you actually touch will become resident. This meets the first requirement.
Because the file is sparse, only the parts that have been modified will actually be stored on disk. If you looked at the properties of the file, it would say something like, "Size: 500 MB, Size on disk: 32 KB".
I realize that this technically doesn't meet your 2nd requirement, but, unfortunately, I don't think that is possible on Windows. At least with this approach, only the regions you actually use will take up space. (And only when Windows decides to commit the memory to disk, which it may or may not do at its discretion.)
As for turning this into a cross-platform solution, one option would be to modify boost::interprocess so that it creates sparse files on Windows. I believe boost::interprocess already meets your requirements on Linux, where POSIX shared memory is available.
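As a sketch of the Linux/POSIX side (the region name is made up; link with -lrt on older glibc): shm_open() lives in tmpfs, so no on-disk file is created, and nothing is committed until a page is actually touched. On Windows, the pagefile-backed equivalent is exposed by boost::interprocess as windows_shared_memory:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstddef>

    int main() {
        // Appears under /dev/shm on Linux; never hits a real disk file.
        int fd = shm_open("/big_ipc_region", O_CREAT | O_RDWR, 0600);
        if (fd < 0) return 1;

        const size_t size = 1ull << 30;   // 1 GiB here; tens of GiB work the same,
                                          // subject to tmpfs size limits
        if (ftruncate(fd, (off_t)size) != 0) return 1;

        // MAP_SHARED so other processes opening the same name see the data;
        // pages are committed only on first touch.
        char *p = (char *)mmap(NULL, size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) return 1;

        p[123456789] = 42;                // commits just this one page

        munmap(p, size);
        close(fd);
        shm_unlink("/big_ipc_region");    // discard; we never wanted persistence
        return 0;
    }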