I have recently started a project that requires the use of Shared/Named memory. I have a working prototype - but before I commit to the current implementation, I would like to learn a little bit more on the subject.
I have checked the MSDN docs (and various other sources) and I grasp the basic principles behind how everything works, but I was not able to find answers to my questions below.
1) If you create a shared memory space and don't provide a valid file handle, it uses the System Page file for its storage. My question is - If I create my own file, and map the view to that file - will the performance be comparatively the same as when mapping to the system page file?
2) You can access the data in the shared memory space by using CopyMemory (which creates a copy of the data) or by casting the result of MapViewOfFile to the type that was written there in the first place. Let's assume we wrote a data structure "MyStruct" there. Is it safe to do the following?
auto pReferenceToSharedMemory = (MyStruct*)MapViewOfFile(....);
pReferenceToSharedMemory->SomeField = 12345;
pReferenceToSharedMemory->SomeField2 = ...;
...
Assuming the above is safe to do - it's surely more efficient to apply data changes to the data stored in the shared memory space than to copy the data out, change some values, and copy it back?
3) Lastly - how expensive are the OpenFileMapping and MapViewOfFile operations? I would think that ideally you should only execute OpenFileMapping once (at the beginning of your operation), execute MapViewOfFile once, and use the reference that it returns throughout the operation, rather than executing MapViewOfFile every time you would like to access the data?
Finally: Is it possible for the reference returned by MapViewOfFile and the data stored in MapViewOfFile to become out of sync?
1) The choice between your own file and the system page file isn't about performance; it's about persistence. Changes to your file will still be there the next time your program runs.
2) The proper solution would be `new (MapViewOfFile(...)) MyStruct`, if the mapping is backed by the page file and therefore still empty.
3) The expensive operations are reading and writing, not the meta-operations.
4) I've got no idea what that even means, so I'm fairly certain the answer is NO.
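To illustrate point 2, here is a minimal sketch of constructing the struct in place with placement new. It assumes a plain aligned buffer standing in for the pointer returned by MapViewOfFile, so that it compiles without the Windows headers; the field types of MyStruct and the helper name construct_in_view are made up for the example. In a real setup, presumably only the process that creates the mapping would construct the object, and the others would just cast the view pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>

// Hypothetical payload. It holds no pointers or virtual functions, so its raw
// bytes mean the same thing in every process that maps the region.
struct MyStruct {
    std::int32_t SomeField = 0;
    std::int32_t SomeField2 = 0;
};
static_assert(std::is_trivially_copyable<MyStruct>::value,
              "shared-memory payloads should be trivially copyable");

// Constructs a MyStruct at the start of `view`. Here `view` stands in for the
// pointer returned by MapViewOfFile; the caller guarantees size and alignment.
inline MyStruct* construct_in_view(void* view, std::size_t view_size) {
    assert(view != nullptr && view_size >= sizeof(MyStruct));
    assert(reinterpret_cast<std::uintptr_t>(view) % alignof(MyStruct) == 0);
    return new (view) MyStruct;  // runs the default member initializers in place
}
```

After that, writes such as `p->SomeField = 12345;` go straight into the mapped region, with no copy out and back.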
Related
I'm reading data from a file in a sequential manner while another process is appending data to the same file. So far so good. My trouble comes when I need to delete from the file the data that I have already retrieved, which I have to do in order to prevent the file from getting too large due to the writing process. I don't need to delete exactly the data that I have just retrieved, but at least do some removal periodically without losing any data that have not already been read. How can I do this with C++?
I understand that there may be different valuable approaches. I'd check as valid answer any that would prove useful to my developing the code.
This is not just a matter of C++: whatever language you use will at some point (in its runtime, standard library implementation, interpreter or whatever its architecture is) use the system calls that the system provides for file handling (e.g. open(), read(), write()).
I'm not aware of any system call that will delete parts of a file or replace parts with something else (you can position yourself somewhere in the middle of the file and start overwriting its contents, but this will be a byte for byte change, you can't change a piece of it with another piece with a different size). There are all sorts of workarounds for simulating deleting or changing parts of a file, but nothing that does it directly. For example: read from the original file, write only what you want to keep in a temporary file, remove the original and rename the temporary. But this will not work in your situation if the writing process keeps the file open.
Another approach would be something inspired by logrotate: when the file gets to a certain maximum size it gets switched with a new one, and you can process the previous one as you want. This approach does require changes in the writing process also.
You could specify the file length at the beginning, then start writing in it, and when you reach the end of the file, just start writing at the beginning of the file again. But you should make sure that the read pointer doesn't pass the write pointer, and that the writer doesn't overwrite data that hasn't been read yet.
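The wrap-around bookkeeping for such a fixed-size file could look roughly like the sketch below. It is only an illustration: an in-memory vector stands in for the preallocated file, the class name RingLog is made up, and both sides live in one process. With two real processes, the read and write positions would have to be stored somewhere both can see and be synchronized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Fixed-capacity circular buffer. The vector stands in for a file whose
// length was fixed up front (an assumption made for this sketch).
class RingLog {
public:
    explicit RingLog(std::size_t capacity) : storage_(capacity) {
        assert(capacity > 0);
    }

    // Appends as much of `data` as fits without overwriting unread bytes.
    // Returns the number of bytes actually stored.
    std::size_t write(const std::string& data) {
        const std::size_t n = std::min(data.size(), storage_.size() - used_);
        for (std::size_t i = 0; i < n; ++i) {
            storage_[(write_pos_ + i) % storage_.size()] = data[i];
        }
        write_pos_ = (write_pos_ + n) % storage_.size();
        used_ += n;
        return n;
    }

    // Removes and returns up to `max_bytes` of unread data, oldest first.
    std::string read(std::size_t max_bytes) {
        const std::size_t n = std::min(max_bytes, used_);
        std::string out;
        out.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            out.push_back(storage_[(read_pos_ + i) % storage_.size()]);
        }
        read_pos_ = (read_pos_ + n) % storage_.size();
        used_ -= n;
        return out;
    }

private:
    std::vector<char> storage_;
    std::size_t read_pos_ = 0;   // next byte to read
    std::size_t write_pos_ = 0;  // next byte to write
    std::size_t used_ = 0;       // bytes written but not yet read
};
```

Tracking the number of unread bytes is what keeps the reader from passing the writer, and the writer from lapping the reader.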
It seems like you're trying to emulate the behavior of a named pipe using a regular file. This would require special support from your operating system, which probably doesn't exist because you should be using named pipes instead. Named pipes are a special kind of file which is used for communication between two processes. Like a regular file, a named pipe has a path and a filename and exists on disk. However, whereas a regular file's contents are stored on disk, the contents of a named pipe exist only in memory, and consist only of the data that has been written but not yet read. This is exactly what you're trying to do.
Assuming you're using a Unix-based OS, you can run mkfifo outputfile and then use outputfile for reading and appending. No C++ code is required, though if you want you can also call mkfifo() from your C++ code.
If you're using Windows, it all becomes a bit more complicated. You have to specify something like \\.\pipe\outputfile as the filename for reading and appending.
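For the Unix case, calling mkfifo() from C++ might look like the sketch below. The helper names are made up for the example, and the non-blocking reader is just one way to avoid open() blocking until the other side shows up; the writer and reader would normally be separate processes.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Creates a FIFO at `path`, or accepts an existing one. Returns true if a
// FIFO exists at `path` afterwards.
inline bool ensure_fifo(const std::string& path) {
    assert(!path.empty());
    if (mkfifo(path.c_str(), 0600) == 0) return true;
    struct stat st;
    return errno == EEXIST && stat(path.c_str(), &st) == 0 && S_ISFIFO(st.st_mode);
}

// Opens the read end without waiting for a writer to appear.
// Returns a file descriptor, or -1 on error.
inline int open_fifo_reader(const std::string& path) {
    return open(path.c_str(), O_RDONLY | O_NONBLOCK);
}

// Opens the write end. This blocks until some process has the read end open.
inline int open_fifo_writer(const std::string& path) {
    return open(path.c_str(), O_WRONLY);
}
```

Once both ends are open, plain read() and write() calls on the descriptors move the data, and whatever has been read is gone from the pipe.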
I'm using a binary file to recover an object using boost::archive::binary_iarchive, but the file is too large (18GB) and loading the object pulls the entire file into memory. Is there a way to read the file by parts (a lazy load) to avoid the memory use?
What I have:
std::ifstream ifs(filename, std::ios::binary);
boost::archive::binary_iarchive ia(ifs);
MyObject obj;
ia >> obj;
Upgrading my comment to an answer:
@cmaster got really close to an approach that can work, but he accidentally turned the problem upside down.
The raw file was never the issue (it was streaming all along).
The problem is that deserialization tries to put all the data in memory (the vector, e.g.). So the only real solutions would be to:
put this data into a (shared?) memory map. You can use the allocators from Boost Interprocess to help you achieve this. This is a lot of effort, but relatively straightforward, conceptually.
modify the deserialization code to convert to a different on-disk format on the fly (instead of inserting into e.g. that vector), which would then allow mmap as cmaster suggested.
In other words, you'd "cannibalize" the boost serialization implementation to migrate the data away from boost serialization towards a raw binary format that affords using it directly in mapped memory.
You can use mmap() to map the file into your address space. With that, it doesn't matter that the file is too large because the kernel knows that any data in the mapped region is just a copy of the file on the hard disk. Consequently, it does not even need to swap the data out when it needs the memory for something else. The kernel will just lazily load the parts of the file that you need as you touch them, which is especially good if you don't need everything in the file.
The nice thing about mmap() is that you have the entire file contents accessible as a huge char array, which is quite convenient for many use cases. The only precondition that must be met is that your process runs as a 64 bit process, otherwise your virtual address space will be too small to fit the file into it.
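As a rough sketch of the "huge char array" idea on a POSIX system, the function below maps a file read-only and scans it. The function name and the byte-counting task are made up for illustration; the point is only that the loop touches the file through a pointer and the kernel pages data in on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Counts how often `needle` occurs in the file at `path`, reading it through
// a memory mapping instead of a buffer. Returns -1 on error.
inline long long count_byte_mmap(const std::string& path, char needle) {
    assert(!path.empty());
    const int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  // mmap rejects a length of 0

    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return -1;

    const char* bytes = static_cast<const char*>(addr);
    long long count = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (bytes[i] == needle) ++count;  // pages are loaded as they are touched
    }
    munmap(addr, size);
    return count;
}
```

Note that this only helps if the on-disk format can be used directly from memory, which is not the case for a Boost binary archive as it stands.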
I have an application that takes a handle and performs some tasks. The handle is currently being created with CreateFile. My problem is that CreateFile takes a filepath as one of the arguments. I am looking for a way to return a handle from a byte array because the data I need to process is not on disk. Does anyone know of any functions that take a byte array and return a handle or how I would go about doing this?
You have a few choices:
1. re-design your processing logic to read data from a memory pointer instead of a HANDLE, then you can pass your byte array as-is to your processing logic. In case you also need to process a file, you can read the file data into a byte array, then process it accordingly.
2. re-design your processing logic to read data from an IStream interface, then you can use SHCreateStreamOnFileEx() and SHCreateMemStream(), like Jonathan Potter suggested.
3. if you must read data from a HANDLE using ReadFile() or related function, you can either:
a. write your byte array to a temp file, then read back from that file.
b. create an anonymous pipe using CreatePipe(), then write the byte array to one end and read the data from the other end, like Harry Johnston suggested.
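The first choice (processing from a memory pointer) might be structured like the sketch below. It uses only standard C++; the function names and the byte-sum "processing" are placeholders for whatever the real logic does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical processing step. It only sees a pointer and a length, so it
// does not care whether the bytes came from a file, a socket, or an array.
inline std::uint32_t process_bytes(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < size; ++i) sum += data[i];
    return sum;
}

// File-based entry point: read the whole file into a byte array, then hand
// it to the same in-memory routine. Returns false if the file can't be opened.
inline bool process_file(const std::string& path, std::uint32_t& result) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    const std::vector<unsigned char> buffer(
        (std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    result = process_bytes(buffer.data(), buffer.size());
    return true;
}
```

With this split, a byte array that never touches the disk goes straight to process_bytes(), and no HANDLE is involved at all.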
Using CreateFile() with the FILE_ATTRIBUTE_TEMPORARY attribute allows the operating system to keep the file in memory. You still have a copy happening as you have to write your memory buffer to the file, and then read that data back from that file, but if you have enough cache memory, nothing will hit the hard drive.
For more details, see:
CREATEFILE2_EXTENDED_PARAMETERS structure | Caching Behavior
You could probably also use a file mapping, where the data written to the file is forced to stay in memory, but that is a lot more complicated for probably no gain, as it is likely to be slower overall.
I am a beginning C++ student. I have a structure array that holds employee info.
I can put values into the structure, write those values to a binary dat file and
read the values back into the program so that it can be displayed to the console.
Here is my problem. Once I close the program, I can't get the file to read the data from the file back into memory - instead it reads "garbage."
I tried a few things and then read this in my book:
NOTE: Structures containing pointers cannot be correctly stored to
disk using the techniques of this section. This is because if the
structure is read into memory on a subsequent run of the program, it
cannot be guaranteed that all program variables will be at the same
memory locations.
I am pretty sure this is what is going on when I try to open a .dat file with previously stored information and try to read it into a structure array.
I can send my code examples if that would help clarify my question.
Any suggestions would be appreciated.
Speaking generally (since I don't have your code), there are two reasons you generally shouldn't just write the bytes of a struct or class to a file:
As your book mentioned, writing a pointer to disk is pointless, since you're just storing a random address and not the data at that address. You need to write out the data itself.
You especially should not attempt to write a struct/class all at once with something like
fwrite(&myStruct, sizeof(myStruct), 1, file). Compilers sometimes put empty bytes between variables in structs in order to let the processor read them faster - this is called padding. Using a different compiler or compiling for a different computer architecture can pad structures differently, so a file that opens correctly on one computer might not open correctly on another.
There are lots of ways you can write out data to a file, be it in a binary format or in some human-readable format like XML. Regardless of what method you choose to use (each has strengths and weaknesses), every one of them involves writing each piece of data you care to save one by one, then reading them back in one by one. Higher level languages like Java or C# have ways to do this automagically by storing metadata about objects, but this comes at the cost of more memory usage and slower program execution.
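Writing the fields one by one could look something like the sketch below. The Employee layout and the function names are invented, since the original code wasn't posted. The string is stored as a length followed by its characters, so no pointer ever reaches the file. The numbers are still written in the machine's own byte order, so this is not yet a portable format.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical record; std::string holds a pointer internally, which is why
// dumping the raw struct bytes to disk would not work.
struct Employee {
    std::string name;
    std::int32_t id = 0;
    double salary = 0.0;
};

constexpr std::uint32_t kMaxNameLength = 1u << 20;  // sanity limit for this sketch

// Writes each field separately: name length, name characters, id, salary.
inline void write_employee(std::ostream& out, const Employee& e) {
    assert(e.name.size() <= kMaxNameLength);
    const std::uint32_t len = static_cast<std::uint32_t>(e.name.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(e.name.data(), len);
    out.write(reinterpret_cast<const char*>(&e.id), sizeof e.id);
    out.write(reinterpret_cast<const char*>(&e.salary), sizeof e.salary);
}

// Reads the fields back in the same order. Returns false on a short or
// implausible record and leaves `e` untouched in that case.
inline bool read_employee(std::istream& in, Employee& e) {
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
    if (len > kMaxNameLength) return false;
    Employee tmp;
    tmp.name.assign(len, '\0');
    if (len > 0 && !in.read(&tmp.name[0], len)) return false;
    if (!in.read(reinterpret_cast<char*>(&tmp.id), sizeof tmp.id)) return false;
    if (!in.read(reinterpret_cast<char*>(&tmp.salary), sizeof tmp.salary)) return false;
    e = std::move(tmp);
    return true;
}
```

For a file, the same functions work with std::ofstream and std::ifstream opened with std::ios::binary.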
I have to deal with a huge amount of data that usually doesn't fit into main memory. The way I access this data has high locality, so caching parts of it in memory looks like a good option. Is it feasible to just malloc() a huge array, and let the operating system figure out which bits to page out and which bits to keep?
Assuming the data comes from a file, you're better off memory mapping that file. Otherwise, what you end up doing is allocating your array, and then copying the data from your file into the array -- and since your array is mapped to the page file, you're basically just copying the original file to the page file, and in the process polluting the "cache" (i.e., physical memory) so other data that's currently active has a much better chance of being evicted. Then, when you're done you (typically) write the data back from the array to the original file, which (in this case) means copying from the page file back to the original file.
Memory mapping the file instead just creates some address space and maps it directly to the original file instead. This avoids copying data from the original file to the page file (and back again when you're done) as well as temporarily moving data into physical memory on the way from the original file to the page file. The biggest win, of course, is when/if there are substantial pieces of the original file that you never really use at all (in which case they may never be read into physical memory at all, assuming the unused chunk is at least a page in size).
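On a POSIX system, mapping the file directly and treating it as an array might be sketched as follows. The class name and the choice of 32-bit elements are assumptions for the example; MAP_SHARED is what makes modifications go back to the original file rather than to swap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Exposes an existing file as a read/write array of 32-bit values. Pages are
// loaded when first touched and written back to the file by the kernel.
class MappedU32Array {
public:
    explicit MappedU32Array(const std::string& path) {
        const int fd = open(path.c_str(), O_RDWR);
        if (fd < 0) return;
        struct stat st;
        if (fstat(fd, &st) == 0 &&
            st.st_size >= static_cast<off_t>(sizeof(std::uint32_t))) {
            const std::size_t bytes = static_cast<std::size_t>(st.st_size);
            void* addr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
            if (addr != MAP_FAILED) {
                data_ = static_cast<std::uint32_t*>(addr);
                bytes_ = bytes;
            }
        }
        close(fd);  // the mapping outlives the descriptor
    }
    ~MappedU32Array() {
        if (data_ != nullptr) munmap(data_, bytes_);
    }
    MappedU32Array(const MappedU32Array&) = delete;
    MappedU32Array& operator=(const MappedU32Array&) = delete;

    bool ok() const { return data_ != nullptr; }
    std::size_t size() const { return bytes_ / sizeof(std::uint32_t); }
    std::uint32_t& operator[](std::size_t i) {
        assert(i < size());
        return data_[i];
    }

private:
    std::uint32_t* data_ = nullptr;
    std::size_t bytes_ = 0;
};
```

With high locality of access, only the pages around the elements actually used ever need to be resident.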
If the data are in a large file, look into using mmap to read it. Modern computers have so much RAM, you might not have enough swap space available.