How to map a BYTE array as a FILE* on Windows - C++

I found an old, huge open-source code base that performs computations on binary data stored in a file on disk; the output is also saved as a binary file.
There is one root function that I would like to use. Its simplified signature is:
int magic(FILE* input, FILE* output);
The problem is that I keep the input data in process memory, and I would like the output to end up in process memory as well. The code is so big that I can't rewrite it in a reasonable time.
This API forces me to perform two huge I/O operations on every call to magic().
Is there any way to map a BYTE array as a FILE* on Windows using C/C++ mechanisms?

It seems that you need the functionality of fmemopen:
http://man7.org/linux/man-pages/man3/fmemopen.3.html
It takes a memory buffer and returns a FILE* stream that reads from and writes to that buffer.
Unfortunately, this is a POSIX function that has no equivalent in the Windows C runtime. Memory-mapped files are probably not what you want, as they take an existing file and map it into a memory region, which is the opposite of what fmemopen does. The only options you have are either to use fmemopen with MinGW on Windows (I don't know whether you can do this) or to roll your own versions of fopen, fwrite and so on.
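For reference, here is a minimal sketch of what this looks like where fmemopen is available (Linux/glibc or another POSIX.1-2008 system, not the MSVC runtime). It also uses open_memstream, the companion POSIX function that returns a write stream backed by a buffer that grows as data is written, which fits the output side of magic(). The magic() body below is a hypothetical stand-in that uppercases its input, and run_in_memory is likewise a name chosen for illustration.

```cpp
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the legacy magic(): copies input to output, uppercased.
static int magic(FILE* input, FILE* output) {
    int c;
    while ((c = std::fgetc(input)) != EOF)
        std::fputc(std::toupper(c), output);
    return (std::ferror(input) || std::ferror(output)) ? -1 : 0;
}

// Feeds `input` to magic() through a read-only memory stream (fmemopen) and
// collects whatever magic() writes in a growable buffer (open_memstream).
// Returns magic()'s result, or -1 if a memory stream could not be created.
int run_in_memory(std::vector<unsigned char>& input, std::vector<unsigned char>& output) {
    if (input.empty()) return -1;                // some fmemopen() versions reject a zero size
    FILE* in = fmemopen(input.data(), input.size(), "r");
    if (!in) return -1;
    char* buf = nullptr;                         // allocated and grown by open_memstream()
    std::size_t len = 0;
    FILE* out = open_memstream(&buf, &len);
    if (!out) { std::fclose(in); return -1; }
    int rc = magic(in, out);
    std::fclose(in);
    std::fclose(out);                            // buf and len are final after fclose()
    output.assign(buf, buf + len);
    std::free(buf);                              // the caller owns the open_memstream() buffer
    return rc;
}
```

No temporary files are involved: both FILE* objects live entirely in process memory, so the legacy code sees ordinary streams.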

Related

mmap file opened with "fopen" in C++

After many frustrating experiences with limited HDF5 support on many computers, I decided to write my own data container to store arrays in a binary file.
Basically, the format is very simple: each variable has a small header including a variable name, number of dimensions, actual size of each dimension and variable type. The data of one variable is stored right after the header. Variables are stored one after the other.
Reading and writing the headers is conveniently done with fseek, fread and fwrite, so I opened the file with fopen, which returns a FILE*.
The problem is that if I want to update some of the values of one array on disk, the cleanest way to do it (in my opinion) is memory mapping. Looking at the documentation of mmap, it is possible to mmap files opened with "open", which returns an int. But my file was already opened with "fopen".
Is it possible to mmap a section of a FILE*? How?
What you are looking for is the fileno function:
int fileno(FILE *stream);
The function fileno() examines the argument stream and returns its integer file descriptor.
You can call that on your file stream and pass the result to mmap. Alternatively, just use open instead of fopen to get a file descriptor in the first place, since parsing the header via a memory map is probably easier than using fseek and fread.
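A minimal POSIX sketch of the fileno() route, under a few assumptions: the file was opened for both reading and writing (for example with "r+b"), because a writable MAP_SHARED mapping needs a read/write descriptor, and the bytes being replaced already exist in the file (touching a mapped page past the end of the file raises SIGBUS). The FILE* is flushed first so that buffered fwrite() data reaches the descriptor before the mapping looks at it, and the mapping starts on a page boundary because mmap() requires a page-aligned offset. The function name overwrite_via_mmap is hypothetical.

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

// Overwrites `length` bytes at byte `offset` of an fopen()ed file through mmap().
bool overwrite_via_mmap(FILE* fp, off_t offset, const void* data, size_t length) {
    if (std::fflush(fp) != 0) return false;           // push pending stdio writes to the fd first
    int fd = fileno(fp);                               // the descriptor behind the FILE*
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);          // mmap() offsets must be page-aligned
    size_t delta = static_cast<size_t>(offset - aligned);
    void* p = mmap(nullptr, length + delta, PROT_READ | PROT_WRITE, MAP_SHARED, fd, aligned);
    if (p == MAP_FAILED) return false;
    std::memcpy(static_cast<char*>(p) + delta, data, length);
    bool ok = msync(p, length + delta, MS_SYNC) == 0;  // write the dirty pages back to the file
    munmap(p, length + delta);
    return ok;
}
```

After changing the file through the mapping, call fseek() on the FILE* before the next fread(), so that stdio discards any buffered copy of the old bytes.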

How can I delete data from a sequential file while it is being appended by another process

I'm reading data from a file sequentially while another process is appending data to the same file. So far so good. My trouble comes when I need to delete the data I have already retrieved from the file, which I have to do to prevent the file from growing too large because of the writing process. I don't need to delete exactly the data I have just retrieved, but I do need to remove data periodically without losing any data that has not been read yet. How can I do this with C++?
I understand that there may be several valid approaches. I'll accept as the answer any approach that proves useful for developing the code.
This is not just a matter of C++: whatever language you use, it will at some point (in its runtime, standard library implementation, interpreter or whatever its architecture is) use the system calls the operating system provides for file handling (e.g. open(), read(), write()).
I'm not aware of any system call that will delete parts of a file or replace parts with something else. You can position yourself somewhere in the middle of the file and start overwriting its contents, but that is a byte-for-byte change; you can't replace a piece of the file with a piece of a different size. There are all sorts of workarounds for simulating deleting or changing parts of a file, but nothing that does it directly. For example: read from the original file, write only what you want to keep to a temporary file, remove the original and rename the temporary one. But this will not work in your situation if the writing process keeps the file open.
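The copy-and-rename workaround could be sketched with C++17 std::filesystem as follows. The function name drop_prefix and the ".tmp" suffix are hypothetical; the function keeps everything after the first `consumed` bytes. As stated above, it is only safe when no other process has the file open for writing, because such a writer would keep appending to the old, replaced file.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Keeps the bytes after the first `consumed` bytes of `file` by copying them
// to a temporary file and renaming it over the original.
bool drop_prefix(const fs::path& file, std::uintmax_t consumed) {
    std::error_code ec;
    const std::uintmax_t size = fs::file_size(file, ec);
    if (ec || consumed > size) return false;
    fs::path tmp = file;
    tmp += ".tmp";                                    // hypothetical temp-file naming
    {
        std::ifstream in(file, std::ios::binary);
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!in || !out) return false;
        if (consumed < size) {                        // operator<< sets failbit if it copies nothing
            in.seekg(static_cast<std::streamoff>(consumed));
            out << in.rdbuf();                        // copy the unread remainder
        }
        out.flush();
        if (!out) return false;
    }                                                 // both streams are closed before the rename
    fs::rename(tmp, file, ec);                        // replace the original with the trimmed copy
    return !ec;
}
```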
Another approach would be something inspired by logrotate: when the file reaches a certain maximum size, it is switched for a new one, and you can process the previous one however you want. This approach does require changes to the writing process as well.
You could fix the file length up front, start writing into it, and when you reach the end of the file, simply start writing at the beginning again. You must make sure, however, that the read position never passes the write position, and that the writer never overwrites data that has not been read yet.
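A minimal sketch of this wrap-around idea, using a hypothetical RingFile class over a FILE*. Positions are kept as ever-increasing counters and mapped to file offsets with counter % capacity; the writer refuses to overwrite bytes the reader has not consumed, and the reader never reads past the writer. This is a single-process illustration only: with two real processes, both counters would have to live somewhere both can see (for example a small header at the start of the file) and be updated with proper synchronization, which is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <utility>

// Hypothetical sketch: a fixed-size file used as a circular buffer.
class RingFile {
public:
    RingFile(std::FILE* fp, std::uint64_t capacity) : fp_(fp), cap_(capacity) {}

    // Appends n bytes, wrapping to the start of the file when needed.
    // Returns false instead of overwriting bytes the reader has not consumed.
    bool write(const unsigned char* data, std::size_t n) {
        if (write_pos_ - read_pos_ + n > cap_) return false;
        std::size_t done = 0;
        while (done < n) {
            auto [off, chunk] = locate(write_pos_ + done, n - done);
            if (std::fseek(fp_, off, SEEK_SET) != 0 ||
                std::fwrite(data + done, 1, chunk, fp_) != chunk)
                return false;
            done += chunk;
        }
        write_pos_ += n;
        return std::fflush(fp_) == 0;   // hand the bytes to the OS so other readers can see them
    }

    // Reads up to max bytes, but never past what the writer has produced.
    std::size_t read(unsigned char* out, std::size_t max) {
        const std::size_t n = static_cast<std::size_t>(
            std::min<std::uint64_t>(max, write_pos_ - read_pos_));
        std::size_t done = 0;
        while (done < n) {
            auto [off, chunk] = locate(read_pos_ + done, n - done);
            if (std::fseek(fp_, off, SEEK_SET) != 0 ||
                std::fread(out + done, 1, chunk, fp_) != chunk)
                break;
            done += chunk;
        }
        read_pos_ += done;
        return done;
    }

private:
    // Converts a logical position into a file offset plus the largest chunk
    // that fits before the end of the file, where the data wraps around.
    std::pair<long, std::size_t> locate(std::uint64_t pos, std::size_t want) const {
        const std::uint64_t off = pos % cap_;
        const std::uint64_t chunk = std::min<std::uint64_t>(want, cap_ - off);
        return {static_cast<long>(off), static_cast<std::size_t>(chunk)};
    }

    std::FILE* fp_;
    std::uint64_t cap_;
    std::uint64_t write_pos_ = 0;   // total bytes ever written
    std::uint64_t read_pos_ = 0;    // total bytes ever read
};
```

The fseek() before every transfer also satisfies the C rule that a stream opened for update must be repositioned when switching between reading and writing.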
It seems like you're trying to emulate the behavior of a named pipe using a regular file. This would require special support from your operating system, which probably doesn't exist because you should be using named pipes instead. Named pipes are a special kind of file used for communication between two processes. Like a regular file, a named pipe has a path and a file name and exists on disk. However, whereas a regular file's contents are stored on disk, the contents of a named pipe exist only in memory, and only the data that has been written but not yet read is kept. This is exactly what you're trying to do.
Assuming you're using a Unix-based OS, you can run mkfifo outputfile and then use outputfile for reading and appending. No C++ code is required, though if you want, you can also call mkfifo() from your C++ code.
If you're using Windows, it all becomes a bit more complicated: one side has to create the pipe with CreateNamedPipe(), and you then specify something like \\.\pipe\outputfile as the file name for reading and appending.
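A minimal POSIX-only sketch of the named-pipe approach, where fork() stands in for the separate writing and reading processes. The helper fifo_roundtrip and the FIFO path used with it are hypothetical. Once the parent has read the data it is gone from the pipe, so nothing accumulates on disk; the Windows \\.\pipe\ case is not covered.

```cpp
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <string>

// A forked child plays the writer and the parent plays the reader, talking
// through a FIFO at `path`. Returns what the parent read, or "" on error.
std::string fifo_roundtrip(const char* path, const std::string& message) {
    unlink(path);                                  // remove a leftover FIFO, if any
    if (mkfifo(path, 0600) != 0) return "";        // same as running `mkfifo path`
    const pid_t pid = fork();
    if (pid < 0) { unlink(path); return ""; }
    if (pid == 0) {                                // child: the writing process
        const int wfd = open(path, O_WRONLY);      // blocks until a reader opens the FIFO
        if (wfd >= 0) {
            std::size_t sent = 0;
            while (sent < message.size()) {
                const ssize_t w = write(wfd, message.data() + sent, message.size() - sent);
                if (w <= 0) break;
                sent += static_cast<std::size_t>(w);
            }
            close(wfd);                            // the reader sees end-of-file after this
        }
        _exit(0);
    }
    std::string received;                          // parent: the reading process
    const int rfd = open(path, O_RDONLY);          // blocks until the writer opens the FIFO
    if (rfd >= 0) {
        char buf[256];
        ssize_t n;
        while ((n = read(rfd, buf, sizeof buf)) > 0)   // data leaves the pipe as it is read
            received.append(buf, static_cast<std::size_t>(n));
        close(rfd);
    } else {
        kill(pid, SIGKILL);                        // don't leave the child blocked in open()
    }
    waitpid(pid, nullptr, 0);
    unlink(path);                                  // remove the FIFO's directory entry
    return received;
}
```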

Recover object lazy-loading the containing file

I'm using boost::archive::binary_iarchive to restore an object from a binary file, but the file is too big (18 GB) and deserializing the object loads the entire file into memory. Is there a way to read the file in parts (lazy loading) to avoid the memory use?
What I have:
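#include <fstream>
#include <boost/archive/binary_iarchive.hpp>

std::ifstream ifs(filename, std::ios::binary);  // binary archives need a binary-mode stream
boost::archive::binary_iarchive ia(ifs);
MyObject obj;
ia >> obj;  // deserializes the whole object graph into memory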
Upgrading my comment to an answer:
@cmaster got really close to an approach that can work, but he accidentally turned the problem upside down.
The raw file was never the issue (it was streaming all along).
The problem is that deserialization tries to put all the data in memory (into the vector, for example). So the only real solutions would be either to:
put this data into a (shared?) memory map. You can use the allocators from Boost Interprocess to help you achieve this. This is a lot of effort, but conceptually relatively straightforward; or
modify the deserialization code to convert to a different on-disk format on the fly (instead of inserting into, e.g., that vector), which would then allow mmap as cmaster suggested.
In other words, you'd "cannibalize" the Boost serialization implementation to migrate the data away from Boost serialization towards a raw binary format that can be used directly in mapped memory.
You can use mmap() to map the file into your address space. With that, it doesn't matter that the file is too large because the kernel knows that any data in the mapped region is just a copy of the file on the hard disk. Consequently, it does not even need to swap the data out when it needs the memory for something else. The kernel will just lazily load the parts of the file that you need as you touch them, which is especially good if you don't need everything in the file.
The nice thing about mmap() is that you have the entire file contents accessible as a huge char array, which is quite convenient for many use cases. The only precondition is that your process runs as a 64-bit process; otherwise your virtual address space will be too small to fit the file.
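A minimal sketch of the raw-format-plus-mmap idea, under stated assumptions: the format (a 64-bit element count followed by native-endian doubles), write_raw and MappedDoubles are all hypothetical and are not Boost.Serialization's format. Converting an existing archive would mean deserializing it piece by piece into such a file, as described above. The mapping is read-only and POSIX-only, and pages are loaded from disk only when an element on them is first accessed.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical raw format: a uint64 element count followed by that many doubles.
bool write_raw(const char* path, const std::vector<double>& values) {
    std::FILE* fp = std::fopen(path, "wb");
    if (!fp) return false;
    const std::uint64_t n = values.size();
    const bool ok = std::fwrite(&n, sizeof n, 1, fp) == 1 &&
                    std::fwrite(values.data(), sizeof(double), values.size(), fp) == values.size();
    return std::fclose(fp) == 0 && ok;
}

// Read-only mmap() view of such a file. Nothing is read up front; the kernel
// loads each page the first time an element on it is touched.
class MappedDoubles {
public:
    explicit MappedDoubles(const char* path) {
        const int fd = open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size >= static_cast<off_t>(sizeof(std::uint64_t))) {
            void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) { base_ = p; len_ = static_cast<std::size_t>(st.st_size); }
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedDoubles() { if (base_) munmap(base_, len_); }
    MappedDoubles(const MappedDoubles&) = delete;
    MappedDoubles& operator=(const MappedDoubles&) = delete;

    // Element count from the header, capped by what the file really contains.
    std::size_t size() const {
        if (!base_) return 0;
        std::uint64_t n;
        std::memcpy(&n, base_, sizeof n);
        const std::size_t fit = (len_ - sizeof n) / sizeof(double);
        return n < fit ? static_cast<std::size_t>(n) : fit;
    }
    // First double; the 8-byte header keeps it 8-byte aligned in the page-aligned map.
    const double* data() const {
        if (!base_) return nullptr;
        return reinterpret_cast<const double*>(static_cast<const char*>(base_) + sizeof(std::uint64_t));
    }

private:
    void* base_ = nullptr;
    std::size_t len_ = 0;
};
```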

Create a handle from byte array c++ winapi

I have an application that takes a handle and performs some tasks. The handle is currently created with CreateFile. My problem is that CreateFile takes a file path as one of its arguments. I am looking for a way to get a handle from a byte array, because the data I need to process is not on disk. Does anyone know of a function that takes a byte array and returns a handle, or how I would go about doing this?
You have a few choices:
1. Redesign your processing logic to read data from a memory pointer instead of a HANDLE; then you can pass your byte array as-is to your processing logic. If you also need to process a file, you can read the file data into a byte array and process it the same way.
2. Redesign your processing logic to read data from an IStream interface; then you can use SHCreateStreamOnFileEx() and SHCreateMemStream(), as Jonathan Potter suggested.
3. If you must read data from a HANDLE using ReadFile() or a related function, you can either:
a. write your byte array to a temp file, then read back from that file.
b. create an anonymous pipe using CreatePipe(), then write the byte array to one end and read the data from the other end, like Harry Johnston suggested.
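The first choice above could look roughly like this sketch: the processing logic takes a pointer and a length, and file input is handled by reading the file into a byte array first. process() and read_file() are hypothetical names, and the checksum inside process() is only a placeholder for the real work. This part is portable C++ and involves no HANDLE at all.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical processing routine that works on memory instead of a HANDLE.
// The simple rolling checksum stands in for the real processing.
std::uint32_t process(const unsigned char* data, std::size_t size) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < size; ++i) sum = sum * 31 + data[i];
    return sum;
}

// File input still works: load the whole file into a byte array, then call process().
std::vector<unsigned char> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

In-memory data goes straight to process(buffer, length), and on-disk data takes one extra read_file() step, so both paths share the same processing code.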
Using CreateFile() with the FILE_ATTRIBUTE_TEMPORARY attribute allows the operating system to keep the file in memory. You still have a copy happening as you have to write your memory buffer to the file, and then read that data back from that file, but if you have enough cache memory, nothing will hit the hard drive.
See here for more details:
CREATEFILE2_EXTENDED_PARAMETERS structure | Caching Behavior
You could perhaps also use a file mapping that forces the data written to the file to stay in memory, but that is a lot more complicated for probably no gain, as it is likely to be slower overall.

How do you pre-allocate space for a file in C/C++ on Windows?

I'm adding some functionality to an existing code base that uses pure C functions (fopen, fwrite, fclose) to write data out to a file. Unfortunately I can't change the actual mechanism of file I/O, but I have to preallocate space for the file to avoid fragmentation (which is killing our performance during reads). Is there a better way to do this than to actually write zeros or random data to the file? I know the ultimate size of the file when I'm opening it.
I know I can use fallocate on linux, but I don't know what the windows equivalent is.
Thanks!
Programmatically, on Windows you have to use Win32 API functions to do this:
SetFilePointerEx() followed by SetEndOfFile()
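You can use these functions to preallocate the clusters for the file and avoid fragmentation. This is much more efficient than pre-writing data to the file. Do this before your fopen(), and then open the file in a mode that does not truncate it, such as "r+b"; "w" or "wb" would throw the preallocated length away.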
If you want to avoid the Win32 API altogether, you can also do it non-programmatically by using the system() function to issue the following command:
fsutil file createnew filename filesize
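A sketch of the SetFilePointerEx()/SetEndOfFile() approach, followed by opening the file for the existing stdio code. The key detail is the fopen() mode: "r+b" keeps the preallocated length, while "w" or "wb" would truncate the file back to zero. For comparison, the non-Windows branch uses posix_fallocate(), the POSIX counterpart of the Linux fallocate mentioned in the question. preallocate and open_for_writing are hypothetical names.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif

// Creates `path` and sets its length to `bytes` before the stdio code opens it.
bool preallocate(const char* path, long long bytes) {
#ifdef _WIN32
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, nullptr, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;
    LARGE_INTEGER end;
    end.QuadPart = bytes;
    // Move the file pointer to the desired size and make that the end of file.
    bool ok = SetFilePointerEx(h, end, nullptr, FILE_BEGIN) && SetEndOfFile(h);
    CloseHandle(h);
    return ok;
#else
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    // Reserves the blocks and extends the file size to `bytes`.
    bool ok = posix_fallocate(fd, 0, static_cast<off_t>(bytes)) == 0;
    return close(fd) == 0 && ok;
#endif
}

// Opens the preallocated file for the existing fwrite()-based code.
// "r+b" keeps the length; "wb" would truncate it back to zero.
FILE* open_for_writing(const char* path) {
    return std::fopen(path, "r+b");
}
```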
You can use the SetFileValidData function to extend the logical length of a file without having to write all that data out to disk. However, because it can allow reading disk data that you might not otherwise be privileged to see, it requires the SE_MANAGE_VOLUME_NAME privilege. Carefully read the Remarks section of the documentation.
I'd recommend just writing out the zeros instead. You can also use SetFilePointerEx and SetEndOfFile to extend the file, but doing so still requires zeros to be written to disk (unless the file is sparse, but that defeats the point of reserving disk space). See "Why does my single-byte write take forever?" for more info on that.
Sample code (note that it isn't necessarily faster, especially on smart file systems such as NTFS):
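HANDLE handle = CreateFile(fileName, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
if (handle != INVALID_HANDLE_VALUE) {
    // preallocate a 2 GB disk file
    LARGE_INTEGER size;
    size.QuadPart = 2048LL * 1024 * 1024;                // 2048 MB = 2 GB
    ::SetFilePointerEx(handle, size, NULL, FILE_BEGIN);  // move to the desired end
    ::SetEndOfFile(handle);                              // make that the new end of file
    ::SetFilePointer(handle, 0, NULL, FILE_BEGIN);       // rewind before writing
    // ... write the data, then:
    ::CloseHandle(handle);
}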
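You could use the _chsize() function (or _chsize_s() for sizes above 2 GB). Both take a CRT file descriptor, which you can obtain from a FILE* with _fileno().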
Check out this example on Code Project. It looks pretty straightforward to set the file size when the file is initially created.
http://www.codeproject.com/Questions/172979/How-to-create-a-fixed-size-file.aspx
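FILE *fp = fopen("C:\\myimage.jpg", "ab");
if (fp) {
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    const long target = 500 * 1024;                       /* desired final size: 500 KB */
    if (size >= 0 && size < target) {                     /* only pad if the file is smaller */
        char *buffer = (char *)calloc(target - size, 1);  /* zero-filled padding */
        if (buffer) fwrite(buffer, target - size, 1, fp);
        free(buffer);
    }
    fclose(fp);
}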
The following article from Raymond may help.
How can I preallocate disk space for a file without it being reported as readable?
Use the SetFileInformationByHandle function, passing function code FileAllocationInfo and a FILE_ALLOCATION_INFO structure. “Note that this will decrease fragmentation, but because each write is still updating the file size there will still be synchronization and metadata overhead caused at each append.”
The effect of setting the file allocation info lasts only as long as you keep the file handle open. When you close the file handle, all the preallocated space that you didn’t use will be freed.
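A sketch of the FileAllocationInfo approach that keeps the handle open, as the article requires, while still giving the existing fwrite() code a FILE*. It assumes Windows Vista or later (where SetFileInformationByHandle is available), wraps the Win32 HANDLE in a CRT descriptor with _open_osfhandle(), and then wraps that in a stream with _fdopen(). create_with_reserved_space is a hypothetical name; on other platforms the sketch simply falls back to a plain fopen() without any reservation.

```cpp
#include <cstdint>
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#include <fcntl.h>
#include <io.h>
#endif

// Creates `path`, asks the file system to reserve `bytes` of allocation without
// changing the file's length, and returns a FILE* for the fwrite()-based code.
// The handle stays open inside the FILE*, so the reservation lasts until fclose().
FILE* create_with_reserved_space(const char* path, long long bytes) {
#ifdef _WIN32
    HANDLE h = CreateFileA(path, GENERIC_READ | GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return nullptr;
    FILE_ALLOCATION_INFO info = {};
    info.AllocationSize.QuadPart = bytes;
    if (!SetFileInformationByHandle(h, FileAllocationInfo, &info, sizeof(info))) {
        CloseHandle(h);
        return nullptr;
    }
    int fd = _open_osfhandle(reinterpret_cast<intptr_t>(h), 0);  // fd now owns h
    if (fd == -1) { CloseHandle(h); return nullptr; }
    FILE* fp = _fdopen(fd, "w+b");   // fclose(fp) later closes fd and, with it, h
    if (!fp) _close(fd);
    return fp;
#else
    (void)bytes;                     // no reservation in this fallback
    return std::fopen(path, "w+b");
#endif
}
```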