After many frustrating experiences with limited support of HDF5 in many computers, I decided to write my own data container to store arrays in a binary file.
Basically, the format is very simple: each variable has a small header including a variable name, number of dimensions, actual size of each dimension and variable type. The data of one variable is stored right after the header. Variables are stored one after the other.
Read/write operations on the headers are conveniently done using fseek, fread and fwrite, so I open the file using fopen, which returns a FILE*.
The problem is that if I want to update part of the values of one array on disk, the cleanest way to do it is using memory mapping (in my opinion). Looking at the documentation of mmap, it is possible to mmap files opened with "open", which returns an int. But my file was already opened with "fopen".
Is it possible to mmap a section of a FILE*? How?
What you are looking for is the fileno function:
int fileno(FILE *stream);
The function fileno() examines the argument stream and returns its integer file descriptor.
You can call that on your file stream and pass the result to mmap. Either that or just use open instead of fopen to get a file descriptor in the first place since parsing the header via a memory map is probably easier than using fseek and fread.
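As a rough sketch of the fileno + mmap route (POSIX only): the helper name scale_doubles_on_disk and the idea of scaling a run of doubles are made up for illustration. It assumes the stream was opened for reading and writing, and that the file already contains the whole byte range being mapped. Since mmap needs a page-aligned file offset, the mapping starts at the enclosing page boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: multiplies `count` doubles stored at byte `offset`
// of an already-open stream, directly in the file via a shared mapping.
bool scale_doubles_on_disk(FILE* fp, off_t offset, std::size_t count, double factor) {
    assert(fp != nullptr && offset >= 0);
    if (count == 0) return true;
    if (std::fflush(fp) != 0) return false;  // push stdio's buffered writes out first

    const int fd = fileno(fp);               // descriptor behind the FILE*
    if (fd < 0) return false;

    // mmap offsets must be page-aligned, so map from the enclosing page boundary.
    const off_t page = static_cast<off_t>(sysconf(_SC_PAGESIZE));
    const off_t map_start = offset - (offset % page);
    const std::size_t lead = static_cast<std::size_t>(offset - map_start);
    const std::size_t map_len = lead + count * sizeof(double);

    void* base = mmap(nullptr, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, map_start);
    if (base == MAP_FAILED) return false;

    // The data may not be 8-byte aligned after a variable-length header,
    // so go through memcpy instead of dereferencing a double*.
    char* bytes = static_cast<char*>(base) + lead;
    for (std::size_t i = 0; i < count; ++i) {
        double v;
        std::memcpy(&v, bytes + i * sizeof v, sizeof v);
        v *= factor;
        std::memcpy(bytes + i * sizeof v, &v, sizeof v);
    }

    const bool synced = msync(base, map_len, MS_SYNC) == 0;  // write the pages back
    return munmap(base, map_len) == 0 && synced;
}
```

Note that stdio may still hold stale data in its own read buffer after the mapping changes the file, so it is probably safest to fseek (or reopen) before reading the same region through the FILE* again.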
I found old, huge open source code which performs some computations on binary data stored in file on disk, output is also saved as binary file.
There is one root method which I would like to use, simplified signature:
int magic(FILE* input, FILE* output);
The problem is that I store the input data in process memory and I would like to have the output in process memory as well. The code is so big that I'm not able to rewrite it in reasonable time.
This API forces me to perform two huge I/O operations on every call to magic().
Is there any possibility to map BYTE array as FILE* on Windows using C/C++ mechanisms?
It seems that you need the functionality of fmemopen:
http://man7.org/linux/man-pages/man3/fmemopen.3.html
It takes a memory region and returns a FILE* stream that reads from or writes to that region.
Unfortunately, this is a POSIX function that does not have an equivalent in Windows. Memory mapped files are probably not what you want, as they take an existing file and map it to a memory region, not the other way around as with fmemopen. The only options you have are to either use fmemopen with mingw on Windows (don't know if you can do this) or roll out your own versions of fopen, fwrite and so on.
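For reference, this is roughly what the POSIX-only approach looks like; it will not build with the Microsoft C runtime. The magic() below is only a stand-in that upper-cases its input, and run_magic_in_memory is a made-up wrapper name. It uses fmemopen for the input and open_memstream (also POSIX) for a growable output buffer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Stand-in for the legacy routine: copies input to output, upper-casing it.
int magic(FILE* input, FILE* output) {
    assert(input != nullptr && output != nullptr);
    int c;
    while ((c = std::fgetc(input)) != EOF) {
        if (std::fputc(std::toupper(c), output) == EOF) return -1;
    }
    return 0;
}

// Hypothetical wrapper: feeds in-memory bytes to magic() and collects the result.
bool run_magic_in_memory(const std::string& input, std::string& output) {
    // Read-only stream over the caller's bytes; mode "r" never writes to the buffer.
    FILE* in = fmemopen(const_cast<char*>(input.data()), input.size(), "r");
    if (!in) return false;

    char* out_buf = nullptr;
    std::size_t out_len = 0;
    FILE* out = open_memstream(&out_buf, &out_len);  // buffer grows as magic() writes
    if (!out) {
        std::fclose(in);
        return false;
    }

    const int rc = magic(in, out);
    std::fclose(in);
    std::fclose(out);  // closing finalizes out_buf and out_len
    if (rc == 0) output.assign(out_buf, out_len);
    std::free(out_buf);
    return rc == 0;
}
```

Calling run_magic_in_memory("abc 123", result) should leave "ABC 123" in result without touching the disk.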
Is there any "fast" way to edit the first line of a big file (~100 MB) in C++?
I know we can read the file line by line, make changes, write it to a temporary file, and rename the temporary file. But, I am wondering if there is a faster way of doing this (something like in-place modification)?
You can probably use the fwrite/fprintf file manipulation methods to be able to write to the file depending on the file pointer's position.
You open the file with fopen in update mode ("r+"), use fseek to go to the beginning and write what you need. Do not open it for appending: in append mode every write goes to the end of the file regardless of fseek. You should also be careful with the length of the first line. If you write less than the original line, the leftover content of the old line remains. If you write more, you will overwrite the content that follows.
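A minimal sketch of such an in-place overwrite, assuming the replacement has exactly the same length as the existing first line (the function name is made up for illustration). It opens the file with "r+b", because a stream opened for appending always writes at the end of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Replaces the first line of `path` in place, but only when `new_line`
// (without a newline) is exactly as long as the current first line.
bool overwrite_first_line_same_length(const char* path, const std::string& new_line) {
    assert(path != nullptr);
    FILE* fp = std::fopen(path, "r+b");  // update mode: read and write, no truncation
    if (!fp) return false;

    // Measure the current first line, excluding its newline.
    std::size_t old_len = 0;
    int c;
    while ((c = std::fgetc(fp)) != EOF && c != '\n') ++old_len;

    const bool ok = old_len == new_line.size()
        && std::fseek(fp, 0, SEEK_SET) == 0  // a seek is required between reading and writing
        && std::fwrite(new_line.data(), 1, new_line.size(), fp) == new_line.size();
    return std::fclose(fp) == 0 && ok;
}
```

Only the bytes of the first line are rewritten; the rest of the file is never read or copied.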
100MB is not that big on modern computers. If this is a one time deal and you're not working on a really slow device, you can simply read the whole file, split it into lines, make your edit and write it all back in a moment.
If this is something that's going to happen more often, you could benefit from simply adding some whitespace padding to the first line (if possible) to create a "buffer" for things that you can put there the next time. Then you can use fwrite to overwrite just that first line, without touching the rest of the file.
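The padding idea could look something like this. The width kHeaderWidth and both function names are assumptions for illustration, and the sketch only works if the file was originally written with a first line of exactly that width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

const std::size_t kHeaderWidth = 32;  // hypothetical fixed line width, newline included

// Pads `text` with spaces to kHeaderWidth bytes (ending in '\n').
// Returns an empty string if the text does not fit or contains a newline.
std::string pad_header(const std::string& text) {
    if (text.size() >= kHeaderWidth || text.find('\n') != std::string::npos) {
        return std::string();
    }
    std::string line = text;
    line.resize(kHeaderWidth - 1, ' ');
    line.push_back('\n');
    return line;
}

// Overwrites the fixed-width first line in place; the rest of the file is untouched.
bool rewrite_header(const char* path, const std::string& text) {
    assert(path != nullptr);
    const std::string line = pad_header(text);
    if (line.empty()) return false;
    FILE* fp = std::fopen(path, "r+b");  // opens positioned at the start, without truncating
    if (!fp) return false;
    const bool ok = std::fwrite(line.data(), 1, line.size(), fp) == line.size();
    return std::fclose(fp) == 0 && ok;
}
```

Whatever reads the file afterwards has to tolerate (or trim) the trailing spaces on the first line.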
There may be OS and filesystem specific ways to allocate additional space inside an existing file without moving the data. For example on Linux with XFS/ext4 you can use fallocate:
int fallocate(int fd, int mode, off_t offset, off_t len);
fallocate() allows the caller to directly manipulate the allocated disk space for the file referred to by fd for the byte range starting at offset and continuing for len bytes.
I believe the fastest way to accomplish your task is to create a new file that contains the first line value. Whenever you take a request to read the file, you read the first line value file first, then read the larger file, skipping over the first line that is actually stored with the larger file. Whenever you want to change the first line, just change the first line file.
You're thinking of a memory-mapped file, in which the entire file is "mapped" into memory but not actually loaded or rewritten until you attempt to access or modify a part of it. On POSIX systems, you can mmap() a part of a file (say, the first kilobyte), modify it as necessary, then use msync() to write just that chunk of memory back to the disk.
I have an application that takes a handle and performs some tasks. The handle is currently being created with CreateFile. My problem is that CreateFile takes a filepath as one of the arguments. I am looking for a way to return a handle from a byte array because the data I need to process is not on disk. Does anyone know of any functions that take a byte array and return a handle, or how I would go about doing this?
You have a few choices:
re-design your processing logic to read data from a memory pointer instead of a HANDLE, then you can pass your byte array as-is to your processing logic. In case you also need to process a file, you can read the file data into a byte array, then process it accordingly.
re-design your processing logic to read data from an IStream interface, then you can use SHCreateStreamOnFileEx() and SHCreateMemStream(), like Jonathan Potter suggested.
if you must read data from a HANDLE using ReadFile() or related function, you can either:
a. write your byte array to a temp file, then read back from that file.
b. create an anonymous pipe using CreatePipe(), then write the byte array to one end and read the data from the other end, like Harry Johnston suggested.
Using CreateFile() with the FILE_ATTRIBUTE_TEMPORARY attribute allows the operating system to keep the file in memory. You still have a copy happening as you have to write your memory buffer to the file, and then read that data back from that file, but if you have enough cache memory, nothing will hit the hard drive.
See here for more details:
CREATEFILE2_EXTENDED_PARAMETERS structure | Caching Behavior
You could possibly also use file mapping, where the data written to the file is forced to stay in memory, but that's a lot more complicated for probably no gain, as it is likely to be slower overall.
I'm not finding a clear answer to one aspect of the fstream object necessary to determine whether it is worth using. Does fstream store its contents in memory, or is it more like a pointer to a location in a file? I was originally using CFile and reading the text into a CString, but I'd rather not have the entire file in memory if I can avoid it.
fstream is short for file stream -- it's normally a connection to a file in the host OS's file system. (§27.9.1.1/1: "The class basic_filebuf<charT,traits> associates both the input sequence and the output sequence with a file.")
It does (normally) buffer some information from that file, and if you happen to be working with a tiny file, it might all happen to fit in the buffer. In a typical case, however, most of the data will be in a file on disk (or at least in the OS's file cache) with some relatively small portion of it (typically a few kilobytes) in the fstream's buffer.
If you did want to use a buffer in memory and have it act like a file, you'd normally use a std::stringstream (or a variant like std::istringstream or std::ostringstream).
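To illustrate the difference, here is a small sketch (the function names are made up): code written against std::istream works the same way whether the data sits in a file behind an fstream's small buffer or entirely in memory inside a stringstream.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Counts lines from any input stream, holding only one line in memory at a time.
std::size_t count_lines(std::istream& in) {
    std::size_t n = 0;
    std::string line;
    while (std::getline(in, line)) ++n;
    return n;
}

// File-backed variant: the data stays on disk and is pulled through the
// stream's buffer as it is read. Returns 0 if the file cannot be opened.
std::size_t count_lines_in_file(const char* path) {
    assert(path != nullptr);
    std::ifstream file(path);
    return count_lines(file);
}
```

With an in-memory stream, std::istringstream mem("a\nb\n"); count_lines(mem); goes through the same code path, but the whole text lives in the stringstream's buffer.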
I'm adding some functionality to an existing code base that uses pure C functions (fopen, fwrite, fclose) to write data out to a file. Unfortunately I can't change the actual mechanism of file i/o, but I have to pre-allocate space for the file to avoid fragmentation (which is killing our performance during reads). Is there a better way to do this than to actually write zeros or random data to the file? I know the ultimate size of the file when I'm opening it.
I know I can use fallocate on Linux, but I don't know what the Windows equivalent is.
Thanks!
Programmatically, on Windows you have to use Win32 API functions to do this:
SetFilePointerEx() followed by SetEndOfFile()
You can use these functions to pre-allocate the clusters for the file and avoid fragmentation. This works much more efficiently than pre-writing data to the file. Do this prior to doing your fopen().
If you want to avoid the Win32 API altogether, you can also do it non-programmatically using the system() function to issue the following command:
fsutil file createnew filename filesize
You can use the SetFileValidData function to extend the logical length of a file without having to write out all that data to disk. However, because it can expose disk data that you would not otherwise be privileged to read, it requires the SE_MANAGE_VOLUME_NAME privilege. Carefully read the Remarks section of the documentation.
I'd recommend instead just writing out the 0's. You can also use SetFilePointerEx and SetEndOfFile to extend the file, but doing so still requires writing out zeros to disk (unless the file is sparse, but that defeats the point of reserving disk space). See Why does my single-byte write take forever? for more info on that.
Sample code; note that it isn't necessarily faster, especially with smart filesystems like NTFS.
if ( INVALID_HANDLE_VALUE != (handle=CreateFile(fileName,GENERIC_WRITE,0,0,CREATE_ALWAYS,FILE_FLAG_SEQUENTIAL_SCAN,NULL) )) {
// preallocate a 2 GB disk file
LARGE_INTEGER size;
size.QuadPart = 2048LL * 0x100000; // 2048 MiB; 64-bit arithmetic avoids int overflow
::SetFilePointerEx(handle,size,0,FILE_BEGIN);
::SetEndOfFile(handle);
::SetFilePointer(handle,0,0,FILE_BEGIN);
}
You could use the _chsize() function.
Check out this example on Code Project. It looks pretty straightforward to set the file size when the file is initially created.
http://www.codeproject.com/Questions/172979/How-to-create-a-fixed-size-file.aspx
FILE *fp = fopen("C:\\myimage.jpg","ab");
if (fp) {
    fseek(fp,0,SEEK_END);
    long size = ftell(fp);
    long pad = 500*1024-size;   // bytes still needed to reach 500 KB
    if (pad > 0) {
        char *buffer = (char*)calloc(pad,1);
        if (buffer) fwrite(buffer,pad,1,fp);
        free(buffer);
    }
    fclose(fp);
}
The following article from Raymond may help.
How can I preallocate disk space for a file without it being reported as readable?
Use the SetFileInformationByHandle function, passing function code
FileAllocationInfo and a FILE_ALLOCATION_INFO structure. “Note that
this will decrease fragmentation, but because each write is still
updating the file size there will still be synchronization and metadata
overhead caused at each append.”
The effect of setting the file allocation info lasts only as long as
you keep the file handle open. When you close the file handle, all the
preallocated space that you didn’t use will be freed.