Obtaining a FILE * handle without actually creating a file on disk - c++

I need to process some data with a legacy library that I can't modify. My problem is that it requires a plain old FILE* handle in order to save its results, and I'm required not to write anything to disk at all.
I understand that there's no standard way to do this, but is it possible, using the Windows API, Boost, or anything else, to obtain a FILE handle that somehow points to memory?
I have found no solution anywhere that explicitly guarantees no disk access is (systematically) performed.

I believe you can fopen a pipe, using the pipe syntax:
fopen("\\\\.\\pipe\\WritePipe", "w+");
You need to create the pipe using CreateNamedPipe beforehand, but once you've done that you should be able to use the pipe for processing the data.
You'll probably have to create a thread that reads from the pipe to ensure that your app won't hang, but it should work for your needs (not being able to touch the file system).
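A rough sketch of that approach (Win32 only, minimal error handling): the pipe name is the one from the answer above, and legacy_library_save is a hypothetical stand-in for whatever call the legacy library exposes. The server end is created with CreateNamedPipe, a helper thread drains it, and the client end is obtained with plain fopen.
#include <windows.h>
#include <stdio.h>

static DWORD WINAPI drain_pipe(LPVOID param)         /* reader thread keeps the pipe from filling up */
{
    HANDLE hPipe = (HANDLE)param;
    char buf[4096];
    DWORD got;
    ConnectNamedPipe(hPipe, NULL);                    /* wait for the fopen() below to connect */
    while (ReadFile(hPipe, buf, sizeof buf, &got, NULL) && got > 0) {
        /* consume the "file" contents here, e.g. append them to a memory buffer */
    }
    return 0;
}

int main(void)
{
    HANDLE hPipe = CreateNamedPipeA("\\\\.\\pipe\\WritePipe", PIPE_ACCESS_INBOUND,
                                    PIPE_TYPE_BYTE | PIPE_READMODE_BYTE | PIPE_WAIT,
                                    1, 64 * 1024, 64 * 1024, 0, NULL);
    HANDLE hThread = CreateThread(NULL, 0, drain_pipe, hPipe, 0, NULL);

    FILE *f = fopen("\\\\.\\pipe\\WritePipe", "w");   /* client end: an ordinary FILE*, no file on disk */
    /* legacy_library_save(f);  -- hypothetical stand-in for the library call */
    fprintf(f, "results go here\n");
    fclose(f);

    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    CloseHandle(hPipe);
    return 0;
}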

Try with
fmemopen
From How to write to a memory buffer with a FILE*?
tbert's answer:
For anyone else who stumbles upon this thread looking for a correct answer: yes, there is a standards-compliant way to use memory as a FILE descriptor: fmemopen or open_memstream, depending on the semantics you want.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fmemopen.html
http://pubs.opengroup.org/onlinepubs/9699919799/functions/open_memstream.html
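A minimal sketch of what that looks like, assuming a POSIX.1-2008 system such as Linux (the Microsoft CRT does not provide fmemopen; the buffer size here is arbitrary). open_memstream is the variant to use when you want the buffer to grow automatically.
#include <stdio.h>

int main(void)
{
    char buf[4096];
    FILE *f = fmemopen(buf, sizeof buf, "w");   /* FILE* backed entirely by buf */
    if (!f)
        return 1;
    fprintf(f, "no disk involved\n");
    fflush(f);                                  /* make the bytes (and terminating NUL) visible in buf */
    puts(buf);
    fclose(f);
    return 0;
}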

Related

Why use MPI_File_open instead of fopen?

After reading the MPI documentation, it doesn't sound like this gives you any additional functionality at all. I had assumed that it coordinated network traffic such that all file operations work with the given file on the executing system (the one issuing an mpirun command), as opposed to using the local filesystem on each individual host. This would be useful. Instead, the "user" needs to ensure that they all end up at the same file. Clearly they're not communicating that much about this file... are they?
What does MPI_File_open actually do, and how is it beneficial? Why should I not just use fopen?
Sure, MPI_File_open allows you to seek and read/write at particular blocks, like you would with fopen; in that case each process has a private file pointer. Differences from fopen include the nonblocking I/O methods, which allow your program to continue execution without waiting for the operation to complete. MPI also supports shared file pointers (e.g. MPI_File_read_shared), although obviously using shared pointers has a synchronization overhead.
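A rough illustration of the private-file-pointer, explicit-offset style described above, assuming a working MPI installation; the file name and record layout are arbitrary. Each rank writes its own block of the same shared file at a computed offset, which plain fopen cannot coordinate across processes on different nodes.
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    char block[16];
    snprintf(block, sizeof block, "rank %02d\n", rank);
    /* explicit offset: every rank writes its own region, no seek races */
    MPI_File_write_at(fh, (MPI_Offset)rank * (MPI_Offset)strlen(block),
                      block, (int)strlen(block), MPI_CHAR, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}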

Data written to disk callback

How can I get a callback once data has been successfully written to disk in Linux?
I would like to have my program's db file mapped into memory for read/write operations and receive a callback once a write has successfully hit the disk. Kind of like what old VMS systems used to do.
You need to call fdatasync (or fsync if you really need the metadata to be synchronised as well) and wait for it to return.
You could do this from another thread, but if one thread writes to the file while another thread is calling fdatasync(), it's not going to be clear which of the writes are guaranteed to be persistent.
Databases which want to store transaction logs in a guaranteed-durable way need to call fdatasync.
Databases (such as innodb) typically use direct IO (as well as their own data-caching, rather than rely on the OS) on their main data files, so that they know that it will be written in a predictable manner.
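A small sketch of the "write, then call fdatasync and wait for it to return" pattern on Linux; the path and record format are placeholders, and error handling is kept to a minimum.
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int append_record_durably(const char *path, const char *rec)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, rec, strlen(rec));
    /* fdatasync() blocks until the data has reached stable storage */
    int rc = (n == (ssize_t)strlen(rec)) ? fdatasync(fd) : -1;

    close(fd);
    return rc;   /* 0 means the record is durable, as far as the kernel can tell */
}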
As far as I know, you cannot get any notification when the actual synchronization of a file (or an mmapped region) happens; not even the timestamps of the file are going to change. You can, however, force synchronization of the file (or region) by using fsync.
It is also hard to see a reason why you would want that. File IO is supposed to be opaque.

Reading a file in a process while a process is writing in c++

I have a file that grows over time, and I need to read it from another process in C++ on Windows without any race conditions or the like.
Writing the file is a given, and there is no room for me to change that. The only thing I can do is read it gracefully.
Do you have any ideas on how to handle this case well?
TIA
In Win32 you would have to make sure that every writer opens the file with at least read share access, and every reader opens the file with at least write share access. Further sharing would be required if you have >1 reader or >1 writer.
See here for CreateFile docs, dwShareMode parameter.
You'll almost certainly need to use CreateFile (in both processes) to allow sharing the file at all. If the writing application opens the file in exclusive sharing mode and keeps it open, the reading application won't be able to open the file at all.
From there, preventing race conditions is fairly straightforward: each process will typically use LockFile or LockFileEx to lock a section of the file for exclusive access while it uses data in that section of the file. In general, you want to keep that period of time as short as possible, so you'll lock the section, read/write, and unlock, all about as quickly as possible (i.e., without doing anything else, if you can avoid it).
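A sketch of the reader side under those rules (Win32, minimal error handling; the path, offset and length are placeholders): open with a share mode that tolerates the writer, take a shared lock on the region with LockFileEx, read, then unlock.
#include <windows.h>

BOOL read_region(const char *path, DWORD offset, char *buf, DWORD len, DWORD *got)
{
    HANDLE h = CreateFileA(path, GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,   /* let the writer keep its handle */
                           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return FALSE;

    OVERLAPPED ov = {0};
    ov.Offset = offset;                               /* start of the region to lock */
    BOOL ok = LockFileEx(h, 0, 0, len, 0, &ov);       /* flags = 0: shared lock, wait if necessary */
    if (ok) {
        SetFilePointer(h, offset, NULL, FILE_BEGIN);
        ok = ReadFile(h, buf, len, got, NULL);
        UnlockFileEx(h, 0, len, 0, &ov);              /* keep the locked window as short as possible */
    }
    CloseHandle(h);
    return ok;
}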

How to treat RAM data as if it was a real file?

So I have some temporary data in my program (in RAM). I want to somehow make it appear as if it were a file (for example, so I can pass it to another program that takes a file path as an argument).
Is it possible?
How would I do such a thing?
Why not simply write the file to disk? If writing to disk is too slow, you can pass the FILE_ATTRIBUTE_TEMPORARY flag to CreateFile to keep the data in cache (and avoid writing it to the physical device).
Sometimes the obvious solutions are the best...
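A sketch of that "obvious solution", assuming Win32; the file name is a placeholder. Note that FILE_ATTRIBUTE_TEMPORARY is only a hint to the cache manager that the data does not need to be flushed eagerly, so a write to the physical device is avoided where possible rather than strictly guaranteed.
#include <windows.h>

HANDLE make_cache_backed_temp(void)
{
    /* the cache manager tries to keep FILE_ATTRIBUTE_TEMPORARY data in memory */
    return CreateFileA("scratch.tmp",
                       GENERIC_READ | GENERIC_WRITE,
                       FILE_SHARE_READ,                 /* let the other program read it */
                       NULL, CREATE_ALWAYS,
                       FILE_ATTRIBUTE_TEMPORARY,
                       NULL);
}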
If supported by your operating system (Unixoid systems and Windows do), you could try to use memory-mapped files.
You can do it in C using the popen() function:
FILE *f = popen("program args", "w");   /* the command's stdin becomes the write end of f */
/* write your output to f here using stdio */
pclose(f);                              /* closes the stream and waits for the command to exit */
This is possible if your external program reads its input from stdin.
You could use pipe(); a short sketch follows the quoted specification below.
The pipe() function shall create a pipe and place two file descriptors, one each into the arguments fildes[0] and fildes[1], that refer to the open file descriptions for the read and write ends of the pipe. Their integer values shall be the two lowest available at the time of the pipe() call. The O_NONBLOCK and FD_CLOEXEC flags shall be clear on both file descriptors. (The fcntl() function can be used to set both these flags.)
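A minimal sketch tying pipe() back to the FILE* question (POSIX; error handling omitted): fdopen() wraps the write end of the pipe in an ordinary FILE*, and a forked child consumes the read end.
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    pipe(fds);                            /* fds[0] = read end, fds[1] = write end */

    if (fork() == 0) {                    /* child: consume what the parent "writes to a file" */
        close(fds[1]);
        char buf[256];
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof buf)) > 0)
            write(STDOUT_FILENO, buf, (size_t)n);
        _exit(0);
    }

    close(fds[0]);
    FILE *f = fdopen(fds[1], "w");        /* an ordinary FILE*, but nothing touches the disk */
    fprintf(f, "hello through a pipe\n");
    fclose(f);                            /* also closes fds[1], so the child sees EOF */
    return 0;
}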
Yes, it is possible. You can transfer your data to your other application via an interprocess communication mechanism:
Depending on your OS, you have different options here. You could create a pipe, as other posters have mentioned here, as many OSes have pipes.
You could also use shared memory (see the sketch after this list).
You could simply write it out to a file, and then open up that file in your other application.
Many OSes have other techniques you can use.
EDIT: MSDN lists all the IPC mechanisms available for Windows here.
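As an illustration of the shared-memory option on Windows (a sketch only; the mapping name and size are placeholders): a pagefile-backed file mapping never corresponds to a file on disk, and another process can open it by name with OpenFileMapping.
#include <windows.h>
#include <string.h>

void *publish_shared_block(const char *text)
{
    /* INVALID_HANDLE_VALUE = backed by the paging file, not by a real file */
    HANDLE hMap = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                     0, 64 * 1024, "Local\\MySharedBlock");
    if (!hMap)
        return NULL;

    void *view = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    if (view)
        memcpy(view, text, strlen(text) + 1);
    return view;   /* the other process uses OpenFileMapping + MapViewOfFile with the same name */
}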

How do I find the file handles that my process has opened in Linux?

When we perform a fork in Unix, open file handles are inherited, and if we don't need to use them we should close them. However, when we use libraries, file handles may be opened for which we do not have access to the handle. How do we check for these open file handles?
In Linux you can check the /proc/<pid>/fd directory - for every open fd there will be an entry there, named after the descriptor number. I'm almost sure this way is non-portable.
Alternatively you can use lsof - available for Linux, AIX, FreeBSD and NetBSD, according to man lsof.
You can do it from a shell:
lsof -P -n -p _PID_
where _PID_ is your process's pid.
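The same information is available from inside the process; here is a small Linux-only sketch that lists /proc/self/fd and resolves each entry (one of the printed descriptors will belong to the directory stream itself):
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    DIR *d = opendir("/proc/self/fd");
    if (!d)
        return 1;

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                               /* skip "." and ".." */
        char link[64], target[PATH_MAX];
        snprintf(link, sizeof link, "/proc/self/fd/%s", e->d_name);
        ssize_t n = readlink(link, target, sizeof target - 1);
        if (n >= 0) {
            target[n] = '\0';
            printf("fd %s -> %s\n", e->d_name, target);
        }
    }
    closedir(d);
    return 0;
}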
If the libraries are opening files you don't know about, how do you know they don't need them after a fork? Unexported handles are an internal library detail; if the library wants them closed, it will register an atfork() handler to close them. Going behind some piece of code and closing its file handles behind its back will lead to subtle, hard-to-debug problems, since the library will fail unexpectedly when it attempts to work with a handle it knows it opened correctly but did not close.
As mentioned in Louis Gerbarg's answer, the libraries are probably expecting the file handles to be kept open across fork() (which is supposed to be, after all, an almost identical copy of the parent process).
The problem most people have is with the exec() which often follows the fork(). Here, the correct solution is for the library which created the handles to mark them as close-on-exec (FD_CLOEXEC).
In libraries used by multithreaded programs, there is a race condition between a library creating a file handle and setting FD_CLOEXEC on it (another thread can fork() between the two operations). To fix that problem, O_CLOEXEC was introduced in the Linux kernel.
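A short sketch contrasting the two variants (POSIX/Linux; the path is a placeholder): the two-step fcntl() dance has the fork window described above, while O_CLOEXEC sets the flag atomically at open time.
#include <fcntl.h>
#include <unistd.h>

int open_private_handle(const char *path)
{
    /* racy variant: another thread can fork()+exec() between open() and fcntl() */
    int fd = open(path, O_RDONLY);
    if (fd >= 0)
        fcntl(fd, F_SETFD, fcntl(fd, F_GETFD) | FD_CLOEXEC);

    /* atomic variant (Linux 2.6.23+, POSIX.1-2008): no window at all */
    int fd2 = open(path, O_RDONLY | O_CLOEXEC);
    if (fd2 >= 0)
        close(fd2);              /* opened here only to demonstrate the flag */

    return fd;
}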
To start with, you don't really need to care a whole lot about the open file descriptors you don't know about. If you know you're not going to write to them again, closing them is a good idea and doesn't hurt - you just did a fork() after all, so the fds are open twice. But likewise, if you leave them open, they won't bother you either - after all, you don't know about them, so you presumably won't be randomly writing to them.
As for what your third-party libraries will do, it's a bit of a toss-up either way. Some probably don't expect to run into a situation with a fork(), and might end up accidentally writing to the same fd from two processes without any synchronization. Others probably don't expect to have you closing their fds on them. You'll have to check. This is why it's a bad idea to randomly open a file descriptor in a library and not give it to the caller to manage.
All that said, in the spirit of answering the original question, there isn't a particularly good way. You can call dup() or dup2() on a file descriptor; if it's closed, the call will fail with EBADF. So you can say:
int newfd = dup(oldfd);
if (newfd >= 0)          /* dup() only succeeds if oldfd is a valid, open descriptor */
{
    close(newfd);        /* we only wanted the test, so close the duplicate... */
    close(oldfd);        /* ...and the original */
}
but at that point you're just as well off saying close(oldfd) in the first place and ignoring any EBADFs.
Assuming you still want to take the nuclear option of closing everything, you then need to find the maximum number of open file descriptors possible. Assuming 1 to 65,535 is not a good idea. First of all, fds start at 0, of course, but also there's no particular upper limit defined. To be portable, POSIX's sysconf(_SC_OPEN_MAX) should tell you, on any sane POSIX system, though strictly speaking it's optional. If you're feeling paranoid, check the return value for -1, though at that point you mostly have to fall back on a hardcoded value anyway (1024 should be fine unless you're doing something extremely weird). Or if you're fine with being Linux-specific, you can dig around in /proc.
Don't forget to not close fds 0, 1, and 2 - that can really confuse things.
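If you do take that route, a sketch of the loop looks like this (the 1024 fallback is the hardcoded value suggested above):
#include <unistd.h>

void close_inherited_fds(void)
{
    long maxfd = sysconf(_SC_OPEN_MAX);
    if (maxfd < 0)
        maxfd = 1024;                 /* pessimistic fallback if the limit is not reported */
    for (long fd = 3; fd < maxfd; fd++)
        close(fd);                    /* EBADF for fds that were never open; harmless */
}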
I agree with what other people have said about closing random files being dangerous. You might end up filing some pretty interesting bug reports for all of your third-party tools.
That said, if you know you won't need those files to be open, you can always walk through all of the valid file descriptors (1 to 65535, IIRC) and close everything you don't recognize.
Reasonable libraries will always have functions which free whatever resources (e.g. file handles) they have allocated.
Just a link, but it seems helpful: How many open files? at netadmintools.com. It seems to use /proc investigation to learn about a process's open files; I'm not sure if that is the only way or if there is an API. Parsing files for this type of information can be a bit ... messy. Also, /proc might be deprecated, which is something to check for.
Isn't this a design issue? Is it possible for your process to fork before initializing the libs that open those files?