Implementing a File Object (C++)

I've been looking over the Doom 3 SDK code, specifically their File System implementation.
The system works (the code I have access to at least) by passing around an 'idFile' object and I've noticed that this class provides read and write methods as well as maintaining a FILE* member.
This suggests to me that either the FILE* is 'opened' with read and write access or the file is closed and reopened (with the appropriate access) between calls to Read() and Write().
Does this sound correct or am I over simplifying it?
If this isn't the case (which part of me suspects it isn't - due to speed etc.) does anyone have any suggestions as to how they would achieve this elegant interface?
Please bear in mind that I am fairly new to both C++ and stdio (which I'm pretty sure id favours).

You can open a FILE* in read-write mode.
If you do that, you should flush and seek to a known location when changing between reading and writing, but you don't have to reopen the file.
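For example (a minimal sketch; the file name is made up), a single FILE* opened in update mode can serve both reads and writes, as long as you flush or seek when switching direction:

    #include <cstdio>

    int main() {
        // "r+b" opens an existing file for both reading and writing (binary, no truncation).
        std::FILE* fp = std::fopen("data.bin", "r+b");   // hypothetical file name
        if (!fp) return 1;

        char header[16] = {};
        std::fread(header, 1, sizeof header, fp);        // read first...

        // Switching from reading to writing requires a file-positioning call.
        std::fseek(fp, 0, SEEK_SET);
        const char magic[4] = {'D', 'O', 'O', 'M'};
        std::fwrite(magic, 1, sizeof magic, fp);         // ...then write

        // Switching from writing back to reading requires fflush() or a seek.
        std::fflush(fp);
        std::fseek(fp, 0, SEEK_SET);
        std::fread(header, 1, sizeof header, fp);

        std::fclose(fp);
    }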

Without ever having looked at the Doom code (I'm guessing you can specify a mode when you create the object), you can use freopen() to re-open a file (in a different mode, if you want) without having to call fclose() yourself.
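A small sketch of that (the file name is made up); freopen() closes the old stream itself before reassociating it:

    #include <cstdio>

    int main() {
        std::FILE* fp = std::fopen("data.txt", "r");      // opened read-only
        if (!fp) return 1;
        // ... read from fp ...

        // Re-open the same stream object for appending; no explicit fclose() needed.
        fp = std::freopen("data.txt", "a", fp);
        if (!fp) return 1;
        std::fputs("appended line\n", fp);

        std::fclose(fp);
    }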

Related

How to capture and modify read and writes on file using Dlang or cpp

Let's say that another program is going to write or read a specific file. When it does that, I need to be able to intercept that read or write and handle it the way I like (for example, program X wants to read a file located at /path/file.txt, but program Y (my program) takes that read "request" and instead gives program X the encrypted first 2 KiB of another file located at /path/file2). Essentially, any time a specified file is being read or written to, my program will be called and it will handle the read or write request, in Dlang or C++. I cannot create a new file system for this :( and it has to at least work on Linux (so anything specific to Linux works). Also, it is crucial that I RESPOND to the read or write and not preprocess the result; sorry this was not clear in the example.
What you need is a seek-able FIFO (named pipe). It has been proposed but as far as I know it has not been implemented yet (check Linux kernel changelogs, maybe it has but I do not know about it).
As suggested, a new, tiny filesystem is your best option. Luckily it is pretty simple to write one with the FUSE project.
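To give a rough idea of how small such a filesystem can be, here is a hedged sketch against the libfuse 2.x API (FUSE_USE_VERSION 26); the path, contents and callback names are my own, and libfuse 3 changes some of the signatures. Any process that reads the exposed path gets whatever bytes the read callback decides to return:

    #define FUSE_USE_VERSION 26
    #include <fuse.h>
    #include <sys/stat.h>
    #include <algorithm>
    #include <cerrno>
    #include <cstring>
    #include <string>

    // One virtual file; its contents are whatever we decide to hand back to readers.
    static const std::string kPath = "/file.txt";
    static const std::string kData = "contents chosen by the intercepting program\n";

    static int vfs_getattr(const char* path, struct stat* st) {
        std::memset(st, 0, sizeof(*st));
        if (std::strcmp(path, "/") == 0) { st->st_mode = S_IFDIR | 0755; st->st_nlink = 2; return 0; }
        if (kPath == path) { st->st_mode = S_IFREG | 0444; st->st_nlink = 1; st->st_size = kData.size(); return 0; }
        return -ENOENT;
    }

    static int vfs_readdir(const char* path, void* buf, fuse_fill_dir_t filler,
                           off_t, struct fuse_file_info*) {
        if (std::strcmp(path, "/") != 0) return -ENOENT;
        filler(buf, ".", nullptr, 0);
        filler(buf, "..", nullptr, 0);
        filler(buf, kPath.c_str() + 1, nullptr, 0);
        return 0;
    }

    static int vfs_open(const char* path, struct fuse_file_info*) {
        return kPath == path ? 0 : -ENOENT;
    }

    static int vfs_read(const char* path, char* buf, size_t size, off_t offset,
                        struct fuse_file_info*) {
        if (kPath != path) return -ENOENT;
        if (offset >= static_cast<off_t>(kData.size())) return 0;
        const size_t n = std::min(size, kData.size() - static_cast<size_t>(offset));
        std::memcpy(buf, kData.data() + offset, n);   // substitute any data you like here
        return static_cast<int>(n);
    }

    int main(int argc, char* argv[]) {
        struct fuse_operations ops = {};
        ops.getattr = vfs_getattr;
        ops.readdir = vfs_readdir;
        ops.open    = vfs_open;
        ops.read    = vfs_read;
        return fuse_main(argc, argv, &ops, nullptr);  // mount point taken from argv
    }

Build with something like g++ vfs.cpp $(pkg-config fuse --cflags --libs), mount it onto an empty directory, and unmount with fusermount -u.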

multiple access to file in r/w

I'm planning to write a program which has to access a certain file many times for reading and writing.
So I decided to use fstream, since I can use this class for both reading and writing purpose.
My idea is to open the file at application startup and close it when the application exits.
Since the file can be arbitrarily big, I was planning to use a "paging" structure, in which:
1) preallocate a fixed amount of memory for each page and a fixed number of pages
2) load part of the file into the first free page
3) if there is no free page, select a non-empty one according to some criterion, commit any pending edits in it, and then load the part of the file into that page.
That's not so hard to code. But I was wondering if I'm going to reinvent the wheel... maybe fstream itself is written in a smart way that already implements a similar paging mechanism. In that case, I would not have to take care of it myself and could just read and write at any time.
Any suggestions?
Don't do this yourself. Unless you are using a very exotic implementation, the fstream class already implements such a buffering mechanism efficiently.
Check out the "Buffers and Synchronization" section of http://www.cplusplus.com/doc/tutorial/files/
There are possible issues if you are seeking into a file larger than 2 GB with an old kernel or standard library implementation. Check this:
Large file support in C++
or use Boost.Filesystem
The internal workings of the standard C++ library vary by implementation, so a test would be needed to get real data on your preferred platform. Generally, memory-mapped files are considered the fastest way to access data stored in a file (as Uflex mentioned in his comment), but they have some drawbacks as well (see the linked wiki page). You can either use the standard (POSIX) C functions mmap() and munmap(), or the Boost C++ libraries, which also offer a portable C++ interface for memory-mapped files.
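A minimal sketch of the POSIX route (the file name is made up); Boost offers a portable equivalent:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        const char* path = "data.bin";                 // hypothetical file
        int fd = open(path, O_RDONLY);
        if (fd < 0) return 1;

        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return 1; }

        // Map the whole file read-only into the address space.
        void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { close(fd); return 1; }

        const char* bytes = static_cast<const char*>(p);
        std::printf("first byte: %d, size: %lld\n",
                    st.st_size > 0 ? bytes[0] : 0,
                    static_cast<long long>(st.st_size));

        munmap(p, st.st_size);                         // unmap and release resources
        close(fd);
    }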

CreateFile vs fopen vs ofstream advantage & disadvantage?

CreateFile vs fopen vs ofstream - advantage & disadvantage?
I heard that CreateFile is powerful but Windows-only.
Can you tell me what I should use (on Windows) and why?
It depends on what you're doing. For sequentially reading and writing text files, iostream is definitely the way to go. For anything involving transactional security or non-standard devices, you'll have to access the system directly (CreateFile or open). Even then, for sequential reading and writing of text, the best solution is to define your own streambuf, and use that with iostream.
I can't think of any context where fopen would be preferable.
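To make the streambuf suggestion concrete, here is a hedged, write-only sketch that puts std::ostream formatting on top of a handle opened with CreateFile (the file name is made up and error handling is kept minimal):

    #include <windows.h>
    #include <ostream>
    #include <streambuf>

    class handle_streambuf : public std::streambuf {
    public:
        explicit handle_streambuf(HANDLE h) : h_(h) {}
    protected:
        int_type overflow(int_type ch) override {
            if (ch == traits_type::eof()) return traits_type::not_eof(ch);
            char c = traits_type::to_char_type(ch);
            DWORD written = 0;
            return WriteFile(h_, &c, 1, &written, nullptr) && written == 1
                       ? ch : traits_type::eof();
        }
        std::streamsize xsputn(const char* s, std::streamsize n) override {
            DWORD written = 0;
            return WriteFile(h_, s, static_cast<DWORD>(n), &written, nullptr)
                       ? static_cast<std::streamsize>(written) : 0;
        }
    private:
        HANDLE h_;
    };

    int main() {
        HANDLE h = CreateFileA("log.txt", GENERIC_WRITE, 0, nullptr,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h == INVALID_HANDLE_VALUE) return 1;
        handle_streambuf buf(h);
        std::ostream out(&buf);                 // formatted output over the raw handle
        out << "value = " << 42 << '\n';
        CloseHandle(h);
    }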
Unless you need the features provided by the Windows file functions (e.g. overlapped I/O) then my suggestion is going with either iostreams in C++ or FILE (fopen and friends) in C.
Besides being more portable, you can also use formatted input/output for text files, and in C++ it's easy to overload the input/output operators for your classes.
If you want to use Windows file memory mapping you should use CreateFile (e.g. the HANDLE passed to CreateFileMapping API is the return value of CreateFile). Moreover, CreateFile offers higher customization options than C and C++ standard file API.
However, if you want to write portable code, or if you don't need Windows-specific features, C and C++ standard file APIs are just fine.
In some tests, when processing large data, I noticed some performance overhead of C++ I/O streams vs. the raw C file API; if you find yourself in a case like this, you could simply wrap the raw C file API in a small C++ RAII class and still use it from C++ code.
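A minimal sketch of such a wrapper (the helper names are my own):

    #include <cstdio>
    #include <memory>
    #include <stdexcept>
    #include <string>

    // std::unique_ptr with a custom deleter gives RAII around the raw C API.
    struct file_closer { void operator()(std::FILE* f) const { if (f) std::fclose(f); } };
    using unique_file = std::unique_ptr<std::FILE, file_closer>;

    unique_file open_file(const char* path, const char* mode) {
        unique_file f(std::fopen(path, mode));
        if (!f) throw std::runtime_error(std::string("fopen failed: ") + path);
        return f;
    }

    int main() {
        auto f = open_file("big.dat", "rb");          // hypothetical file
        char buf[4096];
        std::size_t n = std::fread(buf, 1, sizeof buf, f.get());
        (void)n;
    }   // file closed automatically here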
Unless you absolutely need the extra functionality provided by OS API functions (like CreateFile), I'd recommend using the standard library functions (like fopen or ofstream). This way your program will be more portable.
The only real advantage of using CreateFile that I can think of is overlapped I/O and maybe finer grained access rights.
In most cases you will be better off using fopen in C or ofstream in C++. CreateFile gives some extra control over sharing and caching but does not provide formatting functionality.
I copied my answer from
fopen or CreateFile in Windows
which was closed for some reason which escapes me...
There is no defined way for fopen() to return the system error code. You might be able to inspect errno, but its value might or might not correspond to the system error code.
Also, I don't think there is a defined way to access the real system handle (of type HANDLE), which you might want to pass on to one of the many Windows system calls that expect such a handle (e.g. memory-mapped I/O).
With open(), the file is represented by an integer file descriptor, which is not the system handle (on Windows).
fopen() does not throw an exception in case of error. In order to have some RAII you would need to wrap it into a class.
Wrapping CreateFile() into a class, is not more expensive than wrapping fopen() or open() into a class.
Using the C++ classes (std::ofstream, std::ifstream) to write to / read from files suffers from the same problems as fopen():
They do not throw by default on error. To enable exceptions you have to call a member function (exceptions()) rather than being able to pass a constructor argument -- which means that for RAII you would need to derive from the class (in order to use it as a member/base class that throws on error).
It is unspecified whether you can retrieve the system error code from the exception thrown, or whether the message returned by what() tells you anything about the system error.
With this stream interface there is no really pluggable interface for defining the source or destination you read from or write to; extending the stream interface is quite cumbersome and error prone.
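To illustrate the point about enabling exceptions, a short sketch (the path is made up):

    #include <fstream>
    #include <iostream>

    int main() {
        std::ofstream out;
        // Streams don't throw by default; exceptions must be enabled explicitly.
        out.exceptions(std::ofstream::failbit | std::ofstream::badbit);
        try {
            out.open("/no/such/dir/out.txt");      // hypothetical path; open fails
            out << "never written\n";
        } catch (const std::ios_base::failure& e) {
            // what() is implementation-defined and rarely carries the system error code
            std::cerr << "stream error: " << e.what() << '\n';
        }
    }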
Using C-like programming (paying attention to, or ignoring, return codes and manually writing cleanup code) is the source of much evil (remember Heartbleed?)...
Conclusions:
Write a resource wrapper for CreateFile()/CloseHandle(). A resource wrapper is a class that performs the do-action in the constructor and the undo-action in the destructor, and throws an exception in case of error. There are many such pairs of system calls in every OS, and especially in the Windows API. (A sketch follows at the end of this answer.)
Write a system-error exception class (to be used by the above class when CreateFile() fails, and for all other system errors), or investigate what the std::system_error class (added in C++11) actually does and whether it is sufficient.
Write a functional wrapper for ReadFile() and WriteFile() which converts a system error into a thrown system exception object.
Potentially define your own interface for writing to and reading from somewhere, so that the rest of your code is independent of the type of source/destination it reads from or writes to.
Writing a cache class which lets you cache reads from or writes to such a source/destination is also child's play. Of course the cache class should neither know nor care about the source/destination you're writing to or reading from.
Don't be scared by these small tasks. You will actually know what is happening in your code, and these little pieces of code should be negligible (in lines of code) compared to the code calling them. Also, if you use RAII for everything, the code calling into these utility classes will be considerably smaller, and considerably less error prone, than code that does not use RAII and needs two- or even more-step initialization. Replacing these utility classes with equivalent classes for another OS is also child's play (using open()/close()/read()/write() on UNIX systems).
And for the sake of the previous millennium, don't read the Google programming guidelines!
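A hedged sketch of the first three conclusions (class and function names are my own, the path is made up):

    #include <windows.h>
    #include <string>
    #include <system_error>
    #include <vector>

    class win_file {
    public:
        explicit win_file(const std::wstring& path)
            : h_(CreateFileW(path.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr,
                             OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr)) {
            if (h_ == INVALID_HANDLE_VALUE)
                throw std::system_error(static_cast<int>(GetLastError()),
                                        std::system_category(), "CreateFile");
        }
        ~win_file() { CloseHandle(h_); }                 // undo-action in the destructor
        win_file(const win_file&) = delete;
        win_file& operator=(const win_file&) = delete;

        std::size_t read(void* buf, std::size_t size) {  // functional wrapper for ReadFile
            DWORD got = 0;
            if (!ReadFile(h_, buf, static_cast<DWORD>(size), &got, nullptr))
                throw std::system_error(static_cast<int>(GetLastError()),
                                        std::system_category(), "ReadFile");
            return got;
        }
    private:
        HANDLE h_;
    };

    int main() {
        try {
            win_file f(L"C:\\temp\\data.bin");           // hypothetical path
            std::vector<char> buf(4096);
            std::size_t n = f.read(buf.data(), buf.size());
            (void)n;
        } catch (const std::system_error& e) {
            // e.code().value() is the original GetLastError() value
            return 1;
        }
    }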

how to make sure that a file will be closed at the end of the run

Suppose someone wrote a method that opens a certain file and forgets to close it in some cases. Given this method, can I make sure that the file is closed without changing the code of the original method?
The only option I see is to write a method that wraps the original method, but this is only possible if the file is defined outside the original method, right? Otherwise it's lost forever...
Since this is C++, I would expect that the I/O streams library (std::ifstream and friends) would be used, not the legacy C I/O library. In that case, yes, the file will be closed because the stream is closed by the stream object's destructor.
If you are using the legacy C API, then no, you're out of luck.
In my opinion, the best answer to an interview question like this is to point out the real flaw in the code--managing resources manually--and to suggest the correct solution: use automatic resource management ("Resource Acquisition is Initialization" or "Scope-Bound Resource Management").
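A one-function illustration of that point:

    #include <fstream>
    #include <string>

    // The std::ifstream destructor closes the file when 'in' goes out of scope,
    // even if an exception propagates out of this function.
    std::string read_first_line(const std::string& path) {
        std::ifstream in(path);
        std::string line;
        std::getline(in, line);
        return line;
    }   // file closed here, no explicit close() needed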
You are correct that if the wrapper doesn't somehow get a reference to the opened file, it may be difficult to close it. However, the operating system might provide a means to get a list of open files, and you could then find the one you need to close.
However, note that most (practically all) operating systems take care of closing files when the application exits, so you don't need to worry about a file being left open indefinitely after the program stops. (This may or may not be a reasonable answer to the question you were given, which seems incredibly vague and ambiguous.)
If you are using the C functions for file I/O, you can use the _fcloseall function (a Microsoft CRT extension) to close all open files.
If you are using C++, then, as James suggested, the stream destructor should take care of it.
Which environment are you in? You can always check the file descriptors opened by the process and close them forcefully.
Under linux you can use the lsof command to list open files for a process. Do it once before the method and once after the method to detect newly opened files. Hopefully you aren't fighting some multithreaded legacy beast.

How to create a virtual file?

I'd like to simulate a file without writing it on disk. I have a file at the end of my executable and I would like to give its path to a dll. Of course since it doesn't have a real path, I have to fake it.
I first tried using named pipes under Windows to do it. That would allow for a path like \\.\pipe\mymemoryfile, but I can't make it work, and I'm not sure the DLL would support a path like this.
Second, I found CreateFileMapping and GetMappedFileName. Can they be used to simulate a file within a fragment of another one? I'm not sure that is what this API does.
What I'm trying to do seems similar to boxedapp. Any ideas about how they do it? I suppose it's something like API interception (like Detours), but that would be a lot of work. Is there another way to do it?
Why? I'm interested in this specific solution because I'd like to hide the data, and for the benefit of distributing only one file, but also for geeky reasons of making it work that way ;)
I agree that copying data to a temporary file would work and be a much easier solution.
Use BoxedApp and do not worry.
You can store the data in an NTFS stream. That way you can get a real path pointing to your data that you can give to your dll in the form of
x:\myfile.exe:mystreamname
This works precisely like a normal file; however, it only works if the file system is NTFS. That is standard under Windows nowadays, but it is of course not an option if you want to support older systems or want to be able to run this from a USB stick or similar. Note that any streams present in a file will be lost if the file is sent as a mail attachment or simply copied from an NTFS partition to a FAT32 partition.
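A hedged sketch of writing such a stream (the path and stream name are placeholders):

    #include <windows.h>
    #include <cstring>

    // Write data into an alternate data stream attached to an existing file,
    // then hand "path:streamname" to the DLL as if it were a normal file. NTFS only.
    int main() {
        const char* stream = "C:\\app\\myfile.exe:mystreamname";   // hypothetical path
        HANDLE h = CreateFileA(stream, GENERIC_WRITE, 0, nullptr,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h == INVALID_HANDLE_VALUE) return 1;
        const char data[] = "payload stored in the stream";
        DWORD written = 0;
        WriteFile(h, data, static_cast<DWORD>(std::strlen(data)), &written, nullptr);
        CloseHandle(h);
        // the DLL can now be given the path "C:\\app\\myfile.exe:mystreamname"
    }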
I'd say that the most compatible way would be to write your data to an actual file, but you can of course do it one way on NTFS systems and another on FAT systems. I recommend against that because of the added complexity. The appropriate way would of course be to distribute your files separately, but since you've indicated that you don't want this, you should in that case write the data to a temporary file and give the DLL the path to that file. Make sure you write the temporary file to the user's temp directory (you can find the path using GetTempPath in C/C++).
Your other option would be to write a filesystem filter driver, but that is a road that I strongly advise against. That sort of defeats the purpose of using a single file as well...
Also, in case you want only a single file for distribution, how about using a zip file or an installer?
Pipes are for communication between processes running concurrently. They don't store data for later access, and they don't have the same semantics as files (you can't seek or rewind a pipe, for instance).
If you're after file-like behaviour, your best bet will always be to use a file. Under Windows, you can pass FILE_ATTRIBUTE_TEMPORARY to CreateFile as a hint to the system to avoid flushing data to disk if there's sufficient memory.
If you're worried about the performance hit of writing to disk, the above should be sufficient to avoid the performance impact in most cases. (If the system is low enough on memory to force the file data out to disk, it's probably also swapping heavily anyway -- you've already got a performance problem.)
If you're trying to avoid writing to disk for some other reason, can you explain why? In general, it's quite hard to stop data from ever hitting the disk -- the user can always hibernate the machine, for instance.
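A sketch of that hint (the path is made up; FILE_FLAG_DELETE_ON_CLOSE is an optional extra that removes the file once the last handle is closed):

    #include <windows.h>

    int main() {
        // FILE_ATTRIBUTE_TEMPORARY asks the cache manager to keep the data in memory
        // if it can, so the file may never hit the disk in practice.
        HANDLE h = CreateFileA("C:\\temp\\scratch.bin",
                               GENERIC_READ | GENERIC_WRITE, 0, nullptr, CREATE_ALWAYS,
                               FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
                               nullptr);
        if (h == INVALID_HANDLE_VALUE) return 1;
        DWORD written = 0;
        WriteFile(h, "data", 4, &written, nullptr);
        CloseHandle(h);   // file is removed here because of FILE_FLAG_DELETE_ON_CLOSE
    }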
Since you don't have control over the DLL you have to assume that the DLL expects an actual file. It probably at some point makes that assumption which is why named pipes are failing on you.
The simplest solution is to create a temporary file in the temp directory, write the data from your EXE to the temp file and then delete the temporary file.
Is there a reason you are embedding this "pseudo-file" at the end of your EXE instead of just distributing it with your application? You are obviously already distributing this third-party DLL with your application, so one more file doesn't seem like it is going to hurt you.
Another question: will this data be changing? That is, are you expecting to write data back to this "pseudo-file" in your EXE? I don't think that will work well. Standard users may not have write access to the EXE, and that would probably drive anti-virus software nuts.
And no, CreateFileMapping and GetMappedFileName definitely won't work, since they don't give you a file name that can be passed to CreateFile. If you could somehow get this DLL to accept a HANDLE, then that would work.
And I wouldn't even bother with API interception. Just hand the DLL a path to an actual file.
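A hedged sketch of that temp-file route using GetTempPath/GetTempFileName (error handling kept minimal; the data is a placeholder):

    #include <windows.h>

    int main() {
        char tempDir[MAX_PATH];
        char tempFile[MAX_PATH];
        if (GetTempPathA(MAX_PATH, tempDir) == 0) return 1;
        if (GetTempFileNameA(tempDir, "pfx", 0, tempFile) == 0) return 1;

        HANDLE h = CreateFileA(tempFile, GENERIC_WRITE, 0, nullptr,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h == INVALID_HANDLE_VALUE) return 1;

        const char data[] = "bytes extracted from the end of the EXE";  // placeholder
        DWORD written = 0;
        WriteFile(h, data, sizeof data - 1, &written, nullptr);
        CloseHandle(h);

        // ... pass 'tempFile' to the DLL here, then ...
        DeleteFileA(tempFile);   // clean up when the DLL is done with it
    }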
Reading your question made me think: if you can pretend an area of memory is a file and have a kind of "virtual path" to it, then this would allow loading a DLL directly from memory, which LoadLibrary forbids by design by requiring a path name. This is why people write their own PE loader when they want to achieve that.
I would say you can't achieve what you want with file mapping: the purpose of file mapping is to treat a portion of a file as if it were physical memory, and you want the reverse.
Using Detours implies that you would have to replicate everything the intercepted DLL function does except obtaining data from a real file; hence it's not generic. Or, even more intricate: suppose the DLL uses fopen; then you provide your own fopen that detects a special pattern in the path and mimics the C runtime internals... Hmm, is it really worth all the pain? :D
Please explain why you can't extract the data from your EXE and write it to a temporary file. Many applications do this -- it's the classic solution to this problem.
If you really must provide a "virtual file", the cleanest solution is probably a filesystem filter driver. "clean" doesn't mean "good" -- a filter is a fully documented and supported solution, so it's cleaner than API hooking, injection, etc. However, filesystem filters are not easy.
OSR Online is the best place to find Windows filesystem information. The NTFSD mailing list is where filesystem developers hang out.
How about using some sort of RAM disk and writing the file to that disk? I have tried some RAM disks myself, though I never found a good one; tell me if you are successful.
Well, if you need to have the virtual file allocated in your EXE, you will need to create a vector, stream or char array big enough to hold all of the virtual data you want to write.
That is the only solution I can think of that does no I/O to disk at all (even if you don't write to a file).
If you need to keep a file-like path syntax, just write a class that mimics that behaviour and writes to your memory buffer instead of a file. It's as simple as it gets. Remember KISS.
Cheers
Open the file called "NUL:" for writing. It's writable, but the data are silently discarded. Kinda like /dev/null of *nix fame.
You cannot memory-map it though. Memory-mapping implies read/write access, and NUL is write-only.
I'm guessing that this DLL can't take a stream? It's almost too simple to ask, but if it can, you could just use that.
Have you tried using the \\?\ prefix when using named pipes? Many APIs support \\?\ to pass the remainder of the path directly through without any parsing/modification.
http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
Why not just add it as a resource - http://msdn.microsoft.com/en-us/library/7k989cfy(VS.80).aspx - the same way you would add an icon.