How to force file flushing - C++

Suppose that I have the following code:
#include <chrono>
#include <fstream>
#include <thread>

int main()
{
    std::ofstream f("test.log");
    int i = 0;
    while (true)
    {
        f << i++;
        f.flush();
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}
(note that I have a flush call after each write operation)
I noticed that this application doesn't update the "last modified time" and "size" attributes of the "test.log" file unless I right-click the file or open it.
I guess that this is due to internal buffering (the system doesn't want to perform time-consuming disk I/O unless forced to do so). Am I right?
I need to write an application that watches for changes in log files created by other applications (I can't change them). At first I thought about the FileSystemWatcher class in C#, but I noticed that it has the same behavior (it doesn't fire the corresponding event unless the file was closed in the source application or was forced to update by right-clicking it in Windows Explorer). What can I do then? Call WinAPI functions like GetFileAttributes for every file I want to watch, as often as I can?

There are two separate things here. First, the last modified time on the file MFT record (inode equivalent) is updated every time you write to it.
However, the information returned by FindFirstFile and friends does not come from the file itself; it comes from information cached in the directory entry. This cache is updated whenever a file that was opened through that directory entry is closed. This is the information displayed by most applications, such as Windows Explorer and the command prompt's DIR command.
If you want to know when a file was updated you need to do the equivalent of a Unix stat operation which reads the MFT record (inode). This requires opening a handle to the file, calling GetFileInformationByHandle and closing the handle again.
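A minimal Win32 sketch of that stat-equivalent (the filename "test.log" is just an example; error handling is kept to the bare minimum):

```cpp
#include <windows.h>
#include <iostream>

int main()
{
    // Open a handle without interfering with the process writing the file
    HANDLE h = CreateFileW(L"test.log", GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    // Reads the up-to-date metadata from the MFT record,
    // not from the cached directory entry
    BY_HANDLE_FILE_INFORMATION info{};
    if (GetFileInformationByHandle(h, &info))
    {
        ULONGLONG size = (ULONGLONG(info.nFileSizeHigh) << 32) | info.nFileSizeLow;
        std::cout << "size: " << size << "\n";
        // info.ftLastWriteTime holds the current last-modified time
    }
    CloseHandle(h);
}
```

Polling this in a loop is effectively the stat-based approach the question asks about; the cost is one handle open/close per check per file.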
The second thing is that there is a good reason not to do this. If a program is writing to a file, it may be partway through the writing process. Therefore the file may be in an invalid (corrupt) state. To ensure that the file is in a valid state you should wait until the file has been closed. This is how you know that the file is now ready to look at.
Once the writing program has finished writing to the file, the directory entry will be updated and FileSystemWatcher will show the file.
If you are absolutely sure you want to see notifications of files which are still in the process of being written, then you can look into the USN change journal as an option. I don't know whether it is kept more up to date than the directory entries; you will have to investigate that.

Related

Windows Inject a standard C FILE Structure into A Running Process

I had sort of an odd idea and was wondering whether it would be possible. Here's a rough outline of my plan.
Scenario: An application loads and interprets values from a config file at startup. I want to fuzz the application via the config file, without rewriting the config file.
Note: The config file is closed later on in the program, and the function that opens the config file is used to open various other files, so I do not want to hook this function. While SetKMode() and SetProcPermissions() are used here, answers that apply to Windows in general are just as helpful as Windows CE answers.
Plan:
1. Attain debug privileges over this process via SetKMode() and SetProcPermissions(), and attach a debugger via DebugActiveProcess()
2. Break after the function that loads the file returns
3. Create a temporary modified version of the file and open it in the parent process
4. Use VirtualAlloc() to allocate space for the FILE structure in the debuggee
5. Transfer the entire FILE structure for the temporary file to the debuggee using WriteProcessMemory()
6. Swap the pointer for the config file loaded by the debuggee to the pointer for the temporary file
7. Allow the debuggee to run the file
8. Before the debuggee closes the file, copy the old pointer for the original config file back to the new pointer so that it closes the correct file
Would the debugger be able to read the file? Would the parent be able to close the file after it's finished?
Edit:
Transferring the old pointer back to the debuggee every time it tries to close the file no longer seems like a good solution after some RE, so on top of my current question I have an additional question: would the debuggee be able to close the file the debugger opened? Would that be a problem? And would the fact that the original file isn't closed properly be a problem?
Edit:
Sorry I'm a dummy who forgot that if I'm going through the trouble of injecting all this I can just inject a new filename and swap the pointer long before the call to fopen.
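That simpler route, injecting a replacement filename before the fopen call, might be sketched like this (Win32; the process handle is assumed to have been opened with PROCESS_VM_OPERATION | PROCESS_VM_WRITE, and patching the pointer the debuggee actually passes to fopen is left to the debugger logic):

```cpp
#include <windows.h>
#include <cstring>

// Copy a replacement filename into the debuggee's address space and
// return its remote address; the debugger then swaps this address in
// for the original filename pointer before fopen is called.
LPVOID injectFilename(HANDLE process, const char* newName)
{
    SIZE_T size = std::strlen(newName) + 1;  // include the NUL terminator

    // Reserve and commit a small readable/writable block in the debuggee
    LPVOID remote = VirtualAllocEx(process, nullptr, size,
                                   MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!remote)
        return nullptr;

    // Write the filename string into the allocated block
    if (!WriteProcessMemory(process, remote, newName, size, nullptr))
    {
        VirtualFreeEx(process, remote, 0, MEM_RELEASE);
        return nullptr;
    }
    return remote;
}
```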
Assuming the entire file is loaded into memory and then parsed, I would hook whatever function loads the file data into memory, and use a conditional on the filename so that my code only runs once the correct file has been loaded. I would then perform the fuzzing by modifying the file data in memory and return execution to the target process before the data is parsed. This way you aren't touching any file permissions, only memory.
To automate it, create a "loader" which executes the target process, injects and executes your hook, and then checks for a crash or other unwanted behavior.

Robust way to detect if file has changed

I think this question hasn't been answered for my use-case.
We wish to detect if the user has changed a file without re-reading its contents for the purposes of caching a computation result based on the file contents. Our program is a long-running one that lets the user click a button to perform a computation based on data entered in the program and data stored in external files (sorry, I can't be more specific than that). The external data needs to be read, processed and various data structures need to be built based on it, so we try to cache those between computations to speed up re-computes when the user changes the data in the program itself, but not the data in the external files. However, if the external file has changed, we have to re-read that.
For each external resource we're checking whether the modification time and file size have changed, but that isn't really robust and can lead to user frustration. For example, if the user has fileA and fileB with the same size and timestamp, copies fileA to fileC, uses fileC as an external resource, and then copies fileB to fileC, the system preserves the modification time of the original file and the sizes are the same, so we don't re-read the external resource.
Our program runs on Windows, macOS and Linux, is written in C++ and we're perfectly OK with using platform-specific code to detect file changes. We're interested in the most robust way to detect if the contents of a file identified by a file path have changed without actually reading the file itself.
I've made this answer a community wiki so others can add their ideas for the various platforms listed in the question.
Linux
macOS
Windows
Option 1
Set up a thread that watches the directory containing the file. When the directory changes, you'll have to check if the file you care about has actually changed. That may mean opening and re-reading the file (e.g., to compute the current checksum). But since you have to do this only after a change notification, this overhead may be acceptable.
I believe (but have not verified) that if someone copies a same-size, same-timestamp file over an existing file, you'll get a directory change notification.
Option 2
Hold the file open with an opportunistic lock. This involves creating the lock with a call to DeviceIoControl and then issuing a blocking call to GetOverlappedResult, which will unblock when another process attempts to change the file. Your program can then release the lock, allowing the other process to update the file, and know that the file is being changed.

How to know when writing a file has been finished?

I'm writing a program which operates on a file (it only reads the file) while another program is writing that file (I have no control over it, so I can't use events, and I don't know the contents of the file). I want a way to know when that program has finished writing, so my program can stop operating on the file. I used these two methods, but I don't know which one is more reliable and performs better:
1. Rename the file to another name; if that succeeds, rename it back to the original name.
2. Flush the file; if the file size has not changed for a while (e.g. 5 sec), stop the operation.
Which one is better? Is there any better way (more reliable and better-performing)?
I'm using Windows 7 and Qt 5.2 (or Visual Studio) for C++.
Qt provides a class called QFileSystemWatcher which allows you to monitor files and directories.
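A minimal sketch (the path is an example; note that QFileSystemWatcher stops watching a file that is removed and re-created, so you may need to re-add the path from the slot):

```cpp
#include <QCoreApplication>
#include <QDebug>
#include <QFileSystemWatcher>

int main(int argc, char* argv[])
{
    QCoreApplication app(argc, argv);

    QFileSystemWatcher watcher;
    watcher.addPath("C:/logs/test.log");   // example path

    // fileChanged fires when the watched file is modified or removed
    QObject::connect(&watcher, &QFileSystemWatcher::fileChanged,
                     [](const QString& path) {
                         qDebug() << "changed:" << path;
                     });

    return app.exec();
}
```

As with the other answers here, a change notification doesn't tell you the writer is *finished*; you'd still combine this with something like the size-stable heuristic from the question.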

Autosaving files with multiple instances

I'm writing a Qt/C++ program that does long-running simulations, and to guard against data loss, I wrote some simple autosave behaviour. The program periodically saves to the user's temp directory (using QDir::temp()), and if the program closes gracefully, this file is deleted. If the program starts up and sees the file in that directory, it assumes a previous instance crashed or was forcibly ended, and it prompts the user about loading it.
Now here is the complication - I'd like this functionality to work properly even if multiple instances of the program are used at once. So when the program loads, it can't just look for the presence of an autosave file. If it finds one, it needs to determine if that file was created by a running instance (in which case, there's nothing wrong and nothing to be done) or if it has been left over by an instance that crashed or was forcibly ended (in which case it should prompt the user about loading it).
My program is for Windows/Mac/Linux, so what would be the best way to implement this using Qt or otherwise in a cross-platform fashion?
Edit:
The comments suggested the use of the process identifier, which I can get using QCoreApplication::applicationPid(). I like this idea, but when the program loads and sees a file with a certain PID in the name, how can it look at the other running instances (if any) to see if there is a match?
You can simply use QSaveFile which, as the documentation states:
The QSaveFile class provides an interface for safely writing to files.
QSaveFile is an I/O device for writing text and binary files, without losing existing data if the writing operation fails.
While writing, the contents will be written to a temporary file, and if no error happened, commit() will move it to the final file. This ensures that no data at the final file is lost in case an error happens while writing, and no partially-written file is ever present at the final location. Always use QSaveFile when saving entire documents to disk.
As for multiple instances, you just need to reflect that in the filename.
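A sketch of that, with the instance's PID worked into the autosave filename (the application name and filename pattern are examples):

```cpp
#include <QCoreApplication>
#include <QDir>
#include <QSaveFile>
#include <QString>

// Autosave into the temp directory; the PID in the name keeps
// concurrent instances from clobbering each other's files.
bool autosave(const QByteArray& data)
{
    QString name = QDir::temp().filePath(
        QString("myapp_autosave_%1.dat")
            .arg(QCoreApplication::applicationPid()));

    QSaveFile file(name);
    if (!file.open(QIODevice::WriteOnly))
        return false;
    file.write(data);

    // commit() atomically renames the temp file into place; on failure
    // nothing partially written is left behind
    return file.commit();
}
```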

How to check if a file is still being written?

How can I check if a file is still being written? I need to wait for a file to be created, written and closed again by another process, so I can go on and open it again in my process.
In general, this is a difficult problem to solve. You can ask whether a file is open, under certain circumstances; however, if the other process is a script, it might well open and close the file multiple times. I would strongly recommend you use an advisory lock, or some other explicit method for the other process to communicate when it's done with the file.
That said, if that's not an option, there is another way. If you look in the /proc/<pid>/fd directories, where <pid> is the numeric process ID of some running process, you'll see a bunch of symlinks to the files that process has open. The permissions on the symlink reflect the mode the file was opened for - write permission means it was opened for write mode.
So, if you want to know if a file is open, just scan over every process's /proc entry, and every file descriptor in it, looking for a writable symlink to your file. If you know the PID of the other process, you can directly look at its proc entry, as well.
This has some major downsides, of course. First, you can only see open files for your own processes, unless you're root. It's also relatively slow, and only works on Linux. And again, if the other process opens and closes the file several times, you're stuck - you might end up seeing it during the closed period, and there's no easy way of knowing if it'll open it again.
You could let the writing process write a sentinel file (say "sentinel.ok") after it is finished writing the data file your reading process is interested in. In the reading process you can check for the existence of the sentinel before reading the data file, to ensure that the data file is completely written.
#blu3bird's idea of using a sentinel file isn't bad, but it requires modifying the program that's writing the file.
Here's another possibility that also requires modifying the writer, but it may be more robust:
Write to a temporary file, say "foo.dat.part". When writing is complete, rename "foo.dat.part" to "foo.dat". That way a reader either won't see "foo.dat" at all, or will see a complete version of it.
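A portable sketch of that pattern (the ".part" suffix is an example; std::filesystem::rename replaces an existing destination file, and the replacement is atomic on POSIX filesystems):

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Write `contents` to `path` via a ".part" temporary, then rename it
// into place. Readers either see the previous file or the complete new
// one, never a half-written version.
void writeAtomically(const std::filesystem::path& path, const std::string& contents)
{
    std::filesystem::path tmp = path;
    tmp += ".part";
    {
        std::ofstream out(tmp, std::ios::binary);
        out << contents;
    }   // stream closed (and flushed) before the rename
    std::filesystem::rename(tmp, path);  // atomic replace on POSIX
}
```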
You can try using inotify
http://en.wikipedia.org/wiki/Inotify
If you know that the file will be opened once, written and then closed, it would be possible for your app to wait for the IN_CLOSE_WRITE event.
However, if the behaviour of the other application writing the file is more like open, write, close, open, write, close..., then you'll need some other mechanism to determine when the other app has truly finished with the file.