Explicitly saving the file using fstream without closing the file in C++

Regarding the code below, I want to explicitly "save" the file without calling close(). I know there is no need to call close() explicitly, since fstream will call the destructor and save the file when the fstream object goes out of scope.
But I want to explicitly save the file without waiting for the fstream
object to go out of scope. Is there a way to do this in C++?
Is there anything like
flHtmlFile.save()
The only option I know of is to close it and open it again?
#include <fstream>
#include <string>

int main()
{
    std::ofstream flHtmlFile;
    std::string fname = "some.txt";
    flHtmlFile.open(fname);
    flHtmlFile << "text1"; // gets written
    flHtmlFile.close();    // My purpose is to EXPLICITLY SAVE THE FILE. Is there anything like flHtmlFile.save()?
    flHtmlFile << "text2"; // doesn't get written because I called close()
    return 1;
}

Files are essentially streams of bytes, and they can be much bigger than your virtual address space (e.g. you can have a terabyte-sized file on a machine with only a few gigabytes of RAM).
In general a program won't keep all the content of a file in memory.
Some libraries enable you to read or write all the content at once in memory (if it fits there!). E.g. Qt has a QFile class with an inherited readAll member function.
However, file streams (either FILE from the C standard library, or std::ostream from the C++ standard library) are buffered. You may want to flush the buffer. Use std::flush (in C++) or fflush (in C); in practice they often issue a system call (probably write(2) on Linux) to ask the operating system to write some data to some file (but they probably don't guarantee that the data has reached the disk).
What exactly happens is file-system-, operating-system- and hardware-specific. On Linux, the page cache may keep the data before it is written to disk (so if the computer loses power, data might be lost). Disk controllers also have RAM and do some buffering of their own. See also sync(2) and fsync(2) (and even posix_fadvise(2)...). So even if you flush some stream, you are not sure that the bytes are permanently written to the disk (and you usually don't care).
(There are many layers and lots of buffering between your C++ code and the real hardware.)
BTW you might write into memory through a std::ostringstream in C++ (or open_memstream in C on POSIX), flush that stream, then do something with its in-memory data (e.g. write(2) it to disk).
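For instance, a minimal sketch of that last idea: build the content in an in-memory stream, then write it to the file in one go and flush it (the file name is just taken from the question).
#include <fstream>
#include <sstream>

int main()
{
    std::ostringstream buffer;          // build the content in memory first
    buffer << "text1";
    buffer << "text2";

    std::ofstream out("some.txt");      // illustrative file name from the question
    out << buffer.str() << std::flush;  // write everything at once and flush
    return 0;
}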

If all you want is for the content you wrote to reach the file system as soon as possible, then call flush on the file:
flHtmlFile.flush();
No closing or re-opening required.
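Applied to the snippet from the question, that would look roughly like this (a sketch; flush() hands the buffered data to the OS but does not guarantee it has physically reached the disk):
#include <fstream>
#include <string>

int main()
{
    std::ofstream flHtmlFile;
    std::string fname = "some.txt";
    flHtmlFile.open(fname);

    flHtmlFile << "text1";
    flHtmlFile.flush();    // hands "text1" to the OS now; the stream stays open

    flHtmlFile << "text2"; // still gets written
    return 0;
}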

Related

When is the file actually written to the disk? Is the file guaranteed to be written to the disk as soon as the std::fstream is destroyed?

For the code snippet below, is the file guaranteed to be written to the disk as soon as the std::fstream is destroyed?
#include <iostream>
#include <fstream>
using namespace std;

int main(int argc, char* argv[])
{
    {
        std::fstream file{"test.txt"};
        file << "when the file is actually written out?" << std::endl;
    }
    return 0;
}
Rephrasing my comment as an actual answer:
No. That code snippet does not guarantee that the data is on "persistent storage". You tell the OS to take that file and store it, but the OS hands the data to the IO scheduler. What the scheduler does or does not do is (partially) out of your control, as it schedules the tasks as it sees fit.
You can, however, force the scheduler to write all pending metadata and data to "persistent storage" (read: the actual filesystem) by calling sync():
https://man7.org/linux/man-pages/man2/sync.2.html
Warning, assumption: a problem with that is that you are overruling the scheduler. If, for example, another program on that machine has a huge file to write and the scheduler wants to postpone that write because of generally high IO load right now, then calling sync() forces the scheduler to write all (meta)data, so it will also write that file. sync() will not return until all data is written. It does not matter where your write is in the queue, so the call may actually take several seconds (minutes even, on slow IO and huge files) to return.
I don't know how one would do that on Windows, or how Windows handles disk writes at all.
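On POSIX systems a narrower alternative to the global sync() is fsync(), which blocks until the data (and metadata) of one particular file has reached the device. A minimal sketch, with error handling mostly omitted:
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

int main()
{
    int fd = open("test.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    const char *msg = "when the file is actually written out?\n";
    write(fd, msg, std::strlen(msg)); // return value ignored for brevity

    fsync(fd); // blocks until this file's data and metadata reach the device
    close(fd);
    return 0;
}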

Can you have multiple "cursors" for the same ifstream? Would that be thread-safe?

I have multiple threads, and I want each of them to process a part of my file. Can I have a single ifstream object for that and make them concurrently read different parts? The parts are non-overlapping, so the same line will not be processed by two threads. If yes, how do I get multiple cursors?
A single std::ifstream is associated with exactly one cursor (there's a seekg and tellg method associated with the std::ifstream directly).
If you want the same std::ifstream object to be shared across multiple threads, you'll have to have some sort of synchronization mechanism between the threads, which might defeat the purpose (in each thread, you'll have to lock, seek, read and unlock each time).
To solve your problem, you can open one std::ifstream to the same file per thread. In each thread, you'd seek to whatever position you want to start reading from. This would only require you to be able to "easily" compute the seek position for each thread though (Note: this is a pretty strong requirement).
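A minimal sketch of that one-stream-per-thread approach, assuming the chunk boundaries (here hard-coded byte offsets) can be computed up front:
#include <cstddef>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Each thread opens its own stream on the same file and seeks to its own chunk.
void process_chunk(const std::string &path, std::streamoff begin, std::streamoff end)
{
    std::ifstream in(path, std::ios::binary);
    in.seekg(begin);
    std::vector<char> buf(static_cast<std::size_t>(end - begin));
    in.read(buf.data(), buf.size());
    // ... process buf ...
}

int main()
{
    const std::string path = "data.bin"; // illustrative file name
    std::thread t1(process_chunk, path, std::streamoff(0),    std::streamoff(1024));
    std::thread t2(process_chunk, path, std::streamoff(1024), std::streamoff(2048));
    t1.join();
    t2.join();
    return 0;
}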
C++ file streams are not guaranteed to be thread safe (see e.g. this answer).
The typical solution is anyway to open separate streams on the same file, each instance comes with their own "cursor". However, you need to ensure shared access, and concurrency becomes platform specific.
For ifstream (i.e. only reading from the file), the concurrency issues are usually tame. Even if someone else modifies the file, both streams might see different content, but you do have some kind of eventual consistency.
Reads and writes are usually not atomic, i.e. you might read only part of a write. Writes might not even execute in the order they are issued (see write combining).
Looking at the FILE struct, it seems there is a pointer inside FILE, char* curp, pointing to the current active position, which may mean that each FILE object tracks one particular position in the file.
This being C, I don't know how ifstream works internally and whether it uses a FILE object or is built like one. It might not help you at all, but I thought this little piece of information was worth sharing, in case it helps someone.

Is a file guaranteed to be openable for reading immediately after ofstream::close() has returned?

I need my code (C++, on Linux) to call a second executable, having previously written an output file which is read by the second program. Does the naïve approach,
std::ofstream out("myfile.txt");
// write output here
out.close();
system("secondprogram myfile.txt");
suffer from a potential race condition, where even though out.close() has executed, the file cannot immediately be read by secondprogram? If so, what is the best practice for resolving this?
Three notes:
If this is file-system-dependent, I'm interested in the behaviour on ext3 and tmpfs.
Clearly there are other reasons (file permissions etc.) why the second program might fail to open the file; I'm just interested in the potential for a race condition.
The hardcoded filename in the example above is for simplicity; in reality I use mkstemp.
Once the file has been closed, all the written data is guaranteed to have been flushed from the buffers of the ofstream object (because at that point you could destroy it without any risk of losing data; closing the file is done by the destructor anyway if needed). This does not mean that the data is physically on the disk at that point (it probably is not, because of the caching behaviour of the OS disk drivers), but any program running on the same OS will be able to read the file consistently (as the OS will then serve the reads from the cached data). If you need to flush the OS buffers to the disk (which is not needed for your secondprogram to correctly read the input file), then you might want to look at the sync() function in <unistd.h>.
There is a potential failure mode that I missed earlier: you don't seem to have a way of recovering when the file cannot be opened by secondprogram. The problem is not that the file might be locked/inconsistent after close() returns, but that another program, completely unrelated to yours, might open the file between close() and system() (say, an AV scanner, someone grepping through the directory containing the file, a backup process). If that happens, secondprogram will fail even though your program behaves correctly.
TL/DR: Even though everything works as expected, you have to account for the case that secondprogram may not be able to open the file!
According to cplusplus.com, the function returns once all data has been written to disk, so there should be no race condition.

How do I ensure data is written to disk before closing fstream?

The following looks sensible, but I've heard that the data could theoretically still be in a buffer rather than on the disk, even after the close() call.
#include <fstream>

int main()
{
    std::ofstream fsi("test.txt");
    fsi << "Hello World";
    fsi.flush();
    fsi.close();
    return 0;
}
You cannot do this with standard tools alone and have to rely on OS facilities.
For POSIX, fsync should be what you need. As there is no way to get a C file descriptor from a standard stream, you would either have to use C streams throughout your application or just open the file again for flushing to disk. Alternatively there is sync, but this flushes all buffers, which your users and other applications are going to hate.
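A sketch of the "open the file again for flushing to disk" variant on POSIX: let the ofstream flush and close as usual, then reopen the same path with a raw file descriptor purely to call fsync() on it (fsync operates on the file's cached data, regardless of which descriptor wrote it):
#include <fstream>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    {
        std::ofstream fsi("test.txt");
        fsi << "Hello World";
    } // destructor flushes the stream buffer and closes the file

    // Reopen the file only to ask the kernel to push it to the device.
    int fd = open("test.txt", O_WRONLY);
    if (fd >= 0) {
        fsync(fd); // blocks until the file's data reaches the storage device
        close(fd);
    }
    return 0;
}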
You could guarantee that the data in the buffer is written to disk by flushing the stream. That can be done by calling its flush() member function, or by using the flush or endl manipulators.
However, there is no need to do so in your case since close guarantees that any pending output sequence is written to the physical file.
§ 27.9.1.4 / 6:
basic_filebuf< charT, traits >* close();
Effects: If is_open() == false, returns a null pointer. If a put area exists, calls overflow(traits::eof()) to flush characters. (...)
§ 27.9.1.4
basic_filebuf* close();
Effects: If is_open() == false, returns a null pointer. If a put area
exists, calls overflow(traits::eof()) to flush characters. If the last
virtual member function called on *this (between underflow, overflow,
seekoff, and seekpos) was overflow then calls a_codecvt.unshift
(possibly several times) to determine a termination sequence, inserts
those characters and calls overflow(traits::eof()) again. Finally,
regardless of whether any of the preceding calls fails or throws an
exception, the function closes the file (as if by calling
std::fclose(file)). If any of the calls made by the function,
including std::fclose, fails, close fails by returning a null pointer.
If one of these calls throws an exception, the exception is caught and
rethrown after closing the file.
It's guaranteed to flush the file. However, note that the OS might keep it cached, and the OS might not flush it immediately.
Which operating system are you using?
You need to use direct (non-buffered) I/O to guarantee that the data is written to the physical device without going through the filesystem write cache. Be aware it still has to pass through the disk cache before being physically written.
On Windows, you can use the FILE_FLAG_WRITE_THROUGH flag when opening the file.
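A sketch of what that looks like with the raw Win32 API (the standard fstream classes do not expose this flag, so the file is opened with CreateFile directly; error handling is minimal):
#include <windows.h>

int main()
{
    // FILE_FLAG_WRITE_THROUGH asks Windows to write through the system cache.
    HANDLE h = CreateFileA("test.txt", GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH,
                           nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    const char msg[] = "Hello World";
    DWORD written = 0;
    WriteFile(h, msg, sizeof(msg) - 1, &written, nullptr);
    CloseHandle(h);
    return 0;
}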
The close() member function closes the underlying OS file descriptor. At that point, the file should be on disk.
I'm pretty sure the whole point of calling close() is to flush the buffer. This site agrees. Although depending on your file system and mount settings, just because you've 'written to the disk' doesn't mean that your file system drivers and disk hardware have actually taken the data and made magnet-y bits on the physical piece of metal. It could probably be in a disk buffer still.
How about flushing before closing?

"live C++ objects that live in memory mapped files"?

So I read this interview with John Carmack in Gamasutra, in which he talks about what he calls "live C++ objects that live in memory mapped files". Here are some quotes:
JC: Yeah. And I actually get multiple benefits out of it in that... The last iOS Rage project, we shipped with some new technology that's using some clever stuff to make live C++ objects that live in memory mapped files, backed by the flash file system on here, which is how I want to structure all our future work on PCs.
...
My marching orders to myself here are, I want game loads of two seconds on our PC platform, so we can iterate that much faster. And right now, even with solid state drives, you're dominated by all the things that you do at loading times, so it takes this different discipline to be able to say "Everything is going to be decimated and used in relative addresses," so you just say, "Map the file, all my resources are right there, and it's done in 15 milliseconds."
(Full interview can be found here)
Does anybody have any idea what Carmack is talking about and how you would set up something like this? I've searched the web for a bit but I can't seem to find anything on this.
The idea is that you have all or part of your program state serialized into a file at all times by accessing that file via memory mapping. This requires you to avoid ordinary pointers, because pointers are only valid while your process lasts. Instead you have to store offsets from the mapping start, so that when you restart the program and remap the file you can continue working with it. The advantage of this scheme is that you don't have a separate serialization step, which means you don't have extra code for that and you don't need to save all the state at once; instead your (all or most of your) program state is backed by the file at all times.
You'd use placement new, either directly or via custom allocators.
Look at EASTL for an implementation of (subset) STL that is specifically geared to working well with custom allocation schemes (such as required for games running on embedded systems or game consoles).
A free subset of EASTL is here:
http://gpl.ea.com/
a clone at https://github.com/paulhodge/EASTL
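A minimal POSIX-only sketch of the basic idea: a trivially-copyable object is constructed with placement new directly inside a memory-mapped file, so mutating the object mutates the file. (A real system would keep only offsets, never raw pointers, inside such objects, and on later runs would reinterpret the existing mapping instead of re-constructing.)
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <new>
#include <cstdint>

// Trivially-copyable state with no internal pointers, so the bytes in the
// file stay meaningful no matter where the file gets mapped next time.
struct GameState
{
    std::uint32_t level;
    std::uint32_t score;
};

int main()
{
    int fd = open("state.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return 1;
    ftruncate(fd, sizeof(GameState)); // make the file big enough

    void *mem = mmap(nullptr, sizeof(GameState),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED)
        return 1;

    // Construct the object directly inside the mapped file.
    GameState *state = new (mem) GameState{1, 0};
    state->score += 10; // the change is backed by the file itself

    munmap(mem, sizeof(GameState));
    close(fd);
    return 0;
}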
We have used for years something we call "relative pointers", which is a kind of smart pointer. It is inherently nonstandard, but works nicely on most platforms. It is structured like:
template<class T>
class rptr
{
    size_t offset;
public:
    T* operator->()
    {
        return reinterpret_cast<T*>(reinterpret_cast<char*>(this) + offset);
    }
};
This requires that all objects are stored in the same shared memory (which can be a file mapping too). It also usually requires us to store only our own compatible types in there, as well as having to write our own allocators to manage that memory.
To always have consistent data, we use snapshots via COW mmap tricks (which work in userspace on Linux; no idea about other OSs).
With the big move to 64-bit we also sometimes just use fixed mappings, as the relative pointers incur some runtime overhead. With usually 48 bits of address space, we chose a reserved memory area for our applications that we always map such a file to.
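A self-contained sketch of how such a relative pointer can be used inside one mapped block (a setter is added here so the example compiles on its own; the mapping is anonymous purely for illustration, a file-backed mapping works the same way):
#include <sys/mman.h>
#include <new>
#include <cstddef>

// Variant of the relative pointer above, with a setter so the example is complete.
template<class T>
class rel_ptr
{
    std::ptrdiff_t offset = 0;
public:
    void set(T *target)
    {
        offset = reinterpret_cast<char*>(target) - reinterpret_cast<char*>(this);
    }
    T* operator->()
    {
        return reinterpret_cast<T*>(reinterpret_cast<char*>(this) + offset);
    }
};

struct Node
{
    int value;
    rel_ptr<Node> next; // stays valid even if the block is mapped elsewhere next run
};

int main()
{
    void *block = mmap(nullptr, 2 * sizeof(Node),
                       PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (block == MAP_FAILED)
        return 1;

    Node *a = new (block) Node{1, {}};
    Node *b = new (static_cast<char*>(block) + sizeof(Node)) Node{2, {}};

    a->next.set(b);         // stored as an offset, not an absolute address
    int v = a->next->value; // == 2, regardless of where the block is mapped
    (void)v;

    munmap(block, 2 * sizeof(Node));
    return 0;
}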
This reminds me of a file system I came up with that loaded level files off CD in an amazingly short time (it improved the load time from tens of seconds to near instantaneous), and it works on non-CD media as well. It consisted of three versions of a class wrapping the file IO functions, all with the same interface:
class IFile
{
public:
    IFile (class FileSystem &owner);
    virtual void   Seek (...);
    virtual size_t Read (...);
    virtual size_t GetFilePosition ();
};
and an additional class:
class FileSystem
{
public:
    void   BeginStreaming (const char *filename);
    void   EndStreaming ();
    IFile *CreateFile (const char *filename);
};
and you'd write the loading code like:
void LoadLevel (const char *levelname)
{
    FileSystem fs;
    fs.BeginStreaming (levelname);
    IFile *file = fs.CreateFile (level_map_name);
    ReadLevelMap (fs, file);
    delete file;
    fs.EndStreaming ();
}
void ReadLevelMap (FileSystem &fs, IFile *file)
{
    // read some data from fs
    // get the names of other files to load (like textures, object definitions, etc...)
    // for each texture file:
    {
        IFile *texture_file = fs.CreateFile (/* some other file name */);
        CreateTexture (texture_file);
        delete texture_file;
    }
}
Then, you'd have three modes of operation: debug mode, stream file build mode and release mode.
In each mode, the FileSystem object would create different IFile objects.
In debug mode, the IFile object just wrapped the standard IO functions.
In stream-file-build mode, the IFile object also wrapped the standard IO, but additionally wrote every byte that was read out to the stream file (which the owning FileSystem had opened), and also wrote the return value of any file-position query (so if anything needed to know a file size, that information ended up in the stream file). This effectively concatenates the various files into one big file, but contains only the data that was actually read.
The release mode would create an IFile that did not open files or seek within files; it just read from the streaming file (opened by the owning FileSystem object).
This means that in release mode, all data is read in one sequential series of reads (the OS would buffer it nicely) rather than lots of seeks and reads. This is ideal for CDs where seek times are really slow. Needless to say, this was developed for a CD based console system.
A side effect is that the data is stripped of unnecessary meta data that would normally be skipped.
It does have drawbacks: all the data for a level is in one file. These can get quite large, and the data can't be shared between files: if you had a set of textures, say, that were common across two or more levels, the data would be duplicated in each stream file. Also, the load process must be the same every time the data is loaded; you can't conditionally skip or add elements to a level.
As Carmack indicates, the loading code of many games (and other applications) is structured as a lot of small reads and allocations.
Instead of doing this, you do a single fread (or equivalent) of, say, a level file into memory and just fix up the pointers afterwards.
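A sketch of that pattern, with an invented on-disk layout just to show the fix-up step: the blob stores a byte offset, and after one big read the offset is turned into a real pointer relative to the blob's base address.
#include <cstdio>
#include <cstddef>
#include <cstdint>
#include <vector>

// Invented layout for illustration: a header followed by a string table.
struct LevelHeader
{
    std::uint64_t strings_offset; // byte offset of the string table within the blob
    const char   *strings;        // fixed up to a real pointer after loading
};

int main()
{
    std::FILE *f = std::fopen("level.bin", "rb");
    if (!f)
        return 1;

    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);

    std::vector<char> blob(static_cast<std::size_t>(size));
    std::fread(blob.data(), 1, blob.size(), f); // one big sequential read
    std::fclose(f);

    // Fix up the stored offset into a pointer now that the base address is known.
    auto *hdr = reinterpret_cast<LevelHeader*>(blob.data());
    hdr->strings = blob.data() + hdr->strings_offset;
    return 0;
}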