Inheriting from std::ostream (to avoid rewriting the same old content)? - C++

I would like to have some C++11 output stream which writes to a file only if the newly written content is different from the former one (if the content is the same as the one existing on disk, I don't want to alter the original file's metadata).
FWIW, the program is on GNU/Linux/Debian/Sid x86-64, compiled by a recent GCC 5 (or later). It is a server-like program, and there is only at most one process running it. No other process is supposed to write to that file on the system.
The reason I want to avoid overwriting an existing identical file content is that the actual output is some *.h C++ header file (if that matters, it is a new incarnation of a future MELT monitor; I am redesigning & rewriting it in C++11), and I don't want future make builds to recompile things depending on it if that generated header file has not changed.
I'm tempted to inherit from std::ofstream (then, the initial file path would be some temporary path name) or std::ostringstream (then, the entire file content is kept in memory, not a big deal for my case), and redefine its close method to compare the new content with the old content on disk, and rewrite the disk file only if that content changes.
But this smells bad (or even wrong) to me, in particular because std::ofstream::close is not documented as virtual. Should I use std::filebuf instead?
I have many existing operator << with the left operand being std::ostream& and the right one being some of my own classes, and I would like to use them on my special streams.
I have already lots of functions able to output to any std::ostream and I would like to use some of them on such a "differential" file stream...
Or should I inherit from std::ostream (hence I slightly changed the title of the question which initially mentioned std::ofstream, not std::ostream)?
I will compare the old and new content at close time (just by closing the temporary file, and reading it and the old file, and comparing them byte by byte).

Don't. A stream is a flow of data, not a file. It's not appropriate to let this functionality go anywhere near streams. If nothing else, although you could probably hack a way to determine early on whether the source and destination were identical, that would break the FIFO model of streams. They're just not the right tool for this job.
Generate your file to a temporary location, then perform an md5sum comparison and move it (via overwriting) to the target path iff the content differs; else simply remove the temporary file. Surely it needn't be any more complicated than that.
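The temp-then-compare approach the answer describes can be sketched in a few lines. This is a minimal sketch with names of my own choosing; it uses a byte-for-byte comparison instead of md5sum, which serves the same purpose here:

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Returns true if `path` exists and its content equals `content`.
bool same_content(const std::string& path, const std::string& content) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::ostringstream old;
    old << in.rdbuf();          // slurp the existing file
    return old.str() == content;
}

// Writes `content` to `path` only if it differs from what is on disk,
// so an identical file keeps its original metadata (mtime, inode, ...).
void write_if_changed(const std::string& path, const std::string& content) {
    if (same_content(path, content)) return;       // nothing to do
    std::string tmp = path + ".tmp";
    { std::ofstream out(tmp, std::ios::binary); out << content; }
    std::rename(tmp.c_str(), path.c_str());        // atomic on POSIX
}
```

The generator writes into a std::ostringstream (or a temporary file) as usual, and only this final step touches the real target path.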

My mental model of this is that it's the streambuf that is the internal implementor's interface and the stream is just the wrapper that makes it convenient to use. So to change functionality you normally define things at the streambuf level, where there are lots of virtual functions to override. But I'm far from an expert on this.
You might, with care, be able to use your modified filebuf with a vanilla ofstream.

Although writing a new streambuf might be the way to go, one approach is that your new class is-a ostream that has-a fstream internally. That is, it inherits from the abstract base class and has a private data member representing the underlying file.
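The is-a ostream / has-a buffer shape suggested above can be sketched like this (the class name is mine; the member is a std::filebuf rather than a whole fstream, which amounts to the same idea):

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <string>

// An ostream that owns its own filebuf. Because it is-a std::ostream,
// every existing operator<< taking std::ostream& works with it unchanged.
class FileOStream : public std::ostream {
    std::filebuf buf_;   // the underlying file, owned by the stream
public:
    explicit FileOStream(const std::string& path) : std::ostream(nullptr) {
        buf_.open(path, std::ios::out | std::ios::binary);
        rdbuf(&buf_);    // attach the buffer; this also clears the badbit
    }
    ~FileOStream() { buf_.close(); }  // a compare/rename step could hook in here
};
```

The destructor (or an explicit close method, which need not be virtual since callers use the derived type) is where the compare-with-old-content logic would live.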

Related

How to add functionality to existing C++ standard library functions [duplicate]

I need some guidance or pointers understanding how to implement a custom ostream. My requirements are:
A class with a '<<' operator for several data types.
The intention is to send the output to a database. Each "line" should go to a separate record.
The most important field of each record would be the text (or blob), but some other fields, such as the time, can mostly be deduced automatically.
Buffering is important, as I don't want a database round trip for every record.
First, is it worth deriving from ostream at all? What do I get by deriving from it? What if my class simply implements a few operator<< methods (including some for custom data types)? Which functionality do I get from ostream?
Assuming what I want is a class derived from ostream, I need some guidance on the relationship between the ostream and streambuf classes. Which one do I need to implement? Looking at some samples, it appears that I don't need to derive from ostream at all, and can just give the ostream constructor a custom streambuf. Is that true? Is that the canonical approach?
Which virtual functions of the custom streambuf do I need to implement? I've seen some samples (including on this site: here and here, and a few more); some override the sync method, and others override the overflow method. Which one should I override? Also, looking at the stringbuf and filebuf sources (Visual Studio or GCC), both of those buffer classes implement many methods of streambuf.
If a custom class derived from streambuf is required, would there be any benefit deriving from stringbuf (or any other class) instead of directly from streambuf?
As for "lines": I would like the 'endl' manipulator, at least, to start a new line (i.e. a new record in the database). Maybe, depending on effort, every '\n' character should be treated as a new record as well. How do my custom ostream and/or streambuf get notified of each?
A custom destination for an ostream means implementing your own streambuf. If you want your streambuf to actually buffer (i.e. not connect to the database after each character), the easiest way is to create a class inheriting from std::stringbuf. The only function you'll need to override is the sync() method, which is called whenever the stream is flushed.
class MyBuf : public std::stringbuf
{
public:
    virtual int sync() {
        // add this->str() to the database here,
        // then clear the buffer for the next record
        str("");
        return 0; // 0 on success, -1 on failure
    }
};
You can then create a std::ostream using your buffer:
MyBuf buf;
std::ostream stream(&buf);
Most people advised against redirecting the stream to a database, but they overlooked my description saying that the database basically has a single blob field where all the text goes.
In rare cases, I might send data to a different field. This can be facilitated with custom attributes understood by my stream. For example:
MyStream << "Some text " << process_id(1234) << "more text" << std::flush;
The code above will create a record in the database with:
blob: 'Some text more text'
process_id: 1234
process_id() is a method returning a structure ProcessID. Then, in the implementation of my ostream, I have an operator<<(ProcessID const& pid), which stores the process ID until it gets written. Works great!
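The tagged-field trick described above can be sketched as follows. The internals of MyStream are my invention (the real one presumably wraps a streambuf and a database handle); here the blob is a std::ostringstream so the sketch is self-contained:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// A tag type: values of this type are intercepted by the stream
// and routed to a dedicated column instead of the blob.
struct ProcessID { int pid; };
ProcessID process_id(int pid) { return ProcessID{pid}; }

// Stream-like wrapper: plain text accumulates in the blob; tagged
// fields are stored on the side until the record is written out.
class MyStream {
public:
    std::ostringstream blob;  // would become the blob column
    int pid = -1;             // would become the process_id column
};

// Generic insertion: anything not specially handled goes into the blob.
template <typename T>
MyStream& operator<<(MyStream& s, const T& value) { s.blob << value; return s; }

// Specific insertion: a ProcessID is captured, not appended to the text.
MyStream& operator<<(MyStream& s, const ProcessID& p) { s.pid = p.pid; return s; }
```

The non-template overload is preferred over the template for ProcessID arguments, which is what makes the routing work.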
The simplest way is to inherit std::streambuf and override just two methods:
std::streamsize xsputn(const char_type* s, std::streamsize n) – to append a given buffer with size provided to your internal buffer, std::string for example;
int_type overflow(int_type c) – to append a single char to your internal buffer.
Your streambuf can be constructed from whatever you want (a DB connection, for example). After appending something to the internal buffer, you may try to split it into lines and push them into the DB (or just buffer SQL statements to execute later).
To use it: just attach your streambuf to any std::ostream using constructor.
Simple! I've done something like this to output strings to syslog – everything works fine with any custom operator<< for user defined classes.
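The two-override approach described above looks roughly like this. As a stand-in for the DB connection, this sketch just accumulates into a std::string so it is self-contained:

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>

class DBBuf : public std::streambuf {
    std::string pending_;  // stand-in for a DB connection's write buffer
protected:
    // Bulk path: called for strings and other multi-character writes.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        pending_.append(s, static_cast<std::size_t>(n));
        return n;
    }
    // Single-character path: called when there is no put area to fill.
    int_type overflow(int_type c) override {
        if (c != traits_type::eof())
            pending_.push_back(traits_type::to_char_type(c));
        return c;
    }
public:
    const std::string& pending() const { return pending_; }
};
```

To use it, attach the buffer to a plain std::ostream: `DBBuf buf; std::ostream out(&buf); out << "id=" << 42;` leaves "id=42" in the pending buffer, ready to be split into records.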
my2c - I think you are tackling this the wrong way. A stream may sound like a nice idea, but you'll need a way to indicate the end of the row too (and then what if someone forgets?) I would suggest something along the lines of how the java PreparedStatements and batches work, as in provide a set of methods which accept the types and a column index, then a "batch" method which explicitly makes it clear that you are indeed batching that row and then an execute to push the batch in.
Any stream based operation will rely on type (typically) to indicate which column to fill - but what if you have two ints? IMO, as a user, it doesn't feel like a natural way of inserting records into a database...
To add a new source or destination of character input/output to the iostreams mechanism, you should create a new streambuf class. The task of the stream buffer classes is to communicate with the 'external device' that will store the characters and to provide buffering facilities.
The problem with using iostreams to communicate with your database is that a database table does not match the concept of a character sequence. It's a bit like pushing a round peg into a square hole. A streambuf only operates on characters; that is the only thing ever presented to it. This means the streambuf has to parse the character stream presented to it to find the field and record separators.
If you decide to go this route, I predict you will end up writing a CSV-to-SQL converter in your streambuf, just to get it working.
You will probably be better off just adding a few operator<< overloads to your class(es). You could look at the Qt framework for ideas here; it also allows using operator<< to add items to collections and such.

Kotlin - Function that Writes to a Stream of Unspecified Type

In C++, there is a concept of an ostream. cout is an ostream, as is an ofstream. In this way you can define a function that takes an ostream and writes to it, so that the caller can decide where the function writes to.
Is it possible to achieve the same effect in Kotlin: defining a function that determines where (possibly the console) it writes to at runtime? Obviously if or when statements don't count.
Kotlin (and Java) have two equivalents: OutputStream for byte streams, and Writer for character streams. These are both abstract classes, with many concrete subclasses writing to different places.
To take the simpler case:
If you want to write byte data to a file, you can create a FileOutputStream instance which writes to a given filename.  (That implements OutputStream.)
Or if you want to write to stdout, you can use System.out directly.  (That is also an OutputStream.)
Or if you have a network Socket, you can call its getOutputStream() method, which gives you an OutputStream.
Or there are implementations that can write to a byte array, or a pipe, or a CORBA stream, or…
So if you have some code that uses an OutputStream, you could provide it with an instance of any of those classes, and when that calls OutputStream.write() it will write to the appropriate place.
(In practice, you often wrap it in a BufferedOutputStream for efficiency.)
It's very similar for a Writer, too, but in those cases you generally have to tell it which character encoding to use.  (Though in many cases you can leave it to the platform default, which is usually UTF-8.)
So if you want to write character data to a file, you can create a FileWriter instance which writes to a given filename.
Or if you want to write to stdout you can create an OutputStreamWriter around System.out.
And so on.
Again, if your code is written to accept any Writer, then it will work regardless of where it writes to, and there's a BufferedWriter wrapper for efficiency.
There are equivalent classes for input, too: InputStream for byte streams, and Reader for character streams, both with lots of implementations for different sources.
There are a few steps in Java and Kotlin:
Get the InputStream or OutputStream, either from a file, the network, or any other resource that can provide these streams (the usages and names of streams vary depending on the abstractions they provide).
Use these streams to read or write; this eventually reads from or writes to the underlying resource (e.g. a file).
At the end of the operations, close the respective streams using the close() method.
Then you may or may not close the resource, depending on your use case.
I would recommend that you follow the Kotlin documentation.
Be sure to look at the usage of "use" (it is like Java's try-with-resources).

Serialization with a custom pattern and random access with Boost

I'm asking here because I have already tried to search, but I have no idea whether these things even exist or what their names are.
Let me start by explaining what I mean by a custom pattern: suppose I need to serialize objects or data of types foo, bar and boo. Usually the library handles this for the user in a very simple way: what goes in first is written first in the serialization process, so if I serialize all the foo first, they are written "at the top" of the file, and all the bar and boo come after the foo.
Now I would like to keep order in my file and organize things based on a custom pattern. Is this possible with Boost? Which section provides this feature?
Second, strictly related to the first: I would also like to access my serialized binary files without being forced to parse and read all the previous values just to extract the one I'm interested in, kind of like RAM, which works on memory addresses and offers random access without forcing you to walk through all the other addresses.
Thanks.
On the first issue: the Boost serialization library is agnostic as to what happens after it turns an object into its serialized form. It does this by using input and output streams. Files are just that - ofstream/ifstream. For other types of streams, however, the order/pattern that you speak of doesn't make sense. Imagine you're sending serialized objects over the network - the library can't know that it'll have to rearrange the order of objects and, in fact, it can't do that once they've been sent. For this reason, it does not support what you're looking for.
What you can do is create a wrapper that either just caches serialized versions of the objects and arranges them in memory before you tell it to write them out to a file, or that knows that, since you're working with files, it can later seek (seekp) to the appropriate place in the file and write there (this approach would require you to store the locations of the objects you wrote to the file).
As for the second thing - random access file reading: you will have to know exactly where the object is in the file. If you know that the structure of your file won't change, you can seekg on the file stream before handing it to Boost for deserialization. If the file structure will change, however, you still need to know the location of objects in the file. If you don't want to parse the file to find it, you'll have to store it somewhere during serialization. For example, you can maintain a sort of registry of objects at the top of the file. You will still have to parse it, but it should be just a simple [Object identifier]-[location in file] sort of thing.
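The registry-of-locations idea can be sketched with a plain binary file. The layout below is my own invention (length-prefixed records, then a footer of offsets, then the record count); in the real program each record slot would hold a Boost archive instead of a raw string:

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Writes length-prefixed records, then a footer: one 8-byte offset per
// record, followed by the record count. The footer is the "registry".
void write_indexed(const std::string& path, const std::vector<std::string>& recs) {
    std::ofstream out(path, std::ios::binary);
    std::vector<std::uint64_t> offsets;
    for (const auto& r : recs) {
        offsets.push_back(static_cast<std::uint64_t>(out.tellp()));
        std::uint64_t len = r.size();
        out.write(reinterpret_cast<const char*>(&len), sizeof len);
        out.write(r.data(), static_cast<std::streamsize>(r.size()));
    }
    for (std::uint64_t off : offsets)
        out.write(reinterpret_cast<const char*>(&off), sizeof off);
    std::uint64_t n = offsets.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
}

// Random access: read the count, seek to the i-th registry slot,
// then jump straight to the record - no parsing of earlier records.
std::string read_record(const std::string& path, std::uint64_t i) {
    std::ifstream in(path, std::ios::binary);
    std::uint64_t n = 0, off = 0, len = 0;
    in.seekg(-8, std::ios::end);
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    in.seekg(-static_cast<std::streamoff>(8 * (n - i + 1)), std::ios::end);
    in.read(reinterpret_cast<char*>(&off), sizeof off);
    in.seekg(static_cast<std::streamoff>(off), std::ios::beg);
    in.read(reinterpret_cast<char*>(&len), sizeof len);
    std::string r(len, '\0');
    in.read(&r[0], static_cast<std::streamsize>(len));
    return r;
}
```

Note that this sketch is not portable across machines with different endianness; a real format would fix the byte order.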

C++ Truncating an iostream Class

For a project I'm working on for loading/storing data in files, I decided to use the iostream library, because of some of the features it holds over other file I/O libraries. One such feature is the ability to use either the derived fstream or stringstream classes to allow loading data from a file, or from an already existing place in memory. So far, though, there is one major fallback, and I've been having trouble getting information about it for a while.
Similar to the functions available in the POSIX library, I was looking for some implementation of the truncate or ftruncate functions. For stringstream, this would be as easy as passing the associated string back to the stream after reconstructing it with a different size. For fstream, I've yet to find any way of doing this, actually, because I can't even find a way to pull the handle to the file out of this class. While giving me a solution to the fstream problem would be great, an even better solution to this problem would have to be iostream friendly. Because every usage of either stream class goes through an iostream in my program, it would just be easier to truncate them through the base class, than to have to figure out which class is controlling the stream every time I want to change the overall size.
Alternatively, if someone could point me to a library that implements all of these features I'm looking for, and is mildly easy to replace iostream with, that would be a great solution as well.
Edit: For clarification, the iostream classes I'm using are more likely to just be using only the stringstream and fstream classes. Realistically, only the file itself needs to be truncated to a certain point, I was just looking for a simpler way to handle this, that doesn't require me knowing which type of streambuf the stream was attached to. As the answer suggested, I'll look into a way of using ftruncate alongside an fstream, and just handle the two specific cases, as the end user of my program shouldn't see the stream classes anyways.
You can't truncate an iostream in place. You have to copy the first N bytes from the existing stream to a new one. This can be done with the sgetn() and sputn() methods of the underlying streambuf object, which you can obtain by iostream::rdbuf().
However, that process may be I/O intensive. It may be better to use special cases to manipulate the std::string or call ftruncate as you mentioned.
If you want to be really aggressive, you can create a custom std::streambuf derivative class which keeps a pointer to the preexisting rdbuf object in a stream, and only reads up to a certain point before generating an artificial end-of-file. This will look to the user like a shorter I/O sequence, but will not actually free memory or filesystem space. (It's not clear if that is important to you.)
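The copy-the-first-N-bytes idea using the underlying streambufs can be sketched as follows. The function name is mine, and the demonstration uses stringstreams so the sketch is self-contained; with file streams the same call works and the destination file is then the truncated result:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Copies at most n bytes from src to dst through the raw streambufs,
// as obtained via rdbuf(). Returns the number of bytes written.
std::streamsize truncate_copy(std::iostream& src, std::iostream& dst,
                              std::streamsize n) {
    std::vector<char> tmp(static_cast<std::size_t>(n));
    std::streamsize got = src.rdbuf()->sgetn(tmp.data(), n);  // read up to n
    return dst.rdbuf()->sputn(tmp.data(), got);               // write them out
}
```

For large files, a fixed-size chunk loop would replace the single buffer, but the rdbuf()/sgetn/sputn mechanics are the same.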

Having a class with no data members good option for file manipulation?

I have a file with saved data that sometimes needs to be accessed, written to, erased, etc. when the program is running. I decided to write a SavedDataHandler class to accomplish this. I'm currently using the fstream class.
I considered having one data member be the fstream itself, opening it in the constructor, and closing it in the destructor. However, I realized that different functions called on the SavedDataHandler open the stream differently ( setting different flags, etc. ) so I decided not to go that route.
Instead I just have a static const std::string with the file name, with the public member functions handling the opening and closing of the file as they need to. Performance is not an issue.
Is this route a valid option? Since it has no data members, providing a constructor isn't even necessary. It's just a class that contains functions ( and one static constant ), with the functions operating on a resource rather than a data member.
Hmya, the fstream class is by itself already a capable wrapper class around an operating system handle for a file. If you can't think of a way to add functionality to your own wrapper around fstream, take it as a hint that you don't actually need a wrapper.
Don't wrap (or inherit) just because you can.
Well, in some projects wrapping is essential. Just stop to think whether you'll later need, for example, to change the file I/O libs (dunno why you'd want to do that, since the C++ libs are optimized and ISO-standard). What would you do then? Change all the calls from fstream to YourNewSuperMegaLib::SuperFileSystem::MegaFileStream?
If you want simplicity, I'd just inherit from fstream and, in the constructor, take the opening modes you want and invoke the base constructor accordingly.
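That last suggestion is only a few lines. This is a minimal sketch with a class name of my own choosing; each caller still picks the open mode it needs, which was the concern that ruled out a single fstream data member:

```cpp
#include <cassert>
#include <fstream>
#include <string>

// A thin fstream subclass that just forwards the path and open mode
// to the base constructor. Being an fstream, it works with all the
// usual stream operations.
class SavedDataFile : public std::fstream {
public:
    SavedDataFile(const std::string& path, std::ios::openmode mode)
        : std::fstream(path, mode) {}
};
```

A fixed default path (the static const std::string from the question) could be supplied by a second constructor that forwards that constant.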