C++ Truncating an iostream Class - c++

For a project I'm working on for loading/storing data in files, I made the decision to implement the iostream library, because of some of the features it holds over other file io libraries. One such feature, is the ability to use either the deriving fstream or stringstream classes to allow the loading of data from a file, or an already existent place in memory. Although, so far, there is one major fallback, and I've been having trouble getting information about it for a while.
Similar to the functions available in the POSIX library, I was looking for some implementation of the truncate or ftruncate functions. For stringstream, this would be as easy as passing the associated string back to the stream after reconstructing it with a different size. For fstream, I've yet to find any way of doing this, actually, because I can't even find a way to pull the handle to the file out of this class. While giving me a solution to the fstream problem would be great, an even better solution to this problem would have to be iostream friendly. Because every usage of either stream class goes through an iostream in my program, it would just be easier to truncate them through the base class, than to have to figure out which class is controlling the stream every time I want to change the overall size.
Alternatively, if someone could point me to a library that implements all of these features I'm looking for, and is mildly easy to replace iostream with, that would be a great solution as well.
Edit: For clarification, the iostream classes I'm using are more likely to just be using only the stringstream and fstream classes. Realistically, only the file itself needs to be truncated to a certain point, I was just looking for a simpler way to handle this, that doesn't require me knowing which type of streambuf the stream was attached to. As the answer suggested, I'll look into a way of using ftruncate alongside an fstream, and just handle the two specific cases, as the end user of my program shouldn't see the stream classes anyways.

You can't truncate an iostream in place. You have to copy the first N bytes from the existing stream to a new one. This can be done with the sgetn() and sputn() methods of the underlying streambuf object, which you can obtain by iostream::rdbuf().
However, that process may be I/O intensive. It may be better to use special cases to manipulate the std::string or call ftruncate as you mentioned.
If you want to be really aggressive, you can create a custom std::streambuf derivative class which keeps a pointer to the preexisting rdbuf object in a stream, and only reads up to a certain point before generating an artifiicial end-of-file. This will look to the user like a shorter I/O sequence, but will not actually free memory or filesystem space. (It's not clear if that is important to you.)

Related

inheriting from std::ostream (to avoid rewriting same old content)...?

I would like to have some C++11 output stream which writes to a file only if the newly written content is different from the former one (if the content is the same as the one existing on disk, I don't want to alter the original file's metadata).
FWIW, the program is on GNU/Linux/Debian/Sid x86-64, compiled by a recent GCC 5 (or later). It is a server-like program, and there is only at most one process running it. No other process is supposed to write to that file on the system.
The reason I want to avoid overwriting an existing identical file content is because the actual output is some *.h C++ header file .... (if that matters, it is a new incarnation of a future MELT monitor, I am redesigning & rewriting this in C++11) and I don't want future make builds to recompile stuff depending on it if that generated header file has not changed.
I'm tempted to inherit from std::ofstream (then, the initial file path would be some temporary path name) or std::ostringstream (then, the entire file content is kept in memory, not a big deal for my case), and redefine its close method to compare the new content with the old content on disk, and rewrite the disk file only if that content changes.
But I feel it smells not good (or even wrong), in particular because std::ofstream::close is not documented as virtual. Should I use std::filebuf instead?
I have many existing operator << with the left operand being std::ostream& and the right one being some of my own classes, and I would like to use them on my special streams.
I have already lots of functions able to output to any std::ostream and I would like to use some of them on such a "differential" file stream...
Or should I inherit from std::ostream (hence I slightly changed the title of the question which initially mentioned std::ofstream, not std::ostream)?
I will compare the old and new content at close time (just by closing the temporary file, and reading it and the old file, and comparing them byte by byte).
Don't. A stream is a flow of data, not a file. It's not appropriate to let this functionality go anywhere near streams. If nothing else, although you could probably hack a way to determine early on whether the source and destination were identical, that would break the FIFO model of streams. They're just not the right tool for this job.
Generate your file to a temporary location, then perform an md5sum comparison and move it (via overwriting) to the target path iff the content differs; else simply remove the temporary file. Surely it needn't be any more complicated than that.
My mental model of this is that it's the streambuf that is the internal implementor's interface and the stream is just the wrapper that makes it convenient to use. So to change functionality you normally define things at the streambuf level, where there are lots of virtual functions to override. But I'm far from an expert on this.
You might, with care, be able to use your modified filebuf with a vanilla ofstream.
Although writing a new streambuf might be the way to go, one approach is that your new class is-a ostream that has-a fstream internally. That is, it inherits from the abstract base class and has a private data member representing the underlying file.

Serialization with a custom pattern and random access with Boost

I'm asking here because i have already tried to search but i have no idea if this things even exist and what their names are.
I start explaining that with custom pattern i mean this: suppose that i need to serialize objects or data of type foo, bar and boo, usually the library handle this for the user in a very simple way, what comes first goes in first in the serialization process, so if i serialize all the foo first they are written "at the top" of the file and all the bar and boo are after the foo.
Now I would like to keep order in my file and organize things based on a custom pattern, it's this possible with Boost ? What section provides this feature ?
Second thing, that is strictly related to the first one, I also would like to access my serialized binary files in a way that I'm not forced to parse and read all the previous values to extract only the one that I'm interested in, kinda like the RAM that works based on memory address and offers a random access without forcing you to parse all the others addresses.
Thanks.
On the first issue: the Boost serialization library is agnostic as to what happens after it turns an object into its serialized form. It does this by using input and output streams. Files are just that - fostream/fistream. For other types of streams however, the order/pattern that you speak of doesn't make sense. Imagine you're sending serialized objects over the network - the library can't know that it'll have to rearrange the order of objects and, in fact, it can't do that once they've been sent. For this reason, it does not support what you're looking for.
What you can do is create a wrapper that either just caches serialized versions of the objects and arranges them in memory before you tell it to write them out to a file, or that knows that since you're working with files, it can later tellg to the appropriate place in the file and append (this approach would require you to store the locations of the objects you wrote to the file).
As for the second thing - random access file reading. You will have to know exactly where the object is in memory. If you know that the structure of your file won't change, you can seekg on the file stream before handing it to boost for deserialization. If the file structure will change however, you still need to know the location of objects in the file. If you don't want to parse the file to find it, you'll have to store it somewhere during serialization. For example - you can maintain a sort of registry of objects at the top of the file. You will still have to parse it, but it should be just a simple [Object identifier]-[location in file] sort of thing.

Creating a std::iostream adapter

I'd like to create an iostream adapter class which lets me modify the data written to or read from a stream on-the-fly.
The adapter itself should be a iostream to allow true transparency towards third-party code.
Example for a StreamEncoder class derived from std::ostream:
// External algorithm, creates large amounts of log data
int foo(int bar, std::ostream& logOutput);
int main()
{
// The target file
std::ofstream file("logfile.lzma");
// A StreamEncoder compressing the output via LZMA
StreamEncoder lzmaEncoder(file, &encodeLzma);
// A StreamEncoder converting the UTF-8 log data to UTF-16
StreamEncoder utf16Encoder(lzmaEncoder, &utf8ToUtf16);
// Call foo(), but write the log data to an LZMA-compressed UTF-16 file
cout << foo(42, utf16Encoder);
}
As far as I know, I need to create a new basic_streambuf derivate and embed it in a basic_ostream subclass, but that seems to be pretty complex.
Is there any easier way to accomplish this?
Oddly enough, at least as things are really intended to work, none of this should directly involve iostreams and/or streambufs at all.
I would think of an iostream as a match-maker class. An iostream has a streambuf which provides a buffered interface to some sort of external source/sink of data. It also has a locale, which handles all the formatting. The iostream is little more than the playground supervisor that keeps those two playing together nicely (so to speak). Since you're dealing with data formatting, all of this is (or should be) handled in the locale.
A locale isn't monolithic though -- it's composed of a number of facets, each devoted to one particular part of data formatting. In this case, the part you probably care about is the codecvt facet, which is used (almost exclusively) to translate between the external and internal representations of data being read from/written to iostreams.
For better or worse, however, a locale can only contain one codecvt facet at a time, not a chain of them like you're contemplating. As such, what you really need/want is a wrapper class that provides a codecvt as its external interface, but allows you to chain some arbitrary set of transforms to be done to the data during I/O.
For the utf-to-utf conversion, Boost.locale provides a utf_to_utf function, and codecvt wrapper code, so doing this part of the conversion is simple and straightforward.
Lest anybody suggest that such things be done with ICU, I'll add that Boost.Locale is pretty much a wrapper around ICU, so this is more or less the same answer, but in a form that's much more friendly to C++ (whereas ICU by itself is rather Java-like, and all but overtly hostile to C++).
The other side of things is that writing a codecvt facet adds a great deal of complexity to a fairly simple task. A filtering streambuf (for one example) is generally a lot simpler to write. It's still not as easy as you'd like, but not nearly as bad as a codecvt facet. As #Flexo already mentioned, the Boost iostreams library already includes a filtering streambuf that does zip compression. Doing roughly the same with lzma (or lzh, arithmetic, etc. compression) is relatively easy, at least assuming you have compression functions that are easy to use (you basically just supply them with a buffer of input, and they supply a buffer of results).

Passing raw data in C++

Up until now, whenever I wanted to pass some raw data to a function (like a function that loads an image from a buffer), I would do something like this:
void Image::load(const char* buffer, std::size_t size);
Today I took a look at the Boost libraries, more specifically at the property_tree/xml_parser.hpp header, and I noticed this function signature:
template<typename Ptree>
void read_xml(std::basic_istream<typename Ptree::key_type::value_type>&,
Ptree &, int = 0);
This actually made me curious: is this the correct way to pass around raw data in C++, by using streams? Or am I misinterpreting what the function is supposed to be used for?
If it's the former, could you please point me to some resource where I can learn how to use streams for this? I haven't found much myself (mainly API references), and I have't been able to find the Boost source code for the XML parser either.
Edit: Some extra details
Seems there's been some confusion as to what I want. Given a data buffer, how can I convert it to a stream such that it is compatible with the read_xml function I posted above? Here's my specific use case:
I'm using the SevenZip C library to read an XML file from an archive. The library will provide me with a buffer and its size, and I want to put that in stream format such that it is compatible with read_xml. How can I do that?
Well, streams are quite used in C++ because of their conveniences:
- error handling
- they abstract away the data source, so whether you are reading from a file, an audio source, a camera, they are all treated as input streams
- and probably more advantages I don't know of
Here is an overview of the IOstream library, perhaps that might better help you understand what's going on with streams:
http://www.cplusplus.com/reference/iostream/
Understanding what they are exactly will help you understand how and when to use them.
There's no single correct way to pass around data buffers. A combination of pointer and length is the most basic way; it's C-friendly. Passing a stream might allow for sequential/chunked processing - i. e. not storing the whole file in memory at the same time. If you want to pass a mutable buffer (that might potentially grow), a vector<char>& would be a good choice.
Specifically on Windows, a HGLOBAL or a section object handle might be used.
The C++ philosophy explicitly allows for many different styles, depending on context and environment. Get used to it.
Buffers of raw memory in C++ can either be of type unsigned char*, or you can create a std::vector<unsigned char>. You typically don't want to use just a char* for your buffer since char is not guaranteed by the standard to use all the bits in a single byte (i.e., this will end up varying by platform/compiler). That being said, streams have some excellent uses as well, considering that you can use a stream to read bytes from a file or some other input, etc., and from there, store that data in a buffer.
Seems there's been some confusion as to what I want. Given a data buffer, how can I convert it to a stream such that it is compatible with the read_xml function I posted above?
Easily (I hope PTree::Key_type::value_type would be something like char):
istringstream stream(string(data, len));
read_xml(stream, ...);
More on string streams here.
This is essentially using a reference to pass the stream contents. So behind the scene's it's essentially rather similar to what you did so far and it's essentially the same - just using a different notation. Simplified, the reference just hides the pointer aspect, so in your boost example you're essentially working with a pointer to the stream.
References got the advantage avoiding all the referencing/dereferencing and are therefore easier to handle in most situations. However they don't allow you multiple levels of (de-)referencing.
The following two example functions do essentially the same:
void change_a(int &var, myclass &cls)
{
var = cls.convert();
}
void change_b(int *var, myclass *cls)
{
*var = cls->convert();
}
Talking about the passed data itself: It really depends on what you're trying to achieve and what's more effective. If you'd like to modify a string, utilizing an object of class std::string might be more convenient than using a classic pointer to a buffer (char *). Streams got the advantage that they can represent several different things (e.g. data stream on the network, a compressed stream or simply a file or memory stream). This way you can write single functions or methods that accept a stream as input and will instantly work without worrying about the actual stream source. Doing this with classic buffers can be more complicated. On the other side you shouldn't forget that all objects will add some overhead, so depending on the job to be done a simple pointer to a character string might be perfectly fine (and the most effective solution). There's no "the one way to do it".

Having a class with no data members good option for file manipulation?

I have a file with saved data that sometimes needs to be accessed, written to, erased, etc. when the program is running. I decided to write a SavedDataHandler class to accomplish this. I'm currently using the fstream class.
I considered having one data member be the fstream itself, opening it in the constructor, and closing it in the destructor. However, I realized that different functions called on the SavedDataHandler open the stream differently ( setting different flags, etc. ) so I decided not to go that route.
Instead I just have a static const std::string with the file name, with the public member functions handling the opening and closing of the file as they need to. Performance is not an issue.
Is this route a valid option? Since it has no data members, providing a constructor isn't even necessary. It's just a class that contains functions ( and one static constant ), with the functions operating on a resource rather than a data member.
Hmya, the fstream class is by itself already a capable wrapper class around an operating system handle for a file. If you can't think of a way to add functionality to your own wrapper around fstream, take it as a hint that you don't actually need a wrapper.
Don't wrap (or inherit) just because you can.
Well in some projects, wrapping is essential. Just stop to think if later you'll need for example, to change the file I/O libs (dunno why you'd want to do that, since C++ libs are optimized and ISO). What would you do then? Change all the calls from fstream to YourNewSuperMegaLib::SuperFileSystem::MegaFileStream?
If you want simplicity, I'd just inherit fstream and in the constructor, pass the opening modes you want and invoke the super constructor accodingly.