Creating a std::iostream adapter - c++

I'd like to create an iostream adapter class which lets me modify the data written to or read from a stream on-the-fly.
The adapter itself should be a iostream to allow true transparency towards third-party code.
Example for a StreamEncoder class derived from std::ostream:
// External algorithm, creates large amounts of log data
int foo(int bar, std::ostream& logOutput);
int main()
{
// The target file
std::ofstream file("logfile.lzma");
// A StreamEncoder compressing the output via LZMA
StreamEncoder lzmaEncoder(file, &encodeLzma);
// A StreamEncoder converting the UTF-8 log data to UTF-16
StreamEncoder utf16Encoder(lzmaEncoder, &utf8ToUtf16);
// Call foo(), but write the log data to an LZMA-compressed UTF-16 file
cout << foo(42, utf16Encoder);
}
As far as I know, I need to create a new basic_streambuf derivate and embed it in a basic_ostream subclass, but that seems to be pretty complex.
Is there any easier way to accomplish this?

Oddly enough, at least as things are really intended to work, none of this should directly involve iostreams and/or streambufs at all.
I would think of an iostream as a match-maker class. An iostream has a streambuf which provides a buffered interface to some sort of external source/sink of data. It also has a locale, which handles all the formatting. The iostream is little more than the playground supervisor that keeps those two playing together nicely (so to speak). Since you're dealing with data formatting, all of this is (or should be) handled in the locale.
A locale isn't monolithic though -- it's composed of a number of facets, each devoted to one particular part of data formatting. In this case, the part you probably care about is the codecvt facet, which is used (almost exclusively) to translate between the external and internal representations of data being read from/written to iostreams.
For better or worse, however, a locale can only contain one codecvt facet at a time, not a chain of them like you're contemplating. As such, what you really need/want is a wrapper class that provides a codecvt as its external interface, but allows you to chain some arbitrary set of transforms to be done to the data during I/O.
For the utf-to-utf conversion, Boost.locale provides a utf_to_utf function, and codecvt wrapper code, so doing this part of the conversion is simple and straightforward.
Lest anybody suggest that such things be done with ICU, I'll add that Boost.Locale is pretty much a wrapper around ICU, so this is more or less the same answer, but in a form that's much more friendly to C++ (whereas ICU by itself is rather Java-like, and all but overtly hostile to C++).
The other side of things is that writing a codecvt facet adds a great deal of complexity to a fairly simple task. A filtering streambuf (for one example) is generally a lot simpler to write. It's still not as easy as you'd like, but not nearly as bad as a codecvt facet. As #Flexo already mentioned, the Boost iostreams library already includes a filtering streambuf that does zip compression. Doing roughly the same with lzma (or lzh, arithmetic, etc. compression) is relatively easy, at least assuming you have compression functions that are easy to use (you basically just supply them with a buffer of input, and they supply a buffer of results).

Related

C++ serialization of data-structures

I'm studying serializations in C++. What's the advantage/difference of boost::serialization if compared to something like:
ifstream_obj.read(reinterpret_cast<char *>(&obj), sizeof(obj)); // read
// or
ofstream_obj.write(reinterpret_cast<char *>(&obj), sizeof(obj)); // write
// ?
and, which one is better to use?
The big advantages of Boost Serialization are:
it actually works for non-trivial (POD) data types (C++ is not C)
it allows you to decouple serialization code from archive backend, thereby giving you text, xml, binary serialization
If you use the proper archive you can even have portability (try that with your sample). This means you can send on one machine/OS/version and receive on another without problems.
Lastly, it adds (a) layer(s) of abstraction which make things a lot less error prone. Granted, you could have done the same for your suggested serialization approach without much issue.
Here's an answer that does the kind of serialization you suggest but safely:
How to pass class template argument to boost::variant?
Note that Boost Serialization is fully aware of bitwise serializable types and you can tell it about your own too:
Boost serialization bitwise serializability

C++ Truncating an iostream Class

For a project I'm working on for loading/storing data in files, I made the decision to implement the iostream library, because of some of the features it holds over other file io libraries. One such feature, is the ability to use either the deriving fstream or stringstream classes to allow the loading of data from a file, or an already existent place in memory. Although, so far, there is one major fallback, and I've been having trouble getting information about it for a while.
Similar to the functions available in the POSIX library, I was looking for some implementation of the truncate or ftruncate functions. For stringstream, this would be as easy as passing the associated string back to the stream after reconstructing it with a different size. For fstream, I've yet to find any way of doing this, actually, because I can't even find a way to pull the handle to the file out of this class. While giving me a solution to the fstream problem would be great, an even better solution to this problem would have to be iostream friendly. Because every usage of either stream class goes through an iostream in my program, it would just be easier to truncate them through the base class, than to have to figure out which class is controlling the stream every time I want to change the overall size.
Alternatively, if someone could point me to a library that implements all of these features I'm looking for, and is mildly easy to replace iostream with, that would be a great solution as well.
Edit: For clarification, the iostream classes I'm using are more likely to just be using only the stringstream and fstream classes. Realistically, only the file itself needs to be truncated to a certain point, I was just looking for a simpler way to handle this, that doesn't require me knowing which type of streambuf the stream was attached to. As the answer suggested, I'll look into a way of using ftruncate alongside an fstream, and just handle the two specific cases, as the end user of my program shouldn't see the stream classes anyways.
You can't truncate an iostream in place. You have to copy the first N bytes from the existing stream to a new one. This can be done with the sgetn() and sputn() methods of the underlying streambuf object, which you can obtain by iostream::rdbuf().
However, that process may be I/O intensive. It may be better to use special cases to manipulate the std::string or call ftruncate as you mentioned.
If you want to be really aggressive, you can create a custom std::streambuf derivative class which keeps a pointer to the preexisting rdbuf object in a stream, and only reads up to a certain point before generating an artifiicial end-of-file. This will look to the user like a shorter I/O sequence, but will not actually free memory or filesystem space. (It's not clear if that is important to you.)

Passing raw data in C++

Up until now, whenever I wanted to pass some raw data to a function (like a function that loads an image from a buffer), I would do something like this:
void Image::load(const char* buffer, std::size_t size);
Today I took a look at the Boost libraries, more specifically at the property_tree/xml_parser.hpp header, and I noticed this function signature:
template<typename Ptree>
void read_xml(std::basic_istream<typename Ptree::key_type::value_type>&,
Ptree &, int = 0);
This actually made me curious: is this the correct way to pass around raw data in C++, by using streams? Or am I misinterpreting what the function is supposed to be used for?
If it's the former, could you please point me to some resource where I can learn how to use streams for this? I haven't found much myself (mainly API references), and I have't been able to find the Boost source code for the XML parser either.
Edit: Some extra details
Seems there's been some confusion as to what I want. Given a data buffer, how can I convert it to a stream such that it is compatible with the read_xml function I posted above? Here's my specific use case:
I'm using the SevenZip C library to read an XML file from an archive. The library will provide me with a buffer and its size, and I want to put that in stream format such that it is compatible with read_xml. How can I do that?
Well, streams are quite used in C++ because of their conveniences:
- error handling
- they abstract away the data source, so whether you are reading from a file, an audio source, a camera, they are all treated as input streams
- and probably more advantages I don't know of
Here is an overview of the IOstream library, perhaps that might better help you understand what's going on with streams:
http://www.cplusplus.com/reference/iostream/
Understanding what they are exactly will help you understand how and when to use them.
There's no single correct way to pass around data buffers. A combination of pointer and length is the most basic way; it's C-friendly. Passing a stream might allow for sequential/chunked processing - i. e. not storing the whole file in memory at the same time. If you want to pass a mutable buffer (that might potentially grow), a vector<char>& would be a good choice.
Specifically on Windows, a HGLOBAL or a section object handle might be used.
The C++ philosophy explicitly allows for many different styles, depending on context and environment. Get used to it.
Buffers of raw memory in C++ can either be of type unsigned char*, or you can create a std::vector<unsigned char>. You typically don't want to use just a char* for your buffer since char is not guaranteed by the standard to use all the bits in a single byte (i.e., this will end up varying by platform/compiler). That being said, streams have some excellent uses as well, considering that you can use a stream to read bytes from a file or some other input, etc., and from there, store that data in a buffer.
Seems there's been some confusion as to what I want. Given a data buffer, how can I convert it to a stream such that it is compatible with the read_xml function I posted above?
Easily (I hope PTree::Key_type::value_type would be something like char):
istringstream stream(string(data, len));
read_xml(stream, ...);
More on string streams here.
This is essentially using a reference to pass the stream contents. So behind the scene's it's essentially rather similar to what you did so far and it's essentially the same - just using a different notation. Simplified, the reference just hides the pointer aspect, so in your boost example you're essentially working with a pointer to the stream.
References got the advantage avoiding all the referencing/dereferencing and are therefore easier to handle in most situations. However they don't allow you multiple levels of (de-)referencing.
The following two example functions do essentially the same:
void change_a(int &var, myclass &cls)
{
var = cls.convert();
}
void change_b(int *var, myclass *cls)
{
*var = cls->convert();
}
Talking about the passed data itself: It really depends on what you're trying to achieve and what's more effective. If you'd like to modify a string, utilizing an object of class std::string might be more convenient than using a classic pointer to a buffer (char *). Streams got the advantage that they can represent several different things (e.g. data stream on the network, a compressed stream or simply a file or memory stream). This way you can write single functions or methods that accept a stream as input and will instantly work without worrying about the actual stream source. Doing this with classic buffers can be more complicated. On the other side you shouldn't forget that all objects will add some overhead, so depending on the job to be done a simple pointer to a character string might be perfectly fine (and the most effective solution). There's no "the one way to do it".

How can I override an C++ standard-library class function?

How can I override a C++ standard-library class function? In my application, I use ofstream objects in many different places of code. And now I want to open files in a different permission mode in Linux, Ubuntu. But open function of ofstream has no parameter to specify the permission mode of the file it creats.
Now I want to override open() function of ofstream class so it will get a parameter to specify the permissions for user access.
For starters, to clarify your terminology, the STL usually refers to the subset of the C++ standard library containing the containers, iterators, and algorithms. The streams classes are part of the C++ standard library, but are usually not bundled together with the STL. Some purists will insist that there is no such thing as the STL in the C++ standard library (since the STL is, technically speaking, a third-party library that was adopted into the standard), but most C++ programmers will know what you mean.
As for your question, there is no way within the standard to specify permission modes with ofstream. If you want to create your own custom stream class that is like ofstream but which supports permissions, your best bet would be to do the following:
Create a subclass of basic_streambuf that allows you to open, write, and possibly read files while specifying Unix permissions. The streams classes are designed so that the details of communicating with physical devices like disk, networks, and the console are all handled by the basic_streambuf class and its derived classes. If you want to make your own stream class, implementing a stream buffer would be an excellent first step.
Define your own class that subclasses basic_ostream and installs your custom basic_streambuf. By default, the basic_ostream supports all of the standard output routines by implementing them in terms of the underlying basic_streambuf object. Once you have your own stream buffer, building a basic_ostream class that uses that streambuf will cause all of the standard stream operations on that class (such as <<) to start making the appropriate calls to your streambuf.
If you'd like more details on this, an excellent reference is Standard C++ IOStreams and Locales. As a shameless plug, I have used the techniques from this book to build a stream class that wraps a socket connection. While a lot of the code in my stream won't be particularly useful, the basic structure should help you get started.
Hope this helps!
This is not answering your question directly as I wouldn't advise overriding ofstream::open.
Instead couldn't you use the first suggestion in this post? Open the file as you normally would to get the correct permissions, and then construct an ofstream from the file descriptor.
#include <iostream>
#include <fstream>
class gstream: public std::ofstream
{
void open(const std::string& filename, ios_base::openmode mode,int stuff)
{
//put stuff here
}
};
int main() {
gstream test;
//io stuff
return 0;
}
seems to work here.
Another option would be to create a wrapper class that contains an 'ofstream' object and has the interface you want, and passes the work onto its 'oftstream' member. It would look like this.

Designing a string class in C++

I need to design (and code) a "customized" string class in C++. I am seeking any documentation and pointers on design issues or potential pitfalls I should be aware of.
Links are very welcome, as are the identification of problems (if any) with current string libs (Qstring, std::string, and the others).
Despite the critics, I think this is a valid question.
The std::string is not a panacea. It looks like someone took the class from a pure-OO and dumped it in C++, which is probably the case.
Advice 1: Prefer non-member non-friend methods
Now that this is said, in this hour of internationalization, I would certainly advise you to design a class that would support Unicode. And I do say Unicode, not UTF-8 or UTF-16. It's ill-fitting (I think) to devise a class that would contain the data in a given encoding. You can provide methods to then output the information in various formats.
Advice 2: Support Unicode
Then, there is a number of points on the memory allocation schemes:
Small String Optimization: the class contains pre-allocated space for a few characters (a dozen or two), and thus avoid heap allocation for those
Copy On Write: the various strings share a buffer so that copy is cheap, when one string needs to modify its content, it copies the buffer if it's not the sole owner --> the issue is that multithreading introduces overhead here and it's been showed that for a general purpose technic this overhead could dwarf the actual copying cost
Immutability: "new" languages such as Java, C# or Python use immutable strings. Think of it as a pool of strings, all strings containing "Fooo" will point to the same buffer. Note that these languages support garbage collection, which rather helps here.
I would personally pick the "Small String Optimization" here (though it's not exclusive with the other two), simply because it's simple to implement and should actually benefit you (heap allocation cost, locality of reference issues).
The other two technics are somewhat complex in the face of multi-threading, and such are likely error-prone and unlikely to yield any real benefit unless carefully crafted.
And that brings my last advice:
Advice 3: Don't implement internal locking in an attempt of MultiThreading support
It will slow down the class when used in SingleThreaded context and will not yield as much benefit as you'd think when used in a MultiThreaded one.
Finally, you could perhaps find something suiting your tastes (or get some pointers) by browsing existing code. I don't promise to exhibit "smooth" interfaces though:
ICU UnicodeString: Unicode support, at least
std::string: over 100 member methods (counting the various overloads)
llvm StringRef: note how many algorithms are implemented as member methods :'(
Effective STL by Scott Meyers has some interesting discussion about possible std::string implementation techniques, though it covers rather advanced issues such as copy-on-write and reference counting.
Depending on what the "customization" is (e.g. a custom allocator), you may be able to do it via a template parameter of the std::basic_string class.
Herb Sutter gives a sample of a custom string class in the GotW #29. You could use it for the start.
From a general-purpose point of view a "new" string class ideally combined the good points of std::string, CString, QString and others. A few points in random order:
MFC CString supports using it in printf-like functions due to a very specific implementation. If you need or want this feature I recommend buying the book "MFC Internals" by George Sheperd. Although the book is from 1996(!) it's description of how CString is implemented should be worth it. http://www.amazon.com/MFC-Internals-Microsoft-Foundation-Architecture/dp/0201407213/ref=sr_1_1?ie=UTF8&s=books&qid=1283176951&sr=8-1
Check that your string class plays nicely with all interfaces you'll use it with (iostreams, Windows API, printf*, etc.)
Don't aim for full unicode support (as in: collation, grapheme clusters, ...) as that will mean your class will never be done, but consider making it a wchar_t class with conversion options.
Consider making the ctor/function that creates your string objects from char* always take the specific encoding of the character arrays. (Can be helpful in mixed UTF-8 / other character sets environments.)
Look at the full CString interface and at the full std:string interface and decide what you are going to need and what you can skip.
Look at QString to see what the other two miss.
Do not provide implicit conversion to neither char/wchar_t*
Consider adding convenient conversion functions to/from numeric types.
Don't write a string class without a full set of detailed Unit Tests!
The world doesn't need another string class. Is this homework? If not, use std::string.
The problem with std::string is.. that you can't change it. Sometimes you need the basics of a std::string, but disagree with the implementation of your c++ library.
As an example, thread-safe reference counting employed means lots of locking (or at least locked operations). Also, if most of your strings are short (because you know this will be the case), you might want a string class that is optimized for that use-case.
So even if you like the std::string API, or at least have learned to live with it, there is room for 'competing implementations' that are more or less workalikes.
PowerDNS would love to have one, as we currently pass many dns host names around, and a large majority of them would fit in a, say, 25 bytes fixed buffer, which would relieve a lot of new/delete pressure.