Is it possible to override std::istream read method? - c++

I want to use std::istream to read data from a given class which only provides 2 methods:
// Returns a byte from the stream (consuming it)
uint8_t getChar(OwnIOStream stream);
// Makes the passed pointer point to the data in the stream
bool getCharBlockPtr(OwnIOStream stream, uint8_t** buffer, uint32_t maxSize, uint32_t* size);
I first thought of inheriting from streambuf and implementing the underflow method using getChar(). However, I would like to use getCharBlockPtr() instead to avoid copying the data (I assume calling underflow for every byte read will hurt performance). The problem is that I would need to know in advance how many bytes I want to read each time. This is why I was wondering whether it would be possible to override the read method of istream.
Using boost is not an option.

Deriving from streambuf is a perfectly valid option.
Whether you need to do much (if any) more than override underflow depends. If OwnIOStream is already buffering data, then you might get adequate performance from just doing that. I'd start by doing that, and seeing how things go.
The next obvious step would be to use getCharBlockPtr. A streambuf has a setg that you use to set the pointers to the beginning, current position, and end of the buffer. At least if I'm interpreting things correctly, you'd call getCharBlockPtr and get the beginning and end. You'd then call setg to set the beginning and current positions to the beginning, and the end to the end. From there, the streambuf should be able to read data directly from the buffer. When it runs out, it'll call underflow, and you'll need to get more data again.
One note: it doesn't look like OwnIOStream supports a putback area, which streams normally expect. Getting this to work (especially putting back a character when you're at the beginning of a buffer) may be somewhat non-trivial.
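To make the setg approach above concrete, here is a minimal sketch. Since the real OwnIOStream isn't shown in the question, it and getCharBlockPtr are stand-ins reimplemented over an in-memory byte array (and the stream is taken by reference so the read position can advance):

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <streambuf>
#include <string>

// Stand-in for the asker's API: an in-memory byte source. This is an
// assumption; the real OwnIOStream is not shown in the question.
struct OwnIOStream {
    const uint8_t* data;
    uint32_t size;
    uint32_t pos = 0;
};

// Mimics the question's signature: makes *buffer point at up to maxSize
// bytes of stream data and reports how many bytes are available.
bool getCharBlockPtr(OwnIOStream& s, uint8_t** buffer,
                     uint32_t maxSize, uint32_t* size) {
    if (s.pos >= s.size) return false;
    uint32_t n = s.size - s.pos;
    if (n > maxSize) n = maxSize;
    *buffer = const_cast<uint8_t*>(s.data + s.pos);
    *size = n;
    s.pos += n;
    return true;
}

class OwnStreamBuf : public std::streambuf {
public:
    explicit OwnStreamBuf(OwnIOStream& s) : stream_(s) {}
protected:
    int_type underflow() override {
        uint8_t* block = nullptr;
        uint32_t got = 0;
        if (!getCharBlockPtr(stream_, &block, 4096, &got) || got == 0)
            return traits_type::eof();
        // Point the get area directly at the device's block: no copy.
        char* p = reinterpret_cast<char*>(block);
        setg(p, p, p + got);
        return traits_type::to_int_type(*p);
    }
private:
    OwnIOStream& stream_;
};
```

An std::istream constructed over this buffer then reads straight out of the device's blocks, and underflow is only hit once per block rather than once per byte.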

The general rule is that most classes from the standard C++ library are not meant to be derived from. The exception here is the streambuf class, which is intended to be the low-level interface between an I/O-capable object and a C++ stream.
So the standard way here would be to build a custom streambuf calling getChar, or better getCharBlockPtr, which would allow buffering. Jerry's answer explains how to do it.
But it looks like you have special requirement here:
you know at a higher level the number of bytes you want to read
you are in a performance critical operation and want to avoid buffer copies if possible - but this last point should be reviewed after profiling...
Then asking for the full power of a C++ stream may not be the way to go. Building a custom class that implements only the istream methods you need to call, as well as a few extractors, would still give nice code at the high level and could give more optimized code at the low level.
But you and only you can know:
whether your higher level code requires a C++ stream - then a custom class is not an option and you will have to use a custom streambuf
whether you could directly give the expected size to the streambuf from the calling code as a hint. More or less:
class OwnStreamBuf : public std::streambuf {
    ...
    void sizehint(size_t hint) { /* size for next read access */
        this->hint = hint;
    }
    int_type underflow() {
        ...
        cr = getCharBlockPtr(stream, buffer, hint, size); /* use hint as maxSize */
        ...
    }
};
...
OwnStreamBuf buf(...);
class OwnIStream : public istream { ... };
OwnIStream is(buf);
...
buf.sizehint(n);
is >> special_obj;
...
whether you only need a handful of extractors, in which case a custom class will be simpler to build from scratch.

Related

How to add functionality to existing C++ standard library functions [duplicate]

I need some guidance or pointers understanding how to implement a custom ostream. My requirements are:
A class with a '<<' operator for several data types.
The intention is to send output to database. Each "line" should go to a separate record.
Each record most important field would be the text (or blob), but some other fields such as time, etc. can be mostly deduced automatically
buffering is important, as I don't want to go to database for every record.
First, is it worth deriving from ostream at all? What do I get by deriving from ostream? What if my class simply implements a few operator<< methods (including some for custom data types)? What functionality do I get from ostream?
Assuming what I want is a class derived from ostream, I need some guidance understanding the relationship between the ostream and the streambuf classes. Which one do I need to implement? Looking at some samples, it appears that I don't need to derive from ostream at all, and just give the ostream constructor a custom streambuf. Is that true? is that the canonical approach?
Which virtual functions of the custom streambuf do I need to implement? I've seen some samples (including on this site: here and here, and a few more); some override the sync method, and others override the overflow method. Which one should I override? Also, looking at the stringbuf and filebuf sources (Visual Studio or GCC), both of those buffer classes implement many of streambuf's methods.
If a custom class derived from streambuf is required, would there be any benefit deriving from stringbuf (or any other class) instead of directly from streambuf?
As for "lines": I would like, at the very least, that when users of the class apply the 'endl' manipulator it starts a new line (i.e. a record in the database). Maybe, depending on effort, every '\n' character should be considered a new record as well. How do my custom ostream and/or streambuf get notified of each?
A custom destination for ostream means implementing your own ostreambuf. If you want your streambuf to actually buffer (i.e. don't connect to the database after each character), the easiest way to do that is by creating a class inheriting from std::stringbuf. The only function that you'll need to override is the sync() method, which is being called whenever the stream is flushed.
class MyBuf : public std::stringbuf
{
public:
    virtual int sync() {
        // add this->str() to database here
        // (optionally clear buffer afterwards)
        return 0; // 0 on success, -1 on failure
    }
};
You can then create a std::ostream using your buffer:
MyBuf buf;
std::ostream stream(&buf);
Most people advised against redirecting the stream to a database, but they ignored my description that the database basically has a single blob field where all text is going to.
In rare cases, I might send data to a different field. This can be facilitated with custom attributes understood by my stream. For example:
MyStream << "Some text " << process_id(1234) << "more text" << std::flush;
The code above will create a record in the database with:
blob: 'Some text more text'
process_id: 1234
process_id() is a method returning a structure ProcessID. Then, in the implementation of my ostream, I have an operator<<(ProcessID const& pid), which stores the process ID until it gets written. Works great!
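A hypothetical reconstruction of that tagging idea, using a plain wrapper class rather than a real ostream (RecordStream and its members are invented names; ordinary data goes to the text blob, while the ProcessID tag is captured into a dedicated record field):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// All names below are invented for this sketch.
struct ProcessID { int value; };
inline ProcessID process_id(int v) { return ProcessID{v}; }

class RecordStream {
public:
    template <typename T>
    RecordStream& operator<<(const T& v) {    // ordinary data -> blob
        blob_ << v;
        return *this;
    }
    RecordStream& operator<<(ProcessID pid) { // tag -> dedicated field
        pid_ = pid.value;
        return *this;
    }
    std::string blob() const { return blob_.str(); }
    int pid() const { return pid_; }
private:
    std::ostringstream blob_;
    int pid_ = 0;
};
```

The non-template overload is an exact match for ProcessID, so it wins over the generic operator<< during overload resolution.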
The simplest way is to inherit std::streambuf and override just two methods:
std::streamsize xsputn(const char_type* s, std::streamsize n) – to append a buffer of the given size to your internal buffer (a std::string, for example);
int_type overflow(int_type c) – to append a single char to your internal buffer.
Your streambuf can be constructed from whatever you want (a DB connection, for example). After appending something to the internal buffer you may try to split it into lines and push them into the DB (or just buffer SQL statements to execute later).
To use it: just attach your streambuf to any std::ostream using constructor.
Simple! I've done something like this to output strings to syslog – everything works fine with any custom operator<< for user-defined classes.
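A self-contained sketch of that two-override approach, with a vector of strings standing in for the database connection:

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// Sketch only: `records` stands in for a real database.
class LineBuf : public std::streambuf {
public:
    std::vector<std::string> records;
protected:
    // Bulk writes (e.g. operator<< on a string) land here.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        for (std::streamsize i = 0; i < n; ++i) put(s[i]);
        return n;
    }
    // Single characters land here, since we provide no put area.
    int_type overflow(int_type c) override {
        if (c != traits_type::eof()) put(traits_type::to_char_type(c));
        return c;
    }
private:
    std::string line_;
    void put(char c) {
        if (c == '\n') {            // end of record: "push to DB"
            records.push_back(line_);
            line_.clear();
        } else {
            line_ += c;
        }
    }
};
```

Attaching this buffer to any std::ostream then turns each '\n'-terminated line into one "record".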
my2c - I think you are tackling this the wrong way. A stream may sound like a nice idea, but you'll also need a way to indicate the end of a row (and then what if someone forgets?). I would suggest something along the lines of how Java's PreparedStatement and batches work: provide a set of methods which accept the types and a column index, then a "batch" method which makes it explicitly clear that you are indeed batching that row, and then an "execute" to push the batch in.
Any stream based operation will rely on type (typically) to indicate which column to fill - but what if you have two ints? IMO, as a user, it doesn't feel like a natural way of inserting records into a database...
To add a new source or destination of character input/output to the iostreams mechanism, you should create a new streambuf class. The task of the stream buffer classes is to communicate with the 'external device' that will store the characters and to provide buffering facilities.
The problem with using iostreams to communicate with your database is that a database table does not match the concept of a character sequence. It's a bit like pushing a round peg into a square hole. A streambuf only operates on characters; that is the only thing ever presented to it. This means the streambuf has to parse the character stream presented to it to find the field and record separators.
If you decide to go this route, I predict you will end up writing a CSV-to-SQL converter in your streambuf, just to get it working.
You will probably be better off just adding a few operator<< overloads to your class(es). You could look at the Qt framework for ideas here; it also offers the possibility of using operator<< to add items to collections and such.

Puzzled about Boost Bufferstream implementation

I often use the Boost Interprocess library, which provides an I/O stream class, bufferstream, which is basically similar to std::stringstream, except it operates over an arbitrary user-provided memory buffer.
I've used this stream class extensively, and I've also used its source code to learn a lot about implementing/extending C++ I/O streams in general.
However, one piece of implementation code has struck me as very weird, and I'd like to see if anyone knows exactly why the Boost designer implemented it this way.
Specifically, it's a function in the streambuffer class boost::interprocess::basic_bufferbuf. This class inherits from std::basic_streambuf, and overrides all the virtual protected methods to provide specific behavior.
The source code can be browsed online here.
Specifically, the function basic_bufferbuf::seekpos() is implemented as follows:
virtual pos_type seekpos(pos_type pos, std::ios_base::openmode mode
= std::ios_base::in | std::ios_base::out)
{ return seekoff(pos - pos_type(off_type(0)), std::ios_base::beg, mode); }
Generally, stream buffer implementations which override std::basic_streambuf::seekpos() just call std::basic_streambuf::seekoff with the starting position set to std::ios_base::beg, so that's normal.
But why does the implementor write pos - pos_type(off_type(0)) here? Why subtract zero from the starting position? What possible use is that?
I suspect it must have something to do with unusual off_type or pos_type types, but I can't imagine what. Generally, pos_type and off_type are just signed integral types.
So why subtract zero here?

streambuf get streampos

I use the C++ streambuf class for a compiler project and need a convenient way to get the current position in the stream.
There are two member functions, streambuf::pubseekpos and streambuf::pubseekoff, to modify the position, and I am quite confused by the absence of a streambuf::pubgetpos member function (or something similar) to read it.
There seem to be two possible workarounds:
I could save the current position in a separate variable and modify it manually whenever I read characters from the stream.
I could call streambuf::pubseekoff(0, ios_base::cur), which returns the new stream position.
The second option seems usable but inefficient and unaesthetic for such a trivial task. Is there a better way to do it?
The streambuf doesn't have a separate interface for reading the position. However, istream and ostream do (tellg and tellp respectively).
Interestingly, the streams use your option 2 to get their positions, so it is just fine.
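For illustration, here is option 2 wrapped in a helper (buf_position is just a name for this sketch): a seek by an offset of zero from the current position reports where we are without moving.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <streambuf>

// Ask the buffer where it is by seeking zero bytes from `cur`.
std::streampos buf_position(std::streambuf& sb) {
    return sb.pubseekoff(0, std::ios_base::cur, std::ios_base::in);
}
```

tellg() on an istream attached to the same buffer returns the same value, since that is exactly what it calls internally.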

C++ Truncating an iostream Class

For a project I'm working on for loading/storing data in files, I decided to use the iostream library because of some of the features it offers over other file I/O libraries. One such feature is the ability to use either the derived fstream or stringstream classes to load data from a file or from an existing place in memory. So far, though, there is one major drawback, and I've been having trouble finding information about it for a while.
Similar to the functions available in the POSIX library, I was looking for some implementation of the truncate or ftruncate functions. For stringstream, this would be as easy as reconstructing the associated string with a different size and passing it back to the stream. For fstream, I've yet to find any way of doing this, because I can't even find a way to pull the file handle out of the class. While a solution to the fstream problem would be great, an even better solution would be iostream-friendly. Because every usage of either stream class goes through an iostream in my program, it would be easier to truncate them through the base class than to figure out which class is controlling the stream every time I want to change the overall size.
Alternatively, if someone could point me to a library that implements all of these features I'm looking for, and is mildly easy to replace iostream with, that would be a great solution as well.
Edit: For clarification, the iostream classes I'm using are more likely to just be using only the stringstream and fstream classes. Realistically, only the file itself needs to be truncated to a certain point, I was just looking for a simpler way to handle this, that doesn't require me knowing which type of streambuf the stream was attached to. As the answer suggested, I'll look into a way of using ftruncate alongside an fstream, and just handle the two specific cases, as the end user of my program shouldn't see the stream classes anyways.
You can't truncate an iostream in place. You have to copy the first N bytes from the existing stream to a new one. This can be done with the sgetn() and sputn() methods of the underlying streambuf object, which you can obtain by iostream::rdbuf().
However, that process may be I/O intensive. It may be better to use special cases to manipulate the std::string or call ftruncate as you mentioned.
If you want to be really aggressive, you can create a custom std::streambuf derivative class which keeps a pointer to the preexisting rdbuf object in a stream, and only reads up to a certain point before generating an artificial end-of-file. This will look to the user like a shorter I/O sequence, but will not actually free memory or filesystem space. (It's not clear if that is important to you.)
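A rough sketch of the copy-based truncation described above, using sgetn()/sputn() on the underlying buffers (truncate_copy is an invented helper name; it is shown with stringstreams but works through any rdbuf()):

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Copy the first n bytes of src into a fresh stream, which therefore
// holds the "truncated" contents.
std::stringstream truncate_copy(std::iostream& src, std::streamsize n) {
    std::vector<char> tmp(static_cast<std::size_t>(n));
    std::streamsize got = src.rdbuf()->sgetn(tmp.data(), n);
    std::stringstream dst;
    dst.rdbuf()->sputn(tmp.data(), got);
    return dst;
}
```

As noted, this is I/O-bound for large streams; for files, handling fstream specially with ftruncate will be cheaper.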

Passing raw data in C++

Up until now, whenever I wanted to pass some raw data to a function (like a function that loads an image from a buffer), I would do something like this:
void Image::load(const char* buffer, std::size_t size);
Today I took a look at the Boost libraries, more specifically at the property_tree/xml_parser.hpp header, and I noticed this function signature:
template<typename Ptree>
void read_xml(std::basic_istream<typename Ptree::key_type::value_type>&,
Ptree &, int = 0);
This actually made me curious: is this the correct way to pass around raw data in C++, by using streams? Or am I misinterpreting what the function is supposed to be used for?
If it's the former, could you please point me to some resource where I can learn how to use streams for this? I haven't found much myself (mainly API references), and I haven't been able to find the Boost source code for the XML parser either.
Edit: Some extra details
Seems there's been some confusion as to what I want. Given a data buffer, how can I convert it to a stream such that it is compatible with the read_xml function I posted above? Here's my specific use case:
I'm using the SevenZip C library to read an XML file from an archive. The library will provide me with a buffer and its size, and I want to put that in stream format such that it is compatible with read_xml. How can I do that?
Well, streams are quite used in C++ because of their conveniences:
- error handling
- they abstract away the data source, so whether you are reading from a file, an audio source, or a camera, they are all treated as input streams
- and probably more advantages I don't know of
Here is an overview of the IOstream library, perhaps that might better help you understand what's going on with streams:
http://www.cplusplus.com/reference/iostream/
Understanding what they are exactly will help you understand how and when to use them.
There's no single correct way to pass around data buffers. A combination of pointer and length is the most basic way; it's C-friendly. Passing a stream might allow for sequential/chunked processing - i. e. not storing the whole file in memory at the same time. If you want to pass a mutable buffer (that might potentially grow), a vector<char>& would be a good choice.
Specifically on Windows, a HGLOBAL or a section object handle might be used.
The C++ philosophy explicitly allows for many different styles, depending on context and environment. Get used to it.
Buffers of raw memory in C++ can either be of type unsigned char*, or you can create a std::vector<unsigned char>. You typically don't want plain char* for your buffer, since plain char's signedness is implementation-defined (it varies by platform/compiler), whereas unsigned char is guaranteed to use all the bits of a byte. That being said, streams have some excellent uses as well, considering that you can use a stream to read bytes from a file or some other input and, from there, store that data in a buffer.
Seems there's been some confusion as to what I want. Given a data buffer, how can I convert it to a stream such that it is compatible with the read_xml function I posted above?
Easily (I hope Ptree::key_type::value_type would be something like char):
std::istringstream stream(std::string(data, len));
read_xml(stream, ...);
More on string streams here.
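If the extra copy into a string matters, a tiny streambuf can point its get area straight at the existing buffer instead (membuf is an invented name for a common idiom, not part of Boost):

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Zero-copy wrapper: the get area aliases the caller's buffer.
struct membuf : std::streambuf {
    membuf(char* data, std::size_t len) {
        setg(data, data, data + len);  // begin, current, end
    }
};
```

An std::istream over this buffer can then be handed to read_xml without copying the data. Note that the buffer must outlive the stream, and a writable char* is required because streambuf's get-area pointers are non-const.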
This essentially uses a reference to pass the stream. Behind the scenes it's rather similar to what you did so far; simplified, the reference just hides the pointer aspect, so in your Boost example you're essentially working with a pointer to the stream.
References have the advantage of avoiding all the referencing/dereferencing and are therefore easier to handle in most situations. However, they don't allow multiple levels of (de-)referencing.
The following two example functions do essentially the same:
void change_a(int &var, myclass &cls)
{
var = cls.convert();
}
void change_b(int *var, myclass *cls)
{
*var = cls->convert();
}
Talking about the passed data itself: it really depends on what you're trying to achieve and what's most effective. If you'd like to modify a string, an object of class std::string may be more convenient than a classic pointer to a buffer (char*). Streams have the advantage that they can represent several different things (e.g. a data stream on the network, a compressed stream, or simply a file or memory stream). This way you can write single functions or methods that accept a stream as input and will work immediately, without worrying about the actual stream source. Doing this with classic buffers can be more complicated. On the other hand, you shouldn't forget that all objects add some overhead, so depending on the job to be done, a simple pointer to a character string might be perfectly fine (and the most efficient solution). There's no "one way to do it".