Is it a good thing to use iterators to read from a formatted stream? - C++

I have written a class that acts like an iterator to parse CSV-formatted files.
I have also written other classes that read specific CSV files and fill a MyObject structure directly. The class can therefore be used like this (I removed the error-handling part of the code):
std::ifstream in(filename);
MyObjectParser parser(in);
MyObjectParser::Iterator it;
for (it = parser.begin(); it != parser.end(); it++)
{
MyObject b = *it;
// do some stuff here ...
}
The program works well and I'm happy with it, but I realized that the implicit meaning of an iterator (only to me?) is that it iterates over a collection. In this case there is no collection, only a stream.
Should I prefer a form that explicitly suggests I'm reading from a stream, by overloading the >> operator,
and thus having something like this:
std::ifstream in(filename);
MyObjectReader reader(in);
MyObject obj;
while(reader >> obj)
{
// do the same "some stuff" here...
}
Is it only a matter of taste?
I don't clearly see the differences (except that in the second form the object is filled in place rather than copied) or the consequences of choosing one form over the other.
I would be happy to get some other opinions, so that I know exactly why I'm using one solution rather than the other.

You can treat a stream as a collection if you want.
I'd note, however, that by overloading operator>>, you can have both -- you can explicitly read data from the stream using operator>> directly, or you can treat the stream as a collection by using std::istream_iterator<whatever>.
That being the case, it seems to me that overloading operator>> is the obvious choice, since then you can treat things either way with essentially no extra work. In addition, using std::istream_iterator<x> is a fairly recognizable idiom, since it's included in the standard library.
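For instance, here is a minimal sketch of that dual use, assuming MyObject is the structure from the question and leaving the actual CSV parsing as a comment:

#include <fstream>
#include <iterator>
#include <vector>

std::istream& operator>>(std::istream& is, MyObject& obj)
{
    // parse one CSV record into obj here; set failbit on a malformed line
    return is;
}

std::ifstream in(filename);

// explicit reads...
MyObject obj;
while (in >> obj) { /* do some stuff */ }

// ...or treat the stream as a collection
std::vector<MyObject> all(std::istream_iterator<MyObject>{in},
                          std::istream_iterator<MyObject>{});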

The concept of iteration is not dependent on that of containers. Iterators iterate over a sequence of values. Different iterator designs define the sequence in different ways, but there are always the ideas of a current value, advancing, and reaching the end.
About the only problem with input iterators is that they only terminate at end of file; you cannot say, for example, that the next 10 lines contain doubles, and then we go on to something else. (But of course, you can insert a filtering streambuf into the stream to detect the end.)
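As an illustration of that last remark, here is a minimal sketch (the class name is invented for the example) of a filtering streambuf that reports end-of-file after a fixed number of lines:

#include <streambuf>

// wraps another streambuf and stops after `lines` newlines
class LineLimitBuf : public std::streambuf {
public:
    LineLimitBuf(std::streambuf* src, int lines)
        : src_(src), remaining_(lines) {}

protected:
    int_type underflow() override {
        if (remaining_ == 0)
            return traits_type::eof();       // artificial end of stream
        int_type c = src_->sbumpc();         // take one char from the source
        if (c == traits_type::eof())
            return c;
        if (traits_type::to_char_type(c) == '\n')
            --remaining_;
        ch_ = traits_type::to_char_type(c);
        setg(&ch_, &ch_, &ch_ + 1);          // expose a one-char get area
        return c;
    }

private:
    std::streambuf* src_;
    int remaining_;
    char ch_;
};

An istream constructed on such a buffer (and hence any input iterator reading from it) sees end-of-file after the given number of lines.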

How to add functionality to existing C++ standard library functions

I need some guidance or pointers on how to implement a custom ostream. My requirements are:
A class with a '<<' operator for several data types.
The intention is to send the output to a database. Each "line" should go to a separate record.
Each record's most important field would be the text (or blob), but some other fields, such as the time, can mostly be deduced automatically.
Buffering is important, as I don't want to go to the database for every record.
First, is it worth deriving from ostream? What do I get by deriving from ostream? What if my class simply implements a few operator<< methods (including some for custom data types)? Which functionality do I get from ostream?
Assuming what I want is a class derived from ostream, I need some guidance on the relationship between the ostream and streambuf classes. Which one do I need to implement? Looking at some samples, it appears that I don't need to derive from ostream at all, and can just give the ostream constructor a custom streambuf. Is that true? Is that the canonical approach?
Which virtual functions of the custom streambuf do I need to implement? I've seen some samples (including on this site: here and here, and a few more); some override the sync method, and others override the overflow method. Which one should I override? Also, looking at the stringbuf and filebuf sources (Visual Studio or GCC), both of those buffer classes implement many methods of streambuf.
If a custom class derived from streambuf is required, would there be any benefit to deriving from stringbuf (or another class) instead of directly from streambuf?
As for "lines": I would like, at a minimum, uses of the 'endl' manipulator to start a new line (i.e. a new record in the database). Maybe - depending on the effort - every '\n' character should be considered a new record as well. How do my custom ostream and/or streambuf get notified of each?
A custom destination for an ostream means implementing your own streambuf. If you want your streambuf actually to buffer (i.e. not connect to the database after each character), the easiest way to do that is to create a class inheriting from std::stringbuf. The only function you'll need to override is the sync() method, which is called whenever the stream is flushed.
class MyBuf : public std::stringbuf
{
public:
    virtual int sync() {
        // add this->str() to the database here
        str("");   // optionally clear the buffer afterwards
        return 0;  // report success (return -1 on failure)
    }
};
You can then create a std::ostream using your buffer:
MyBuf buf;
std::ostream stream(&buf);
Most people advised against redirecting the stream to a database, but they ignored my description: the database basically has a single blob field where all the text goes.
In rare cases, I might send data to a different field. This can be facilitated with custom attributes understood by my stream. For example:
MyStream << "Some text " << process_id(1234) << "more text" << std::flush;
The code above will create a record in the database with:
blob: 'Some text more text'
process_id: 1234
process_id() is a function returning a ProcessID structure. In the implementation of my ostream, I then have an operator<<(ProcessID const& pid) that stores the process ID until the record gets written. Works great!
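A minimal sketch of that attribute trick (the names and the single-field record are illustrative, not the poster's actual code; a real class would also have to handle manipulators such as std::flush):

#include <sstream>
#include <string>

struct ProcessID { int value; };

ProcessID process_id(int value) { return ProcessID{value}; }

// hypothetical stream-like wrapper: text accumulates in a blob,
// attributes are stored aside until the record is written
class RecordStream {
public:
    template <typename T>
    RecordStream& operator<<(const T& v) { blob_ << v; return *this; }

    RecordStream& operator<<(const ProcessID& pid) {
        pid_ = pid.value;  // remembered, not appended to the text
        return *this;
    }

    void flush_record() {
        // write { blob_.str(), pid_ } as one database record here
        blob_.str("");
        pid_ = 0;
    }

private:
    std::ostringstream blob_;
    int pid_ = 0;
};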
The simplest way is to inherit from std::streambuf and override just two methods:
std::streamsize xsputn(const char_type* s, std::streamsize n) – to append a buffer of the given size to your internal buffer (a std::string, for example);
int_type overflow(int_type c) – to append a single char to your internal buffer.
Your streambuf can be constructed from whatever you want (a DB connection, for example). After appending something to the internal buffer, you may try to split it into lines and push them into the DB (or just buffer SQL statements to execute later).
To use it, just attach your streambuf to any std::ostream via the constructor.
Simple! I've done something like this to output strings to syslog – everything works fine, including with custom operator<< overloads for user-defined classes.
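For example, a minimal sketch along those lines (send_line is a placeholder for whatever DB or syslog call you actually make):

#include <ostream>
#include <streambuf>
#include <string>

class DBStreamBuf : public std::streambuf {
public:
    explicit DBStreamBuf(/* DB connection, for example */) {}

protected:
    // bulk append
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        buffer_.append(s, static_cast<std::size_t>(n));
        flush_lines();
        return n;
    }

    // single-character append
    int_type overflow(int_type c) override {
        if (c != traits_type::eof()) {
            buffer_ += traits_type::to_char_type(c);
            flush_lines();
        }
        return c;
    }

private:
    void flush_lines() {
        std::string::size_type pos;
        while ((pos = buffer_.find('\n')) != std::string::npos) {
            send_line(buffer_.substr(0, pos));  // one record per line
            buffer_.erase(0, pos + 1);
        }
    }
    void send_line(const std::string& line) { /* INSERT into the DB here */ }
    std::string buffer_;
};

// usage: DBStreamBuf buf; std::ostream out(&buf); out << "hello\n";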
My 2c: I think you are tackling this the wrong way. A stream may sound like a nice idea, but you'll also need a way to indicate the end of a row (and then what if someone forgets?). I would suggest something along the lines of how Java's PreparedStatement and batches work: provide a set of methods that accept the types and a column index, then a "batch" method that makes it explicit you are indeed batching that row, and then an "execute" to push the batch in.
Any stream-based operation will (typically) rely on the type to indicate which column to fill - but what if you have two ints? IMO, as a user, it doesn't feel like a natural way of inserting records into a database...
To add a new source or destination of character input/output to the iostreams mechanism, you should create a new streambuf class. The task of the stream buffer classes is to communicate with the 'external device' that will store the characters and to provide buffering facilities.
The problem with using iostreams to communicate with your database is that a database table does not match the concept of a character sequence; it's a bit like pushing a round peg into a square hole. A streambuf operates only on characters - that is the only thing ever presented to it. This means the streambuf has to parse the character stream presented to it to find the field and record separators.
If you decide to go this route, I predict you will end up writing a CSV-to-SQL converter in your streambuf, just to get it working.
You will probably be better off just adding a few operator<< overloads to your class(es). You could look at the Qt framework for ideas here; it also offers the possibility of using operator<< to add items to collections and such.

overriding `istream operator>>` vs using `sscanf`

Say I wanted to initialise a std::vector of objects e.g.
class Person { int ID; string name; ...}
from a file that contains a line for each object. One route is to overload operator>> and simply do std::cin >> temp_person; another - which I used to favour - is to sscanf("%...", &...) into a bunch of temporary primitive types and simply .emplace_back(Person(temp_primitives...)).
Which way achieves the quickest runtime, ignoring memory footprint? Is there any point in mmap()ing the entire file?
Since you are reading from a file, the performance is going to be I/O-bound. Almost no matter what you do in memory, the effect on the overall performance is not going to be detectable.
I would prefer the operator>> route, because this would let me use the input iterator idiom of C++:
std::istream_iterator<Person> eos;
std::istream_iterator<Person> iit(inputFile);
std::copy(iit, eos, std::back_inserter(person_vector));
or even
std::vector<Person> person_vector(
    (std::istream_iterator<Person>(inputFile)), // extra parentheses avoid the most vexing parse
    std::istream_iterator<Person>()
);
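Both forms require Person to have an extraction operator. A minimal sketch, assuming a whitespace-separated "ID name" line format (the question doesn't specify one):

#include <istream>
#include <string>

struct Person { int ID; std::string name; };

std::istream& operator>>(std::istream& is, Person& p)
{
    // leaves the stream in a failed state on malformed input,
    // which is what terminates the istream_iterator loop above
    return is >> p.ID >> p.name;
}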

Invoke different functions for different file extensions

Here's the situation:
I have a Graph class written in C++ and I need to build Graph objects from files. The problem is that graphs are stored in files in a lot of different ways, so I was thinking of a function that, based on the file extension, could invoke the correct procedure for building a Graph in a certain format. How should I proceed? Am I wrong, or can't I just overload operator>> in my class? Thanks in advance.
operator>> is (or should be) agnostic to any details of the stream from which it is extracting, so using this operator is probably the wrong tack.
The best way to do this would be:
graph_type load_from_file(const std::string& file_path) { // or use something like boost::filesystem::path
    std::ifstream file { file_path };  // ifstream: we are reading, not writing
    if (endswith(file_path, ".graph")) {
        return deserialize_from_graph(file);
    }
    if (endswith(file_path, ".g2")) {
        return deserialize_from_g2(file);
    }
    // other formats here
    // else throw
}
Note: endswith is not from the standard library; Boost, however, has an implementation in its string algorithms.
How do you determine how the data is stored? If it is just the extension, all you need is a map std::string → pointer-to-function. If the same extension can have several different representations, distinguished, for example, by the first couple of bytes in the file or by information in some common header, you'll have to defer the final choice until you've read those bytes; again, a map to a pointer to function will do the trick.
Depending on the complexity of the formats to read, you may want to replace the pointer to a reader function with a pointer to a factory function, which returns an instance of a reader class derived from an abstract reader.
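A minimal sketch of that dispatch table (the Graph type and the reader functions are assumed from the question; get_extension is a trivial helper written here for the example):

#include <fstream>
#include <map>
#include <stdexcept>
#include <string>

using ReaderFn = Graph (*)(std::istream&);

std::string get_extension(const std::string& path) {
    const auto dot = path.rfind('.');
    return dot == std::string::npos ? "" : path.substr(dot);
}

Graph load_from_file(const std::string& path)
{
    // map from extension to reader function
    static const std::map<std::string, ReaderFn> readers = {
        { ".graph", &read_graph_format },  // assumed to exist
        { ".g2",    &read_g2_format },     // assumed to exist
    };
    const auto it = readers.find(get_extension(path));
    if (it == readers.end())
        throw std::runtime_error("unknown graph format: " + path);
    std::ifstream file(path);
    return it->second(file);  // call through the function pointer
}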

streambuf get streampos

I use the C++ streambuf class for a compiler project and need a convenient way to get the current position in the stream.
There are two member functions, streambuf::pubseekpos and streambuf::pubseekoff, to modify the position, and I am quite confused by the absence of a streambuf::pubgetpos member function (or something similar) to read it.
There seem to be two possible workarounds:
I could save the current position in a separate variable and modify it manually whenever I read characters from the stream.
I could call streambuf::pubseekoff(0, ios_base::cur), which returns the new stream position.
The second option seems usable, but inefficient and inelegant for such a trivial task. Is there a better way to do it?
The streambuf doesn't have a separate interface for reading the position. However, istream and ostream do (tellg and tellp respectively).
Interestingly, the streams use your option 2 to get their positions, so it is just fine.
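For example, a small sketch showing both giving the same position (the file name is just for the example):

#include <fstream>
#include <iostream>

int main()
{
    std::ifstream in("input.txt");
    in.get();  // consume one character

    // ask the streambuf directly, via option 2...
    std::streampos pos =
        in.rdbuf()->pubseekoff(0, std::ios_base::cur, std::ios_base::in);

    // ...which is exactly what tellg() does under the hood
    std::cout << (pos == in.tellg()) << '\n';  // prints 1
}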

C++ Iterate an istream

What is the best way to parse or iterate over an istream? I need to create a function that takes an istream, parses it and creates an object, so I was wondering about the easiest way to do this. Even something that could convert it to a string would be dandy.
You can use an istream_iterator.
typedef std::istream_iterator<std::string> streamiter;
for (streamiter it = streamiter(some_istream); it != streamiter(); it++) {
// process words
}
This will split the input stream at all whitespaces.
Since C++ doesn't have reflection and persistence built in, you cannot write a function that reads just any object and then lets you see what it came up with. You need to know what you are looking for beforehand and read that specifically. (Of course, you could always just read tokens and feed those into a parser.)
If you know exactly which type of object to read from the stream, it's often good to give that class a constructor taking a std::istream&. Since the class is usually also where the code to write into the stream lives, this puts both close together, which is best for maintenance. The parsing code then just creates the object, passing the stream to the constructor.
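A minimal sketch of that convention (the Point type and its "x y" line format are invented for the example):

#include <istream>

struct Point {
    int x = 0, y = 0;
    explicit Point(std::istream& is) { is >> x >> y; }  // reads "x y"
};

// the parsing code then simply does: Point p(input);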
If you don't know which type you will encounter, you will have to write a (probably simple) parsing function. Such formats should start with an identifier that tells which type of objects follow. Your parsing function would first have to read that identifier, and then branch into code that reads the appropriate type from the stream. Since at this point it knows what type of object to read from the stream, reading the actual objects can be implemented in constructors as described above.
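A sketch of such a dispatching reader (the tags and types are illustrative only; each constructor does the actual reading, as described above):

#include <istream>
#include <memory>
#include <stdexcept>
#include <string>

struct Shape { virtual ~Shape() = default; };
struct Circle : Shape {
    explicit Circle(std::istream& is) { is >> radius; }
    double radius = 0;
};
struct Rect : Shape {
    explicit Rect(std::istream& is) { is >> w >> h; }
    double w = 0, h = 0;
};

std::unique_ptr<Shape> read_shape(std::istream& is)
{
    std::string tag;
    if (!(is >> tag))
        throw std::runtime_error("empty input");
    // branch on the identifier, then let the constructors read the rest
    if (tag == "circle") return std::make_unique<Circle>(is);
    if (tag == "rect")   return std::make_unique<Rect>(is);
    throw std::runtime_error("unknown object type: " + tag);
}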