What is the best way to parse or iterate over an istream? I need to create a function that takes an istream, parses it and creates an object, so I was wondering what the easiest way to do this is. Even something that could convert it to a string would be dandy.
You can use an istream_iterator.
typedef std::istream_iterator<std::string> streamiter;
for (streamiter it(some_istream); it != streamiter(); ++it) {
// process words: *it is the current word
}
This will split the input stream at whitespace (spaces, tabs and newlines).
Since C++ doesn't have reflection and persistence built in, you cannot write a function that reads an arbitrary object and then inspect what it came up with. You need to know what you are looking for beforehand and read that specifically. (Of course, you could always just read tokens and feed those into a parser.)
If you know exactly which type of object to read from the stream, it's often good to give that class a constructor taking a std::istream&. Since the code that writes the object to a stream usually lives in the class as well, this keeps reading and writing close together, which is best for maintenance. The parsing code then just creates the object, passing the stream to the constructor.
If you don't know which type you will encounter, you will have to write a (probably simple) parsing function. Such formats should start with an identifier that tells which type of objects follow. Your parsing function would first have to read that identifier, and then branch into code that reads the appropriate type from the stream. Since at this point it knows what type of object to read from the stream, reading the actual objects can be implemented in constructors as described above.
Related
I use the C++ streambuf class for a compiler project and need a convenient way to get the current position in the stream.
There are two member functions, streambuf::pubseekpos and streambuf::pubseekoff, to modify the position, and I am quite confused by the absence of a streambuf::pubgetpos member function (or something similar) to read it.
There seem to be two possible workarounds:
1. I could save the current position in a separate variable and modify it manually whenever I read characters from the stream.
2. I could call streambuf::pubseekoff(0, ios_base::cur), which returns the new stream position.
The second option seems usable but inefficient and unaesthetic for such a trivial task. Is there a better way to do it?
The streambuf doesn't have a separate interface for reading the position. However, istream and ostream do (tellg and tellp respectively).
Interestingly, the streams themselves use your option 2 to get their positions (istream::tellg returns rdbuf()->pubseekoff(0, cur, in)), so it is just fine.
I have written a class that acts like an iterator to parse CSV formatted files.
I have also written other classes that read specific CSV files and directly fill a MyObject structure. The class can then be used like this (I removed the error-handling part of the code):
std::ifstream in(filename);
MyObjectParser parser(in);
MyObjectParser::Iterator it;
for (it = parser.begin(); it != parser.end(); it++)
{
MyObject b = *it;
// do some stuff here ...
}
The program works well and I'm happy with it but I realized that the implicit meaning (only for myself?) of an iterator is that it will iterate over a collection. In this case there is no collection but a stream.
Should I prefer a form that explicitly suggests I'm using a stream, by overloading operator>>,
and thus have something like this:
std::ifstream in(filename);
MyObjectReader reader(in);
MyObject obj;
while(reader >> obj)
{
// do the same "some stuff" here...
}
Is it only a matter of taste?
I don't clearly see what the differences are (except that in the second form the object is just filled in rather than copied) or what the consequences of choosing the first or the second form are.
I would be happy to get some other opinions so that I know exactly why I'm using one solution rather than another.
You can treat a stream as a collection if you want.
I'd note, however, that by overloading operator>>, you can have both: you can explicitly read data from the stream using operator>> directly, or you can treat the stream as a collection by using std::istream_iterator<whatever>.
That being the case, it seems to me that overloading operator>> is the obvious choice, since then you can treat things either way with essentially no extra work. In addition, using std::istream_iterator<x> is a fairly recognizable idiom, since it's included in the standard library.
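A sketch of that idea, with a hypothetical MyObject holding one name,value pair per line (the real record format is not given in the question). The single operator>> serves both the explicit loop and std::istream_iterator.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type standing in for MyObject: one "name,value" line.
struct MyObject {
    std::string name;
    int value = 0;
};

// Reads one record; on malformed input it sets failbit, so loops stop cleanly.
std::istream& operator>>(std::istream& in, MyObject& obj) {
    std::string line;
    if (!std::getline(in, line)) return in;
    std::istringstream fields(line);
    MyObject tmp;
    if (std::getline(fields, tmp.name, ',') && fields >> tmp.value)
        obj = tmp;
    else
        in.setstate(std::ios_base::failbit);
    return in;
}

// Style 1: explicit extraction loop.
std::vector<MyObject> read_with_loop(std::istream& in) {
    std::vector<MyObject> out;
    MyObject obj;
    while (in >> obj) out.push_back(obj);
    return out;
}

// Style 2: the same operator>> drives std::istream_iterator.
std::vector<MyObject> read_with_iterators(std::istream& in) {
    return std::vector<MyObject>(std::istream_iterator<MyObject>(in),
                                 std::istream_iterator<MyObject>());
}
```

Neither style needed a dedicated parser or iterator class; the iterator form comes for free once operator>> exists.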
The concept of iteration is not dependent on that of containers. Iterators iterate over a sequence of values. Different iterator designs define the sequence in different ways, but there is always the idea of a current value, advancing, and reaching the end.
About the only problem with input iterators is that they only terminate at the end of file; you cannot say, for example, that the next 10 lines contain doubles, and then we go on to something else. (But of course, you can insert a filtering streambuf in the stream to detect the end.)
So, I've been trying to be more rigorous about making const any passed parameters that shouldn't be touched by a function.
One situation I've encountered in some of my C++ code is the case where the object may change, but where I want to "lock out" functions from accessing certain key functionality of the object. For example, for a std::ifstream file handle, I may wish to prevent the function from closing the file.
If I pass it as a const &, the const part seems to keep me from performing standard file I/O.
e.g. I want something along the lines of
void GetTags(Arr<std::string> & tags, std::ifstream const& fileHandle)
...but written in such a way as to allow file I/O but not open/close operations.
Is there any good/reliable way to do this in C++? What would be considered best practice?
This has already been done for you by the standard library design: Pass a reference to the base class std::istream instead, which does not have a notion of opening or closing - it exposes only the stream interface.
void stream_me(std::istream & is);
std::ifstream is("myfile.txt");
stream_me(is);
In your place I'd just pass a std::istream& instead.
You could wrap the ifstream in an object that only exposed the functionality that you wished the caller to be able to use.
However, if you have a bunch of different functions, each with a different subset of ifstream's functionality, you'll end up with lots of different wrapper classes; so I don't see this as a general solution.
I think the best way would be to wrap the ifstream in a new class which only has member functions corresponding to the functionality you want GetTags to have access to. Then pass that, rather than the ifstream, as the second argument to GetTags.
Up until now, whenever I wanted to pass some raw data to a function (like a function that loads an image from a buffer), I would do something like this:
void Image::load(const char* buffer, std::size_t size);
Today I took a look at the Boost libraries, more specifically at the property_tree/xml_parser.hpp header, and I noticed this function signature:
template<typename Ptree>
void read_xml(std::basic_istream<typename Ptree::key_type::value_type>&,
Ptree &, int = 0);
This actually made me curious: is this the correct way to pass around raw data in C++, by using streams? Or am I misinterpreting what the function is supposed to be used for?
If it's the former, could you please point me to some resource where I can learn how to use streams for this? I haven't found much myself (mainly API references), and I haven't been able to find the Boost source code for the XML parser either.
Edit: Some extra details
Seems there's been some confusion as to what I want. Given a data buffer, how can I convert it to a stream such that it is compatible with the read_xml function I posted above? Here's my specific use case:
I'm using the SevenZip C library to read an XML file from an archive. The library will provide me with a buffer and its size, and I want to put that in stream format such that it is compatible with read_xml. How can I do that?
Streams are widely used in C++ because of their conveniences:
- error handling
- they abstract away the data source, so whether you are reading from a file, an audio source, a camera, they are all treated as input streams
- and probably more advantages I don't know of
Here is an overview of the iostream library; it might help you better understand what's going on with streams:
http://www.cplusplus.com/reference/iostream/
Understanding what they are exactly will help you understand how and when to use them.
There's no single correct way to pass around data buffers. A combination of pointer and length is the most basic way; it's C-friendly. Passing a stream might allow for sequential/chunked processing, i.e. not storing the whole file in memory at the same time. If you want to pass a mutable buffer (that might potentially grow), a vector<char>& would be a good choice.
Specifically on Windows, a HGLOBAL or a section object handle might be used.
The C++ philosophy explicitly allows for many different styles, depending on context and environment. Get used to it.
Buffers of raw memory in C++ can either be of type unsigned char*, or you can create a std::vector<unsigned char>. You typically don't want to use a plain char* for your buffer, since whether char is signed or unsigned is implementation-defined (i.e., it varies by platform/compiler), which makes byte values above 127 awkward to handle. That being said, streams have some excellent uses as well: you can use a stream to read bytes from a file or some other input and, from there, store that data in a buffer.
Seems there's been some confusion as to what I want. Given a data buffer, how can I convert it to a stream such that it is compatible with the read_xml function I posted above?
Easily (assuming Ptree::key_type::value_type is something like char):
std::istringstream stream(std::string(data, len));
read_xml(stream, ...);
More on string streams here.
This is essentially using a reference to pass the stream. So behind the scenes, it's rather similar to what you did so far, just using a different notation. Simplified, the reference just hides the pointer aspect, so in your Boost example you're essentially working with a pointer to the stream.
References have the advantage of avoiding explicit referencing/dereferencing and are therefore easier to handle in most situations. However, they don't allow multiple levels of (de-)referencing.
The following two example functions do essentially the same:
void change_a(int &var, myclass &cls)
{
var = cls.convert();
}
void change_b(int *var, myclass *cls)
{
*var = cls->convert();
}
Talking about the passed data itself: it really depends on what you're trying to achieve and what's more efficient. If you'd like to modify a string, using a std::string object might be more convenient than using a classic pointer to a buffer (char *). Streams have the advantage that they can represent several different things (e.g. a data stream on the network, a compressed stream, or simply a file or memory stream). This way you can write a single function or method that accepts a stream as input and works without worrying about the actual stream source. Doing this with classic buffers can be more complicated. On the other hand, you shouldn't forget that objects such as streams add some overhead, so depending on the job to be done, a simple pointer to a character string might be perfectly fine (and the most efficient solution). There's no "one way to do it".
I wrote a file parser for a game I'm writing to make it easy for myself to change various aspects of the game (things like the character/stage/collision data). For example, I might have a character class like this:
class Character
{
public:
int x, y; // Character's location
Character* teammate;
};
I set up my parser to read the data structures from a file, using a syntax similar to C++:
Character Sidekick
{
X = 12
Y = 0
}
Character AwesomeDude
{
X = 10
Y = 50
Teammate = Sidekick
}
This will create two data structures and put them in a map<std::string, Character*>, where the key string is whatever name I gave it (in this case Sidekick and AwesomeDude). When my parser sees a pointer to a class, like the teammate pointer, it's smart enough to look the name up in the map and fetch the pointer to that data structure. The problem is that I can't declare Sidekick's teammate to be AwesomeDude because it hasn't been placed into the Character map yet.
I'm trying to find the best way to solve this so that I can have my data structures reference objects that haven't yet been added to the map. The two easiest solutions that I can think of are (a) add the ability to forward declare data structures or (b) have the parser read through the file twice, once to populate the map with pointers to empty data structures and a second time to go through and fill them in.
The problem with (a) is that I can also decide which constructor to call on a class, and if I forward declare something I'd have to keep the constructor apart from the rest of the data, which could be confusing. The problem with (b) is that I might want to declare Sidekick and AwesomeDude in their own files. I'd have to make my parser able to take a list of files to read rather than just one at a time (this isn't so bad, I guess, although sometimes I might want to get the list of files to read from a file). (b) also has the drawback of not being able to use data structures declared later in the constructor itself, but I don't think that's a huge deal.
Which way sounds like a better approach? Is there a third option I haven't thought of? It seems like there ought to be some clever solution to this with pointer references or binding or something... :-/ I suppose this is somewhat subjective based on what features I want to give myself, but any input is welcome.
When you encounter the reference the first time, simply store it as a reference. Then, you can put the character, or the reference, or whatever on a list of "references that need to be resolved later".
When the file is done, run through those that have references and resolve them.
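A minimal sketch of that approach, using a simplified Character and a hypothetical CharacterTable. The parser calls add_reference instead of looking the name up immediately, and resolve() runs once all files have been read, throwing if any name was never defined.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for the question's Character class.
struct Character {
    int x = 0, y = 0;
    Character* teammate = nullptr;
};

// Hypothetical loader: records references by name while parsing, and only
// turns them into pointers once every file has been read.
class CharacterTable {
public:
    // Called by the parser for each "Character Name { ... }" block.
    Character& define(const std::string& name) {
        auto& slot = objects_[name];
        if (!slot) slot = std::make_unique<Character>();
        return *slot;
    }

    // Called for "Teammate = Other": just remember the name for now.
    void add_reference(Character& from, const std::string& target_name) {
        pending_.push_back({&from, target_name});
    }

    // Run after all input is parsed; throws if a name was never defined.
    void resolve() {
        for (const auto& ref : pending_) {
            auto it = objects_.find(ref.target);
            if (it == objects_.end())
                throw std::runtime_error("unresolved reference: " + ref.target);
            ref.from->teammate = it->second.get();
        }
        pending_.clear();
    }

private:
    struct PendingRef {
        Character* from;
        std::string target;
    };
    // unique_ptr keeps each Character at a stable address while pending.
    std::map<std::string, std::unique_ptr<Character>> objects_;
    std::vector<PendingRef> pending_;
};
```

Since nothing is resolved until the end, Sidekick and AwesomeDude can live in separate files, as long as all of them are loaded before resolve() is called.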
Well, you asked for a third option. You don't have to use XML, but if you follow a structure like the one below, it would be very simple to use a SAX parser to build your data structure.
At any rate, instead of referencing a teammate, each character references a team (Blue team in this case). This breaks the circular reference. Just make sure you list the teams before the characters.
<team>Blue</team>
<character>
<name>Sidekick</name>
<X>12</X>
<Y>0</Y>
<teamref>Blue</teamref>
</character>
<character>
<name>AwesomeDude</name>
<X>10</X>
<Y>50</Y>
<teamref>Blue</teamref>
</character>
Personally, I'd go with (b), splitting your code into Parser and Validator classes that both operate on the same data structure. The Parser reads and parses a file, filling the data structure and storing any object references as their textual names, leaving the real pointers null in your structure for now.
When you are finished loading the files, use the Validator class to validate and resolve any references, filling in the "real" pointers. You will want to consider how to structure your data to make these lookups nice and fast.
Will said exactly what I was about to write. Just keep a list or something with the unsolved references.
And don't forget to throw an error if there are unsolved references once you finish reading the file =P
Instead of storing Character objects in your map, store a proxy for Character. The proxy will then contain a pointer to the actual Character object once that object is loaded. The type of Character::teammate changes to this proxy type. When you read in a reference that is not already in your map, you create a proxy and use it. When you load a character for which there is already an empty proxy in the map, populate that proxy with the newly loaded character. You may also want to add a counter to keep track of how many empty proxies are in the map, so you know when all referenced characters have been loaded.
Another layer of indirection... it always makes programming easier and slower.
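A sketch of the proxy approach, with hypothetical CharacterProxy and CharacterRegistry types and a simplified Character. Every access to a teammate goes through the proxy's target pointer, which is exactly the extra layer of indirection mentioned above.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Character;

// Hypothetical proxy: stands in for a Character that may not be loaded yet.
struct CharacterProxy {
    Character* target = nullptr;   // filled in once the character is loaded
};

struct Character {
    int x = 0, y = 0;
    CharacterProxy* teammate = nullptr;   // points at the proxy, not the object
};

class CharacterRegistry {
public:
    // Returns the proxy for name, creating an empty one if needed.
    CharacterProxy* proxy(const std::string& name) {
        auto& p = proxies_[name];
        if (!p) { p = std::make_unique<CharacterProxy>(); ++empty_; }
        return p.get();
    }

    // Stores a newly loaded character and populates its (possibly existing) proxy.
    Character& load(const std::string& name, std::unique_ptr<Character> c) {
        CharacterProxy* p = proxy(name);
        if (p->target == nullptr) --empty_;
        Character& ref = *c;
        characters_[name] = std::move(c);
        p->target = &ref;
        return ref;
    }

    // Number of names that were referenced but not loaded yet.
    std::size_t empty_proxies() const { return empty_; }

private:
    std::map<std::string, std::unique_ptr<CharacterProxy>> proxies_;
    std::map<std::string, std::unique_ptr<Character>> characters_;
    std::size_t empty_ = 0;
};
```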
One option would be to reverse the obligation: the map is responsible for filling in the reference.
template<typename T> class SymbolMap // I never could remember C++ template syntax
{
public:
...
/// fill in target with the thing called name
/// if name isn't defined yet, add target to the list of things waiting for name
void Set(T& target, std::string name);
/// define name as target
/// go back and fill in anything that was waiting for name
void Define(T target, std::string name);
/// make sure everything is resolved
~SymbolMap();
};
That won't interact well with value/move semantics, but I suspect not much will.
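One possible implementation of that interface, assuming T is a pointer type such as Character*. It replaces the checking destructor with an Unresolved() query, because destructors are noexcept by default since C++11 and throwing from one would call std::terminate.

```cpp
#include <map>
#include <string>
#include <vector>

// A sketch of the SymbolMap idea; T is typically a pointer, e.g. Character*.
template <typename T>
class SymbolMap {
public:
    // Fill in target with the value defined for name; if name is not defined
    // yet, remember &target so Define() can patch it later.
    void Set(T& target, const std::string& name) {
        auto it = defined_.find(name);
        if (it != defined_.end())
            target = it->second;
        else
            pending_[name].push_back(&target);
    }

    // Define name as value, then fill in every target that was waiting for it.
    void Define(T value, const std::string& name) {
        defined_[name] = value;
        auto it = pending_.find(name);
        if (it != pending_.end()) {
            for (T* slot : it->second) *slot = value;
            pending_.erase(it);
        }
    }

    // Names that were referenced with Set() but never defined.
    std::vector<std::string> Unresolved() const {
        std::vector<std::string> names;
        for (const auto& entry : pending_) names.push_back(entry.first);
        return names;
    }

private:
    std::map<std::string, T> defined_;
    std::map<std::string, std::vector<T*>> pending_;
};
```

Because Set stores the address of each target, those objects must not move while a reference to them is pending, which is the value/move-semantics caveat noted above.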