Why do we need to use a stringstream when splitting a string? - c++

Please note, I've never used streams before today, so my understanding of them remains rather vague. Apologies if I say something appallingly stupid.
Here I have a short bit of code that splits up a stringstream into a bunch of strings at each space.
vector<string> words;
stringstream ss("some random words that I wrote just now");
string word;
while(getline(ss, word, ' ')){
words.push_back(word);
}
I'm wondering why we're using a stringstream here, rather than just a string.
This would work like:
1. Create a string object at memory location x.
2. When referenced, go through each character and check if it is a space. All previous characters should be saved somewhere temporary.
3. If it is a space, grab all the stuff we've just stored and stick it on the end of the vector, then clear the temp storage thing. If it's not a space, go back to step 2.
What's storing "some random words that I wrote just now" as a stringstream going to do to help us here?
Is it just making a stream of characters so that we can check through them? Is this necessary? Are we always doing this, even in other languages?

I'm wondering why we're using a stringstream here, rather than just a string.
If this is the question, then one big reason why stringstream is used is simply -- because it works with little effort by the programmer. The less code you write, the less chance for bugs to occur.
Your method of using just std::string and searching for spaces requires the C++ programmer to write all of those steps (create a string, manually search for spaces, etc). It may be trivial to write, but even the best programmers can make mistakes in trivial code. The code may have bugs, may not cover all of the corner cases, etc.
As to ease of use:
When a C++ programmer sees stringstream used to split a string on whitespace, the purpose of the code is immediately clear.
If, on the other hand, a programmer decides to parse the data manually using just string and searching for spaces, another programmer reading the code will not immediately see what it does. Sure, they may work it out quickly, but I can bet they will ask "why didn't you use stringstream?".

What's storing "some random words that I wrote just now" as a stringstream going to do to help us here? Is it just making a stream of characters so that we can check through them? Is this necessary?
std::stringstream just allows you to use the usual input/output operations such as >> and std::getline on a string. You can't use std::getline to read from a std::string, so you put the string in a std::stringstream first. You can totally parse a string by looping over the characters yourself as you described.
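For comparison, here is a minimal sketch of the manual approach described in the question. The name split_on_char is made up for this example, and the function is only meant to mirror what the getline loop produces, not to be a recommended implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at every occurrence of `delim`,
// mirroring the getline(ss, word, delim) loop from the question.
std::vector<std::string> split_on_char(const std::string& text, char delim) {
    std::vector<std::string> words;
    std::string current;                 // the temporary storage from the question
    for (char c : text) {
        if (c == delim) {
            words.push_back(current);    // delimiter found: store what we collected
            current.clear();
        } else {
            current += c;                // not a delimiter: keep collecting
        }
    }
    // Corner case that is easy to forget: the last word has no delimiter after it.
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}
```

Calling split_on_char("some random words", ' ') should give the same three words as the stringstream version. Note how the end-of-input case needs its own check, which is exactly the kind of detail the stream version handles for you.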
Are we always doing this, even in other languages?
Not in Python at least. There you would just do words = line.split(' ').

Related

How does this parsing function using stringstream work?

So I made a function to parse a given string with a comma delimiter last semester during a haze. It's very likely I took much of it from guides online, but it worked for the overall project so I did it. Now I'm going back and reviewing it, and I'm confused. Here is the code:
`vector<string> parsedString(string line){
vector<string> splitStrings;
stringstream inputString(line);
while(inputString.good()){
string substr;
getline(inputString,substr,',');
splitStrings.push_back(substr);
substr = "";
}
return splitStrings;
}`
The purpose was to put each part of the line that's separated into a vector, then take that vector back where it's needed with all the parts. However, I do NOT understand the stringstream aspects of this.
To be specific, when I wrote code to check the stringstream's contents during the loop, it stayed the same for the entire time. If getline() is supposed to track where the last delimiter was, why does it not show in the contents?
Also, if possible, an explanation of how .good() works in this case would be phenomenal. I understand stringstream is a stream, and functions of that sort are supposed to check if streams are finished or not, but again I don't understand how the program would know that.
Everything works as intended; there are no mistakes being made from what I can see. I just fundamentally can't seem to grasp why it's working, and I don't want my lack of knowledge to come back to bite me.
An istringstream is not just a string. If it were, it would be redundant.
For a first approximation, you could think of it as a class whose instances contain a string and a position (i.e., a string index). When you construct the istringstream from a string, the position is initialised to 0. When you read a character from the istringstream, you get the character at the position, and the position is incremented. So each time you read a character, you get the next one. (Actually, a stringstream has two positions, one for reading and one for writing, because it's a combination of an istringstream and an ostringstream. But only the input part is relevant to your question.)
All other stream input operations are based on reading a single character, although implementations are allowed to be more efficient if the results are the same.
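To make that first approximation concrete, here is a toy sketch of "a string plus a position". ToyStringReader and its members are invented for illustration and are not how any standard library actually implements istringstream; the point is only that getline can be built on single-character reads while the stored string never changes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Toy model only: a string plus a read position. The real std::istringstream
// carries more state than this.
class ToyStringReader {
public:
    explicit ToyStringReader(std::string text) : text_(std::move(text)) {}

    // Reads one character; returns false once the text is exhausted.
    bool get(char& out) {
        if (pos_ >= text_.size()) return false;
        out = text_[pos_++];
        return true;
    }

    // Built purely on get(): reads up to (and consumes) `delim`.
    // Returns false if nothing at all could be read.
    bool getline(std::string& out, char delim) {
        out.clear();
        char c;
        bool read_any = false;
        while (get(c)) {
            read_any = true;
            if (c == delim) break;
            out += c;
        }
        return read_any;
    }

    const std::string& contents() const { return text_; }  // never changes
    std::size_t position() const { return pos_; }          // advances on reads

private:
    std::string text_;
    std::size_t pos_ = 0;
};
```

With this model, printing contents() inside a read loop always shows the full original text; only position() moves forward.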
The above was an oversimplification, of course. A stream has other state: status bits, formatting parameters, locale settings, and more stuff I'm forgetting. See this overview for more details. But the basic point stands: the string is only a part of a stringstream's state: the rest of the state is used to make it look like an I/O stream. Which turns out to be useful if you want to pick it apart sequentially into tokens.
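If it helps, the same thing can be observed on a real std::istringstream. The sketch below uses made-up names (ReadTrace, trace_reads) and records three things: the read position reported by tellg() after each token, whether str() still returns the original text, and whether good() is still true once the last token has been read. good() is true only while none of the status bits is set, and reading the final token runs into the end of the string, which sets the end-of-file bit.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of what can be observed while tokenising.
struct ReadTrace {
    std::vector<std::streamoff> positions;  // tellg() after each delimited token
    bool contents_unchanged;                // does str() still equal the input?
    bool good_at_end;                       // good() once reading has finished
};

ReadTrace trace_reads(const std::string& line, char delim) {
    std::istringstream in(line);
    ReadTrace trace{};
    std::string token;
    while (std::getline(in, token, delim)) {
        if (!in.good()) break;              // hit end of input while reading this token
        trace.positions.push_back(static_cast<std::streamoff>(in.tellg()));
    }
    trace.contents_unchanged = (in.str() == line);
    trace.good_at_end = in.good();
    return trace;
}
```

For the input "a,bb,ccc" with ',' as the delimiter, the recorded positions are 2 and 5, the contents are unchanged, and good() is false at the end. That is why inspecting the contents shows nothing moving: the progress lives in the position and the status bits, not in the string.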

Looking for a more compact syntax (simple code) - C++

I'm relearning C++ and I found myself often writing pieces of code like this:
vector<string> Text;
string Line;
while (getline(cin, Line))
{
Text.push_back(Line);
}
I was wondering if there is a more compact way to write the loop using only basic features (no user-written functions, for example) - more or less, putting everything in the condition?
You can use a for loop. We can declare Line in the variable declaration part and use the condition and increment parts to read and place the line. That gives you
for(string Line; getline(cin, Line); Text.push_back(Line));
more or less, putting everything in the condition?
You can do this
while (getline(cin, Line) && (Text.push_back(Line),true)) {}
It works because && is short-circuited and because the comma-operator evaluates the first operand, discards the result and returns the result of the second operand.
So it is possible, but why would you want to do that? Making code as dense as possible is rarely good for readability (actually your original code is more readable and uses fewer characters).
Well, at the expense of some obfuscation,
while (getline(cin, Line) && (Text.push_back(Line), 1));
would do it: note the use of the expression separator operator which, informally speaking, "converts" the void return type of push_back to an int so enabling its use with the short-circuiting &&.
But as a rule of thumb, work with the language, not against it. My answer is doing the latter. The way you present the code in your question is adequate.
At the risk of "but that is exactly OP's code!", I would personally favor this version if this is the entire body of a scope (e.g. function that parses the text):
vector<string> Text;
string Line;
while (getline(cin, Line))
Text.push_back(Line);
Alternatively, if part of a larger scope, I would probably group all four lines together and add empty lines before and after for visual coherence (and maybe add a short comment before it):
// [lots of other code]
// Gather input from cin.
vector<string> Text;
string Line;
while (getline(cin, Line))
Text.push_back(Line);
// [lots of other code]
I am aware that this introduces no clever tricks, but this is the most compact and readable form of the given code, at least to me.
If you wanted compactness above all else, you could choose garbage variables, omit all unnecessary whitespace and even alias the types beforehand (since we are "often writing" this kind of code this is a one-off) to, say, V<S> t;S l;while(getline(cin,l))t.push_back(l); but nobody wants to read that.
So clearly there is more than compactness at play. As for me, I'm looking to keep noise to a minimum while retaining intuitive readability, and I would suggest this is an agreeable goal.
I would never use the "throw everything into the loop condition" suggestions because that very much breaks how I expect code to be structured: The main purpose of your loop goes into the loop body. You may disagree/have different expectations, but in my eyes everything else is just an attempt to show off your minifying skills, it does not produce good code.
The above accomplishes just that: The braces are noise for this simple operation, and the important part stands out as the loop body. "But is getline not also important?" - It is, and I would honestly prefer a version where it is in the loop body, such as a hypothetical
vector<string> Text;
while (cin.hasLine())
Text.push_back(readLine(cin));
This would be an ideal loop to me: The condition only checks for termination and the loop body is only the operation we want to repeat.
Even better would be a standard algorithm, but I am unaware of any that would help here (ranges or boost might provide one, I don't know).
On a more abstract level, if OP frequently writes this exact code, it should obviously be a separate function. But even if not, the "lots of other code" example would benefit from that abstraction too.
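As a rough sketch of that abstraction, the loop could be wrapped like this. The name read_lines is an assumption for the example; taking a std::istream& rather than hard-coding cin is a design choice that also makes it easy to try with in-memory input.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // only needed to try the helper with in-memory input
#include <string>
#include <vector>

// Hypothetical helper: reads every remaining line from `in`.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line); ) {
        lines.push_back(line);
    }
    return lines;
}
```

The call site then shrinks to a single line such as vector<string> Text = read_lines(cin); (with <iostream> included), which is about as compact as it gets without hurting readability.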
It's a loop with a single statement. You can write it in one line, but I don't recommend it:
while (getline(cin, Line)) Text.push_back(Line);

Using Getline on a Binary File

I have read that getline behaves as an unformatted input function. Which I believe should allow it to be used on a binary file. Let's say for example that I've done this:
ofstream output("foo.txt", ios_base::binary);
const auto foo = "lorem ipsum";
output.write(foo, strlen(foo) + 1);
output.close();
ifstream input("foo.txt", ios_base::binary);
string bar;
getline(input, bar, '\0');
Is that breaking any rules? It seems to work fine, I think I've just traditionally seen arrays handled by writing the size and then writing the array.
No, it's not breaking any rules that I can see.
Yes, it's more common to write an array with a prefixed size, but using a delimiter to mark the end can work perfectly well also. The big difference is that (like with a text file) you have to read through data to find the next item. With a prefixed size, you can look at the size, and skip directly to the next item if you don't need the current one. Of course, you also need to ensure that if you're using something to mark the end of a field, that it can never occur inside the field (or come up with some way of detecting when it's inside a field, so you can read the rest of the field when it does).
Depending on the situation, that can mean (for example) using Unicode text. This gives you a lot of options for values that can't occur inside the text (because they aren't legal Unicode). That, on the other hand, would also mean that your "binary" file is really a text file, and has to follow some basic text-file rules to make sense.
Which is preferable depends on how likely it is that you'll want to read random pieces of the file rather than reading through it from beginning to end, as well as the difficulty (if any) of finding a unique delimiter and, if you don't have one, the complexity of making the delimiter distinguishable from data inside a field. If the data is only meaningful if written in order, then having to read it in order doesn't really pose a problem. If you can read individual pieces meaningfully, then being able to do so is much more likely to be useful.
In the end, it comes down to a question of what you want out of your file being "binary". In the typical case, all "binary" really means is that end-of-line markers that might otherwise be translated from a new-line character to (for example) a carriage-return/line-feed pair won't be. Depending on the OS you're using, it might not even mean that much though--for example, on Linux, there's normally no difference between binary and text mode at all.
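To illustrate the size-prefixed alternative, here is a small sketch. The function names and the choice of a 32-bit length in native byte order are assumptions made for the example, so the format is only suitable for data read back on the same kind of machine.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // only needed to try the helpers with in-memory data
#include <string>

// Writes a record as a 32-bit length followed by the raw bytes.
void write_prefixed(std::ostream& out, const std::string& data) {
    const std::uint32_t size = static_cast<std::uint32_t>(data.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof size);
    out.write(data.data(), size);
}

// Reads one length-prefixed record; returns false on a short read.
bool read_prefixed(std::istream& in, std::string& data) {
    std::uint32_t size = 0;
    if (!in.read(reinterpret_cast<char*>(&size), sizeof size)) return false;
    data.resize(size);
    return size == 0 || static_cast<bool>(in.read(&data[0], size));
}

// Skips one record without reading its payload: the advantage of a prefix.
bool skip_prefixed(std::istream& in) {
    std::uint32_t size = 0;
    if (!in.read(reinterpret_cast<char*>(&size), sizeof size)) return false;
    return static_cast<bool>(in.seekg(size, std::ios_base::cur));
}
```

Because the length is known up front, a payload may contain any byte, including '\0', and a reader can jump over records it does not need. The delimiter approach with getline(input, bar, '\0') offers neither, but it is simpler when the data is always read front to back.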
Well, there are no rules broken and you'll get away with that just fine, except that you may miss the precision of reading binary from a stream object.
With binary input, you usually want to know how many characters were read successfully, which you can obtain afterwards with gcount()... Using std::getline will not reflect the bytes read in gcount().
Of course, you can simply get that information from the size of the string you passed into std::getline. But the stream will no longer encapsulate the number of bytes you consumed in the last unformatted operation.
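A small sketch of the gcount() point, using invented names (GcountTrace, trace_gcount): it does a member read() first, then a std::getline, and records gcount() after each. The free function std::getline for std::string is specified not to affect gcount(), so the value left by read() should still be there afterwards.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical record: gcount() observed after a member read() and after
// a subsequent std::getline() on the same stream.
struct GcountTrace {
    std::streamsize after_read;
    std::streamsize after_getline;
    std::string field;
};

GcountTrace trace_gcount(const std::string& bytes) {
    std::istringstream in(bytes, std::ios_base::in | std::ios_base::binary);
    GcountTrace t{};
    char header[2];
    in.read(header, sizeof header);     // unformatted member function: sets gcount()
    t.after_read = in.gcount();
    std::getline(in, t.field, '\0');    // free function: leaves gcount() untouched
    t.after_getline = in.gcount();
    return t;
}
```

So after std::getline, the number of bytes consumed has to be worked out from the string itself (its size, plus one for the delimiter if one was found) rather than asked of the stream.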

Is there any way to read characters that satisfy certain conditions only from stdin in C++?

I am trying to read some characters that satisfy certain condition from stdin with iostream library while leave those not satisfying the condition in stdin so that those skipped characters can be read later. Is it possible?
For example, I want characters in a-c only and the input stream is abdddcxa.
First read in all characters in a-c - abca; after this input has finished, start reading the remaining characters dddx. (These two inputs can't happen simultaneously. They might be in two different functions.)
Wouldn't it be simpler to read everything, then split the input into the two parts you need and finally send each part to the function that needs to process it?
Keeping the data in the stdin buffer is akin to using globals, it makes your program harder to understand and leaves the risk of other code (or the user) changing what is in the buffer while you process it.
On the other hand, dividing your program into "the part that reads the data", "the part that parses the data and divides the workload" and the "part that does the work" makes for a better structured program which is easy to understand and test.
You can probably use regex to do the actual split.
What you're asking for is the putback method (for more details see: http://www.cplusplus.com/reference/istream/istream/putback/). You would have to read everything, filter the part that you don't want to keep out, and put it back into the stream. So for instance:
cin >> myString;
// Do stuff to fill putbackBuf[] with characters in reverse order to be put back
pPutbackBuf = &putbackBuf[0];
do {
cin.putback(*(pPutbackBuf++));
} while(*pPutbackBuf);
Another solution (which is not exactly what you're asking for) would be to split the input into two strings and then feed the "non-inputted" string into a stringstream and pass that to whatever function needs to do something with the rest of the characters.
What you want to do is not possible in general; ungetc and putback exist, but they're not guaranteed to work for more than one character. They don't actually change stdin; they just push back on an input buffer.
What you could do instead is to explicitly keep a buffer of your own, by reading the input into a string and processing that string. Streams don't let you safely rewind in many cases, though.
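A minimal sketch of the "keep your own buffer" idea, using the a-c example from the question. The name split_by_range is made up; std::partition_copy is a standard algorithm that copies each element to one of two outputs and keeps the original order within each.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <utility>

// Splits `input` into (characters in 'a'..'c', everything else),
// preserving the original order within each part.
std::pair<std::string, std::string> split_by_range(const std::string& input) {
    std::string wanted, rest;
    std::partition_copy(input.begin(), input.end(),
                        std::back_inserter(wanted), std::back_inserter(rest),
                        [](char c) { return c >= 'a' && c <= 'c'; });
    return {wanted, rest};
}
```

For the input abdddcxa this gives abca and dddx. The second string can then be handed to the later function directly, or wrapped in a std::istringstream if that function expects a stream.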
No, random access is not possible for streams (except for fstream and stringstream). You will have to read in the whole line/input and process the resulting string (which you could, however, do using iostreams/std::stringstream if you think it is the best tool for that -- I don't think that but iostreams gurus may differ).

What's the difference between istringstream, ostringstream and stringstream? / Why not use stringstream in every case?

When would I use std::istringstream, std::ostringstream and std::stringstream and why shouldn't I just use std::stringstream in every scenario (are there any runtime performance issues?).
Lastly, is there anything bad about this (instead of using a stream at all):
std::string stHehe("Hello ");
stHehe += "stackoverflow.com";
stHehe += "!";
Personally, I find it very rare that I want to perform streaming into and out of the same string stream.
Usually I want to either initialize a stream from a string and then parse it; or stream things to a string stream and then extract the result and store it.
If you're streaming to and from the same stream, you have to be very careful with the stream state and stream positions.
Using 'just' istringstream or ostringstream better expresses your intent and gives you some checking against silly mistakes such as accidental use of << vs >>.
There might be some performance improvement but I wouldn't be looking at that first.
There's nothing wrong with what you've written. If you find it doesn't perform well enough, then you could profile other approaches, otherwise stick with what's clearest. Personally, I'd just go for:
std::string stHehe( "Hello stackoverflow.com!" );
A stringstream is somewhat larger, and might have slightly lower performance -- multiple inheritance can require an adjustment to the vtable pointer. The main difference is (at least in theory) better expressing your intent, and preventing you from accidentally using >> where you intended << (or vice versa). OTOH, the difference is sufficiently small that especially for quick bits of demonstration code and such, I'm lazy and just use stringstream. I can't quite remember the last time I accidentally used << when I intended >>, so to me that bit of safety seems mostly theoretical (especially since if you do make such a mistake, it'll almost always be really obvious almost immediately).
Nothing at all wrong with just using a string, as long as it accomplishes what you want. If you're just putting strings together, it's easy and works fine. If you want to format other kinds of data though, a stringstream will support that, and a string mostly won't.
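As a small illustration of formatting other kinds of data, here is a sketch with a made-up helper, make_label. It uses std::ostringstream rather than std::stringstream, since only output is needed.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds a label from mixed types, which plain
// std::string appends cannot do without converting each value first.
std::string make_label(const std::string& name, int count, double ratio) {
    std::ostringstream out;      // output only: `out >> x` would not compile
    out << name << " x" << count << " (" << ratio << ")";
    return out.str();
}
```

With the default formatting settings, make_label("widget", 3, 0.5) produces "widget x3 (0.5)".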
In most cases, you won't find yourself needing both input and output on the same stringstream, so using std::ostringstream and std::istringstream explicitly makes your intention clear. It also prevents you from accidentally typing the wrong operator (<< vs >>).
When you need to do both operations on the same stream you would obviously use the general purpose version.
Performance issues would be the least of your concerns here, clarity is the main advantage.
Finally, there's nothing wrong with using string append, as you have, to construct pure strings. You just can't use it to combine numbers the way you can in languages such as Perl.
istringstream is for input, ostringstream for output. stringstream is input and output.
You can use stringstream pretty much everywhere.
However, if you give your object to another user, and it uses operator >> whereas you were expecting a write-only object, you will not be happy ;-)
PS:
nothing bad about it, just performance issues.
std::ostringstream::str() creates a copy of the stream's content, which doubles memory usage in some situations. You can use std::stringstream and its rdbuf() function instead to avoid this.
More details here: how to write ostringstream directly to cout
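A minimal sketch of the rdbuf() approach, with flush_into as an invented name. Whether it actually saves memory or time in a given program depends on the implementation and the sizes involved, so treat it as something to measure rather than a guaranteed win.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Streams the buffered text straight into `dest` without first
// materialising a std::string copy via str().
void flush_into(std::stringstream& buffer, std::ostream& dest) {
    dest << buffer.rdbuf();
}
```

Two things to keep in mind: this reads from the buffer's current read position, so the content is consumed rather than left in place for a second pass, and if the buffer has nothing to give, the insertion sets failbit on the destination stream.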
To answer your third question: No, that's perfectly reasonable. The advantage of using streams is that you can enter any sort of value that's got an operator<< defined, while you can only add strings (either C++ or C) to a std::string.
Presumably when only insertion or only extraction is appropriate for your operation you could use one of the 'i' or 'o' prefixed versions to exclude the unwanted operation.
If that is not important then you can use the i/o version.
The string concatenation you're showing is perfectly valid. Although concatenation using stringstream is possible, it is not the most useful feature of stringstreams; that would be the ability to insert and extract POD and abstract data types.
Why open a file for read/write access if you only need to read from it, for example?
What if multiple processes needed to read from the same file?