How does this parsing function using stringstream work?

How does this parsing function using stringstream work? - c++

so I made a function to parse a given string with a comma delimiter last semester during a haze. Its very likely I took much of it from guides online, but it worked for the overall project so I did it. Now i'm going back and reviewing it, and i'm confused. Here is the code
`vector parsedString(string line){
vector splitStrings;
stringstream inputString(line);
while(inputString.good()){
string substr;
getline(inputString,substr,',');
splitStrings.push_back(substr);
substr = "";
}
return splitStrings;
}`
The purpose was to put each part of the line thats seperated into a vector, then take that vector back where its needed with all the parts. However, I do NOT understand the stringstream aspects of this.
To be specific, when I wrote code to check stringstreams contents during the loop, it stayed the same for the entire time. If getline() is supposed to track where the last delimiter was, why does it not show in the contents?
also if possible, an explanation on how .good() works in this case would be phenomenal. I understand stringstream is a stream, and function of that sort are supposed to check if streams are finished or not, but again I don't understand how the program would know that.
Everything works as intended, there is no mistakes being made from what I can see. I just fundamentally can't seem to grasp why its working, and I don't want my lack of knowledge to come back to bite me.

A istringstream is not just a string. If it were, it would be redundant.
For a first approximation, you could think of it as a class whose instances contain a string and a position (i.e., a string index). When you construct the istringstream from a string, the position is initialised to 0. When you read a character from the istringstream, you get the character at the position, and the position is incremented. So each time you read a character, you get the next one. (Actually, a stringstream has two positions, one for reading and one for writing, because it's a combination of an istringstream and an ostreamstring. But only the input part is relevant to your question.)
All other stream input operations are based on reading a single character, although implementations are allowed to be more efficient if the results are the same.
The above was an oversimplification, of course. A stream has other state: status bits, formatting parameters, locale settings, and more stuff I'm forgetting. See this overview for more details. But the basic point stands: the string is only a part of a stringstream's state: the rest of the state is used to make it look like an I/O stream. Which turns out to be useful if you want to pick it apart sequentially into tokens.

Related

Why do we need to use a stringstream when splitting a string?

Please note, I've never used streams before today, so my understanding of them remains rather vague. Apologies when I say something appallingly stupid.
Here I have a short bit of code that splits up a stringstream into a bunch of strings at each space.
vector<string> words;
stringstream ss("some random words that I wrote just now");
string word;
while(getline(ss, word, ' ')){
words.push_back(word);
}
I'm wondering why we're using a stringstream here, rather than just a string.
This would work like:
Create a string object at memory location x
When referenced, go through each character and check if it is a space. All previous characters should be saved somewhere temporary.
If it is a space, grab all the stuff we've just stored and stick it on the end of the vector, then, clear the temp storage thing. If it's not a space, go back to step 2
What's storing "some random words that I wrote just now" as a stringstream going to do to help us here?
Is it just making a stream of characters so that we can check through them? Is this necessary? Are we always doing this, even in other languages?

I'm wondering why we're using a stringstream here, rather than just a string.
If this is the question, then one big reason why stringstream is used is simply -- because it works with little effort by the programmer. The less code you write, the less chance for bugs to occur.
Your method of using just std::string and searching for spaces requires the C++ programmer to write all of those steps (create a string, manually search for spaces, etc). It may be trivial to write, but even the best programmers can make mistakes in trivial code. The code may have bugs, may not cover all of the corner cases, etc.
As to ease of use:
When a C++ programmer sees stringstream with respect to usage of separating a sting with whitespace, the purpose of the code is immediately known.
If on the other hand, a programmer decides to manually parse the data by using just string and searching for spaces, the code is not immediately realized as to what it does when another programmer reads the code. Sure, it may be a quick realization of the code by the other programmer, but I can bet the other programmer will say "why didn't you use stringstream?".

What's storing "some random words that I wrote just now" as a stringstream going to do to help us here? Is it just making a stream of characters so that we can check through them? Is this necessary?
std::stringstream just allows you to use the usual input/output operations such as >> and std::getline on a string. You can't use std::getline to read from an std::string, so you put the string in a std::streamstream first. You can totally parse a string by looping over the characters yourself as you described.
Are we always doing this, even in other languages?
Not in Python at least. There you would just do words = line.split(' ').

Is there any way to read from a .csv file without the use of getline()?

In a project that was assigned to me, that has to do with reading .csv files, my teacher made it clear that I cannot use strings. Wherever I have looked for help, the only solution I have found is the use of the function getline(), which only takes strings. Is there any other possible way to do it?

It sounds as though your teacher is expecting you to test the input one character at a time (until end of line), and so is looking to see you do things such as
test for end of file
read characters and save progressively into variables
test for the comma character and write out the current variable
ignore characters that aren't part of your required input. (You didn't mention whether the CSV is of strings or of numeric values, but if the exercise was to identify integer values then you'd be rejecting all non-numeric characters.
Avoiding strings is technically not possible (since that is what a CSV is!) but I would guess the intention is to get you understanding streams as characters and writing logic to do with each character at a time.

C++ how to check if the std::cin buffer is empty

The title is misleading because I'm more interested in finding an alternate solution. My gut feeling is that checking whether the buffer is empty is not the most ideal solution (at least in my case).
I'm new to C++ and have been following Bjarne Stroustrup's Programming Principles and Practices using C++. I'm currently on Chapter 7, where we are "refining" the calculator from Chapter 6. (I'll put the links for the source code at the end of the question.)
Basically, the calculator can take multiple inputs from the user, delimited by semi-colons.
> 5+2; 10*2; 5-1;
= 7
> = 20
> = 4
>
But I'd like to get rid of the prompt character ('>') for the last two answers, and display it again only when the user input is asked for. My first instinct was to find a way to check if the buffer is empty, and if so, cout the character and if not, proceed with couting the answer. But after a bit of googling I realized the task is not as easy as I initially thought... And also that maybe that wasn't a good idea to begin with.
I guess essentially my question is how to get rid of the '>' characters for the last two answers when there are multiple inputs. But if checking the cin buffer is possible and is not a bad idea after all, I'd love to know how to do it.
Source code: https://gist.github.com/Spicy-Pumpkin/4187856492ccca1a24eaa741d7417675
Header file: http://www.stroustrup.com/Programming/PPP2code/std_lib_facilities.h
^ You need this header file. I assume it is written by the author himself.
Edit: I did look around the web for some solutions, but to be honest none of them made any sense to me. It's been like 4 days since I picked up C++ and I have a very thin background in programming, so sometimes even googling is a little tough..

As you've discovered, this is a deceptively complicated task. This is because there are multiple issues here at play, both the C++ library, and the actual underlying file.
C++ library
std::cin, and C++ input streams, use an intermediate buffer, a std::streambuf. Input from the underlying file, or an interactive terminal, is not read character by character, but rather in moderately sized chunks, where possible. Let's say:
int n;
std::cin >> n;
Let's say that when this is done and over is, n contains the number 42. Well, what actually happened is that std::cin, more than likely, did not read just two characters, '4' and '2', but whatever additional characters, beyond that, were available on the std::cin stream. The remaining characters were stored in the std::streambuf, and the next input operation will read them, before actually reading the underlying file.
And it is equally likely that the above >> did not actually read anything from the file, but rather fetched the '4' and the '2' characters from the std::streambuf, that were left there after the previous input operation.
It is possible to examine the underlying std::streambuf, and determine whether there's anything unread there. But this doesn't really help you.
If you were about to execute the above >> operator, you looked at the underlying std::streambuf, and discover that it contains a single character '4', that also doesn't tell you much. You need to know what the next character is in std::cin. It could be a space or a newline, in which case all you'll get from the >> operator is 4. Or, the next character could be '2', in which case >> will swallow at least '42', and possibly more digits.
You can certainly implement all this logic yourself, look at the underlying std::streambuf, and determine whether it will satisfy your upcoming input operation. Congratulations: you've just reinvented the >> operator. You might as well just parse the input, a character at a time, yourself.
The underlying file
You determined that std::cin does not have sufficient input to satisfy your next input operation. Now, you need to know whether or not input is available on std::cin.
This now becomes an operating system-specific subject matter. This is no longer covered by the standard C++ library.
Conclusion
This is doable, but in all practical situations, the best solution here is to use an operating system-specific approach, instead of C++ input streams, and read and buffer your input yourself. On Linux, for example, the classical approach is to set fd 0 to non-blocking mode, so that read() does not block, and to determine whether or not there's available input, just try read() it. If you did read something, put it into a buffer that you can look at later. Once you've consumed all previously-read buffered input, and you truly need to wait for more input to be read, poll() the file descriptor, until it's there.

Using Getline on a Binary File

I have read that getline behaves as an unformatted input function. Which I believe should allow it to be used on a binary file. Let's say for example that I've done this:
ofstream ouput("foo.txt", ios_base::binary);
const auto foo = "lorem ipsum";
output.write(foo, strlen(foo) + 1);
output.close();
ifstream input("foo.txt", ios_base::binary);
string bar;
getline(input, bar, '\0');
Is that breaking any rules? It seems to work fine, I think I've just traditionally seen arrays handled by writing the size and then writing the array.

No, it's not breaking any rules that I can see.
Yes, it's more common to write an array with a prefixed size, but using a delimiter to mark the end can work perfectly well also. The big difference is that (like with a text file) you have to read through data to find the next item. With a prefixed size, you can look at the size, and skip directly to the next item if you don't need the current one. Of course, you also need to ensure that if you're using something to mark the end of a field, that it can never occur inside the field (or come up with some way of detecting when it's inside a field, so you can read the rest of the field when it does).
Depending on the situation, that can mean (for example) using Unicode text. This gives you a lot of options for values that can't occur inside the text (because they aren't legal Unicode). That, on the other hand, would also mean that your "binary" file is really a text file, and has to follow some basic text-file rules to make sense.
Which is preferable depends on how likely it is that you'll want to read random pieces of the file rather than reading through it from beginning to end, as well as the difficulty (if any) of finding a unique delimiter and if you don't have one, the complexity of making the delimiter recognizable from data inside a field. If the data is only meaningful if written in order, then having to read it in order doesn't really pose a problem. If you can read individual pieces meaningfully, then being able to do so much more likely to be useful.
In the end, it comes down to a question of what you want out of your file being "binary'. In the typical case, all 'binary" really means is that what end of line markers that might be translated from a new-line character to (for example) a carriage-return/line-feed pair, won't be. Depending on the OS you're using, it might not even mean that much though--for example, on Linux, there's normally no difference between binary and text mode at all.

Well, there are no rules broken and you'll get away with that just fine, except that may miss the precision of reading binary from a stream object.
With binary input, you usually want to know how many characters were read successfully, which you can obtain afterwards with gcount()... Using std::getline will not reflect the bytes read in gcount().
Of cause, you can simply get such info from the size of the string you passed into std::getline. But the stream will no longer encapsulate the number of bytes you consumed in the last Unformatted Operation

Is there any way to read characters that satisfy certain conditions only from stdin in C++?

I am trying to read some characters that satisfy certain condition from stdin with iostream library while leave those not satisfying the condition in stdin so that those skipped characters can be read later. Is it possible?
For example, I want characters in a-c only and the input stream is abdddcxa.
First read in all characters in a-c - abca; after this input finished, start read the remaining characters dddx. (This two inputs can't happen simultaneously. They might be in two different functions).

Wouldn't it be simpler to read everything, then split the input into the two parts you need and finally send each part to the function that needs to process it?
Keeping the data in the stdin buffer is akin to using globals, it makes your program harder to understand and leaves the risk of other code (or the user) changing what is in the buffer while you process it.
On the other hand, dividing your program into "the part that reads the data", "the part that parses the data and divides the workload" and the "part that does the work" makes for a better structured program which is easy to understand and test.
You can probably use regex to do the actual split.

What you're asking for is the putback method (for more details see: http://www.cplusplus.com/reference/istream/istream/putback/). You would have to read everything, filter the part that you don't want to keep out, and put it back into the stream. So for instance:
cin >> myString;
// Do stuff to fill putbackBuf[] with characters in reverse order to be put back
pPutbackBuf = &putbackBuf[0];
do{
cin.putback(*(pPutbackBuf++));
while(*pPutbackBuf);
Another solution (which is not exactly what you're asking for) would be to split the input into two strings and then feed the "non-inputted" string into a stringstream and pass that to whatever function needs to do something with the rest of the characters.

What you want to do is not possible in general; ungetc and putback exist, but they're not guaranteed to work for more than one character. They don't actually change stdin; they just push back on an input buffer.
What you could do instead is to explicitly keep a buffer of your own, by reading the input into a string and processing that string. Streams don't let you safely rewind in many cases, though.

No, random access is not possible for streams (except for fstream an stringstream). You will have to read in the whole line/input and process the resulting string (which you could, however, do using iostreams/std::stringstream if you think it is the best tool for that -- I don't think that but iostreams gurus may differ).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js