scanf on an istream object - c++

NOTE: I've seen the post What is the cin analougus of scanf formatted input? before asking the question and the post doesn't solve my problem here. The post seeks for C++-way to do it, but as I mentioned already, it is inconvenient to just use C++-way to do it sometimes and I have clear examples for that.
I am trying to read data from an istream object, and sometimes it is inconvenient to just use C++-style ways such as operator>>, e.g. the data are in special form 123:456 so you have to imbue to make ':' as space (which is very hacky, as opposed to %d:%d in scanf), or 00123 where you want to read as string and convert decimal instead of octal (as opposed to %d in scanf), and possibly many other cases.
The reason I chose istream as interface is because it can be derived and therefore more flexible. For example, we can create in-memory streams, or some customized streams that generated on the fly, etc. C-style FILE*, on the other hand, is very limited, at least in a standard-compliant way, on creating customized streams.
So my questions is, is there a way to do scanf-like data extraction on istream object? I think fscanf internally read character by character from FILE* using fgetc, while istream also provides such interface. So it is possible by just copying and pasting the code of fscanf and replace the FILE* with the istream object, but that's very hacky. Is there a smarter and cleaner way, or is there some existing work on this?
Thanks.

You should never, under any circumstances, use scanf or its relatives for anything, for three reasons:
Many format strings, including for instance all the simple uses of %s, are just as dangerous as gets.
It is almost impossible to recover from malformed input, because scanf does not tell you how far in characters into the input it got when it hit something unexpected.
Numeric overflow triggers undefined behavior: yes, that means scanf is allowed to crash the entire program if a numeric field in the input has too many digits.
Prior to C++11, the C++ specification defined istream formatted input of numbers in terms of scanf, which means that last objection is very likely to apply to them as well! (In C++11 the specification is changed to use strto* instead and to do something predictable if that detects overflow.)
What you should do instead is: read entire lines of input into std::string objects with getline, hand-code logic to split them up into fields (I don't remember off the top of my head what the C++-string equivalent of strsep is, but I'm sure it exists) and then convert numeric strings to machine numbers with the strtol/strtod family of functions.
I cannot emphasize this enough: THE ONLY 100% RELIABLE WAY TO CONVERT STRINGS TO NUMBERS IN C OR C++, unless you are lucky enough to have a C++ runtime that is already C++11-conformant in this regard, IS WITH THE strto* FUNCTIONS, and you must use them correctly:
errno = 0;
result = strtoX(s, &ends, 10); // omit 10 for floats
if (s == ends || *ends || errno)
parse_error();
(The OpenBSD manpages, linked above, explain why you have to do this fairly convoluted thing.)
(If you're clever, you can use ends and some manual logic to skip that colon, instead of strsep.)

I do not recommend you to mix C++ input output and C input output. No that they are really incompatible but they could just plain interoperate wrong.
For example Oracle docs recommend not to mix it http://www.oracle.com/technetwork/articles/servers-storage-dev/mixingcandcpluspluscode-305840.html
But no one stops you from reading data into the buffer and parsing it with standard c functions like sscanf.
...
string curString;
int a, b;
...
std::getline(inputStream, curString);
int sscanfResult == sscanf(curString.cstr(), "%d:%d", &a, &b);
if (2 != sscanfResult)
throw "error";
...
But it won't help in some situations when your stream is just one long contiguous sequence of symbols(like some string turned into memory stream).
Making your own fscanf from scratch or porting(?) the original CRT function actually isn't the worst possible idea. Just make sure you have tested it thoroughly(low level custom char manipulation was always a source of pain in C).
I've never really tried the boost\spirit and such parsing infrastructure could really be an overkill for your project. But boost libraries are usually well tested and designed. You could at least try to use it.

Based on #tmyklebu's comment, I implemented streamScanf which wraps istream as FILE* via fopencookie: https://github.com/likan999/codejam/blob/master/Common/StreamScanf.cpp

Related

A better replacement for istrstream?

istrstream was perfect for my needs - basically, take a fixed char buffer, and give me a simple way to extract lines getline() and test for eof()
I'm switching our projects to C++ 17 compliance - which has deprecated istrsteam - apparently because there are too many C++ programmers who cannot fathom fixed buffer memory management (are you serious?!)
At any rate, the istringstream provides the same use semantics, but it imposes the need to now copy the entire fixed character buffer at construction time.
This is an anti-pattern.
What I am looking for is either a way to use a string_view in place of a string for the istringstream, or alternately a better replacement for stringstream which itself handles externally managed fixed buffer (it need only point into it, it never need worry about managing that resource, just as strstream did).
Currently, in VS 2017, this is illegal, and if I understand things correctly, is illegal everywhere in the current state-of-art of C++ (I'm sure you'll correct me if I'm wrong!)
std::string_view raw_view(reinterpret_cast<const char *>(raw_buffer.get()), raw_buffer.size());
std::istringstream raw_stream(raw_view);
So - ideas?
Note: Peter Sommerlad has a proposal for this exact idea here for the C++ standards body:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0448r1.pdf
Continue using istrstream for the time being. It likely won't be removed until either P0448 (using std::span<char> as the source/destination of a stream buffer) or P0408 (the ability to move data into/outof stringstreams) is adopted by the standard. Either of those would serve your needs well.
That being said, if all you're trying to do is get substrings between \ns, it would be far more efficient (even with the above proposals) to just use a regex search. Or just a regular search, since you're just looking for \n. That would give you a pair of iterators that represents a line. Using iostreams for line-by-line processing of an already-loaded character buffer is overkill and will never be as efficient as the alternative.

C++ how to check if the std::cin buffer is empty

The title is misleading because I'm more interested in finding an alternate solution. My gut feeling is that checking whether the buffer is empty is not the most ideal solution (at least in my case).
I'm new to C++ and have been following Bjarne Stroustrup's Programming Principles and Practices using C++. I'm currently on Chapter 7, where we are "refining" the calculator from Chapter 6. (I'll put the links for the source code at the end of the question.)
Basically, the calculator can take multiple inputs from the user, delimited by semi-colons.
> 5+2; 10*2; 5-1;
= 7
> = 20
> = 4
>
But I'd like to get rid of the prompt character ('>') for the last two answers, and display it again only when the user input is asked for. My first instinct was to find a way to check if the buffer is empty, and if so, cout the character and if not, proceed with couting the answer. But after a bit of googling I realized the task is not as easy as I initially thought... And also that maybe that wasn't a good idea to begin with.
I guess essentially my question is how to get rid of the '>' characters for the last two answers when there are multiple inputs. But if checking the cin buffer is possible and is not a bad idea after all, I'd love to know how to do it.
Source code: https://gist.github.com/Spicy-Pumpkin/4187856492ccca1a24eaa741d7417675
Header file: http://www.stroustrup.com/Programming/PPP2code/std_lib_facilities.h
^ You need this header file. I assume it is written by the author himself.
Edit: I did look around the web for some solutions, but to be honest none of them made any sense to me. It's been like 4 days since I picked up C++ and I have a very thin background in programming, so sometimes even googling is a little tough..
As you've discovered, this is a deceptively complicated task. This is because there are multiple issues here at play, both the C++ library, and the actual underlying file.
C++ library
std::cin, and C++ input streams, use an intermediate buffer, a std::streambuf. Input from the underlying file, or an interactive terminal, is not read character by character, but rather in moderately sized chunks, where possible. Let's say:
int n;
std::cin >> n;
Let's say that when this is done and over is, n contains the number 42. Well, what actually happened is that std::cin, more than likely, did not read just two characters, '4' and '2', but whatever additional characters, beyond that, were available on the std::cin stream. The remaining characters were stored in the std::streambuf, and the next input operation will read them, before actually reading the underlying file.
And it is equally likely that the above >> did not actually read anything from the file, but rather fetched the '4' and the '2' characters from the std::streambuf, that were left there after the previous input operation.
It is possible to examine the underlying std::streambuf, and determine whether there's anything unread there. But this doesn't really help you.
If you were about to execute the above >> operator, you looked at the underlying std::streambuf, and discover that it contains a single character '4', that also doesn't tell you much. You need to know what the next character is in std::cin. It could be a space or a newline, in which case all you'll get from the >> operator is 4. Or, the next character could be '2', in which case >> will swallow at least '42', and possibly more digits.
You can certainly implement all this logic yourself, look at the underlying std::streambuf, and determine whether it will satisfy your upcoming input operation. Congratulations: you've just reinvented the >> operator. You might as well just parse the input, a character at a time, yourself.
The underlying file
You determined that std::cin does not have sufficient input to satisfy your next input operation. Now, you need to know whether or not input is available on std::cin.
This now becomes an operating system-specific subject matter. This is no longer covered by the standard C++ library.
Conclusion
This is doable, but in all practical situations, the best solution here is to use an operating system-specific approach, instead of C++ input streams, and read and buffer your input yourself. On Linux, for example, the classical approach is to set fd 0 to non-blocking mode, so that read() does not block, and to determine whether or not there's available input, just try read() it. If you did read something, put it into a buffer that you can look at later. Once you've consumed all previously-read buffered input, and you truly need to wait for more input to be read, poll() the file descriptor, until it's there.

is it good to use sscanf for parsing string

I have been using sscanf() in my parser to get some css like tokens such as color code some variations below;
#FDC69A
#ff0
orange
Example code will be;
int r g b;
cosnt char* s = "#FAFAFA";
if(sscanf(s, "#%02x%02x%02x", &r, &g, &b) == 3){
// color code ok
}
My preferred language for current project is c++, I think sscanf can be faster than regular character by character parsing and overall code will be bug free & minimal still it may have portability issues across different compilers.
A thing I noticed is, popular of open source project do not use sscanf for tokenizing input buffers instead they do it char by char, it is a bad programming practice to use sscanf in parsing that i am following?
The biggest problem with sscanf (as well as scanf and fscanf) is that numeric overflow causes undefined behavior. For example:
const char *s = "999999999999999999999999999999";
int n;
sscanf(s, "%d", &n);
The C standard says exactly nothing about how this code behaves. It might set n to some arbitrary value, it might report an error, or it might crash.
(In practice, existing implementations are likely to behave sensibly, for some value of "sensibly".)
if(sscanf(s, "#%02x%02x%02x", &r, &g, &b) == 3) is robust... nothing to worry about there.
Historically, the big concern with those functions was that someone might specify a format flag that doesn't match the argument (e.g. %d not given an int*)... many modern compilers have enough validation to avoid accidents like that.
Still, C++ has iostreams, and people tend to use those for many I/O and parsing operations as the stream destructors automatically flush and close files and release descriptors, they're type safe, extensible to user-defined types, you can generally reuse parsing/output code for any type of stream, and they're often convenient. They'd be significantly more tedious for your specific test above though.
If you've noticed lots of OSS programs scanning character by character, it may be because:
They're doing more complex parsing - where they want to branch to different parsing logic after reading individual characters, or
In your code you have a firm expectation of what to expect, so it's reasonable to do a sscanf to test that, but if you were writing say a compiler it'd be too slow to try a huge if/else list of hundreds sscanf attempts to recognise tokens.
Relevant for scanf, fscanf but not sscanf - avoid scanning too far so they can ungetc, which (from memory) is only portably guaranteed to work for 1 character.

C++ equivalent of fscanf()?

And please don't say it's fscanf() ;P
I'm trying to replace this line:
if ( fscanf(fp, "P%c\n", &ch) != 1 )
If I understand correctly, it tries to read in a char and store it to &ch, only if it's between a 'P' and a '\n'. Is that right?
And if it succeeds, it returns 1 (the number of characters it read)?
I'm trying to come up with a C++ version. Is there any easy way to do a formatted read like that? Or do I need to use fstream, operator>>, and nested if statements?
Safe C++ alternative with type checks is std::stringstream.
In Visual Studio, the safe, but non-portable, equivalent function is fscanf_s.
In general, for portability, you would use one of the stream classes, which, as you note, can be a pain to format correctly.

What's the difference between istringstream, ostringstream and stringstream? / Why not use stringstream in every case?

When would I use std::istringstream, std::ostringstream and std::stringstream and why shouldn't I just use std::stringstream in every scenario (are there any runtime performance issues?).
Lastly, is there anything bad about this (instead of using a stream at all):
std::string stHehe("Hello ");
stHehe += "stackoverflow.com";
stHehe += "!";
Personally, I find it very rare that I want to perform streaming into and out of the same string stream.
Usually I want to either initialize a stream from a string and then parse it; or stream things to a string stream and then extract the result and store it.
If you're streaming to and from the same stream, you have to be very careful with the stream state and stream positions.
Using 'just' istringstream or ostringstream better expresses your intent and gives you some checking against silly mistakes such as accidental use of << vs >>.
There might be some performance improvement but I wouldn't be looking at that first.
There's nothing wrong with what you've written. If you find it doesn't perform well enough, then you could profile other approaches, otherwise stick with what's clearest. Personally, I'd just go for:
std::string stHehe( "Hello stackoverflow.com!" );
A stringstream is somewhat larger, and might have slightly lower performance -- multiple inheritance can require an adjustment to the vtable pointer. The main difference is (at least in theory) better expressing your intent, and preventing you from accidentally using >> where you intended << (or vice versa). OTOH, the difference is sufficiently small that especially for quick bits of demonstration code and such, I'm lazy and just use stringstream. I can't quite remember the last time I accidentally used << when I intended >>, so to me that bit of safety seems mostly theoretical (especially since if you do make such a mistake, it'll almost always be really obvious almost immediately).
Nothing at all wrong with just using a string, as long as it accomplishes what you want. If you're just putting strings together, it's easy and works fine. If you want to format other kinds of data though, a stringstream will support that, and a string mostly won't.
In most cases, you won't find yourself needing both input and output on the same stringstream, so using std::ostringstream and std::istringstream explicitly makes your intention clear. It also prevents you from accidentally typing the wrong operator (<< vs >>).
When you need to do both operations on the same stream you would obviously use the general purpose version.
Performance issues would be the least of your concerns here, clarity is the main advantage.
Finally there's nothing wrong with using string append as you have to construct pure strings. You just can't use that to combine numbers like you can in languages such as perl.
istringstream is for input, ostringstream for output. stringstream is input and output.
You can use stringstream pretty much everywhere.
However, if you give your object to another user, and it uses operator >> whereas you where waiting a write only object, you will not be happy ;-)
PS:
nothing bad about it, just performance issues.
std::ostringstream::str() creates a copy of the stream's content, which doubles memory usage in some situations. You can use std::stringstream and its rdbuf() function instead to avoid this.
More details here: how to write ostringstream directly to cout
To answer your third question: No, that's perfectly reasonable. The advantage of using streams is that you can enter any sort of value that's got an operator<< defined, while you can only add strings (either C++ or C) to a std::string.
Presumably when only insertion or only extraction is appropriate for your operation you could use one of the 'i' or 'o' prefixed versions to exclude the unwanted operation.
If that is not important then you can use the i/o version.
The string concatenation you're showing is perfectly valid. Although concatenation using stringstream is possible that is not the most useful feature of stringstreams, which is to be able to insert and extract POD and abstract data types.
Why open a file for read/write access if you only need to read from it, for example?
What if multiple processes needed to read from the same file?