Errors when iostringstream::write is called - c++

On this website, the description of the iostringstream::write function says that:
In case of error, the badbit flag is set
What could those errors be?

The obvious error when writing to a stringstream would be if the underlying std::stringbuf failed to allocate memory to hold the data being written. Also note, however, that the link you've given is to ostream::write, which could fail for other reasons (e.g., writing to a pipe that's been closed, or to a file on a disk that's full, or where the write would exceed the user's quota).
Aside #1: there's no such thing as an iostringstream -- there's istringstream and ostringstream. The one that combines both is just stringstream.
Aside #2: cplusplus.com isn't particularly highly respected. Some other sites (e.g., cppreference.com) seem to be more dependable/accurate, at least as a general rule (though I feel obliged to point out that I don't use any of the above much myself, so my comments on them aren't anywhere close to the last word).

Related

A better replacement for istrstream?

istrstream was perfect for my needs - basically, take a fixed char buffer, and give me a simple way to extract lines getline() and test for eof()
I'm switching our projects to C++ 17 compliance - which has deprecated istrstream - apparently because there are too many C++ programmers who cannot fathom fixed buffer memory management (are you serious?!)
At any rate, the istringstream provides the same use semantics, but it imposes the need to now copy the entire fixed character buffer at construction time.
This is an anti-pattern.
What I am looking for is either a way to use a string_view in place of a string for the istringstream, or alternately a better replacement for stringstream which itself handles externally managed fixed buffer (it need only point into it, it never need worry about managing that resource, just as strstream did).
Currently, in VS 2017, this is illegal, and if I understand things correctly, is illegal everywhere in the current state-of-art of C++ (I'm sure you'll correct me if I'm wrong!)
std::string_view raw_view(reinterpret_cast<const char *>(raw_buffer.get()), raw_buffer.size());
std::istringstream raw_stream(raw_view);
So - ideas?
Note: Peter Sommerlad has a proposal for this exact idea here for the C++ standards body:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0448r1.pdf
Continue using istrstream for the time being. It likely won't be removed until either P0448 (using std::span<char> as the source/destination of a stream buffer) or P0408 (the ability to move data into/out of stringstreams) is adopted by the standard. Either of those would serve your needs well.
That being said, if all you're trying to do is get substrings between \ns, it would be far more efficient (even with the above proposals) to just use a regex search. Or just a regular search, since you're just looking for \n. That would give you a pair of iterators that represents a line. Using iostreams for line-by-line processing of an already-loaded character buffer is overkill and will never be as efficient as the alternative.
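As a rough sketch of the "regular search" idea, the following splits an externally owned buffer into lines without copying any characters. It assumes C++17 (std::string_view); the function name split_lines is made up for this example, and it returns views rather than the iterator pairs mentioned above.

```cpp
#include <string_view>
#include <vector>

// Splits `buffer` at '\n'. Each element is a view into the original buffer,
// so the buffer must outlive the returned vector.
std::vector<std::string_view> split_lines(std::string_view buffer) {
    std::vector<std::string_view> lines;
    while (!buffer.empty()) {
        const auto pos = buffer.find('\n');
        if (pos == std::string_view::npos) {
            lines.push_back(buffer);  // last line without a trailing newline
            break;
        }
        lines.push_back(buffer.substr(0, pos));
        buffer.remove_prefix(pos + 1);  // drop the line and its '\n'
    }
    return lines;
}
```

A fixed char buffer can be wrapped as std::string_view(ptr, size) and passed straight in; nothing is allocated except the vector of views.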

C++ how to check if the std::cin buffer is empty

The title is misleading because I'm more interested in finding an alternate solution. My gut feeling is that checking whether the buffer is empty is not the most ideal solution (at least in my case).
I'm new to C++ and have been following Bjarne Stroustrup's Programming: Principles and Practice Using C++. I'm currently on Chapter 7, where we are "refining" the calculator from Chapter 6. (I'll put the links for the source code at the end of the question.)
Basically, the calculator can take multiple inputs from the user, delimited by semi-colons.
> 5+2; 10*2; 5-1;
= 7
> = 20
> = 4
>
But I'd like to get rid of the prompt character ('>') for the last two answers, and display it again only when the user input is asked for. My first instinct was to find a way to check if the buffer is empty, and if so, cout the character and if not, proceed with couting the answer. But after a bit of googling I realized the task is not as easy as I initially thought... And also that maybe that wasn't a good idea to begin with.
I guess essentially my question is how to get rid of the '>' characters for the last two answers when there are multiple inputs. But if checking the cin buffer is possible and is not a bad idea after all, I'd love to know how to do it.
Source code: https://gist.github.com/Spicy-Pumpkin/4187856492ccca1a24eaa741d7417675
Header file: http://www.stroustrup.com/Programming/PPP2code/std_lib_facilities.h
^ You need this header file. I assume it is written by the author himself.
Edit: I did look around the web for some solutions, but to be honest none of them made any sense to me. It's been like 4 days since I picked up C++ and I have a very thin background in programming, so sometimes even googling is a little tough..
As you've discovered, this is a deceptively complicated task. This is because there are multiple issues at play here, involving both the C++ library and the actual underlying file.
C++ library
std::cin, and C++ input streams, use an intermediate buffer, a std::streambuf. Input from the underlying file, or an interactive terminal, is not read character by character, but rather in moderately sized chunks, where possible. Let's say:
int n;
std::cin >> n;
Let's say that when this is done and over with, n contains the number 42. Well, what actually happened is that std::cin, more than likely, did not read just two characters, '4' and '2', but whatever additional characters, beyond that, were available on the std::cin stream. The remaining characters were stored in the std::streambuf, and the next input operation will read them, before actually reading the underlying file.
And it is equally likely that the above >> did not actually read anything from the file, but rather fetched the '4' and the '2' characters from the std::streambuf, that were left there after the previous input operation.
It is possible to examine the underlying std::streambuf, and determine whether there's anything unread there. But this doesn't really help you.
If you were about to execute the above >> operator, looked at the underlying std::streambuf, and discovered that it contains a single character '4', that also doesn't tell you much. You need to know what the next character is in std::cin. It could be a space or a newline, in which case all you'll get from the >> operator is 4. Or, the next character could be '2', in which case >> will swallow at least '42', and possibly more digits.
You can certainly implement all this logic yourself, look at the underlying std::streambuf, and determine whether it will satisfy your upcoming input operation. Congratulations: you've just reinvented the >> operator. You might as well just parse the input, a character at a time, yourself.
The underlying file
You determined that std::cin does not have sufficient input to satisfy your next input operation. Now, you need to know whether or not input is available on std::cin.
This now becomes an operating system-specific subject matter. This is no longer covered by the standard C++ library.
Conclusion
This is doable, but in all practical situations, the best solution here is to use an operating system-specific approach, instead of C++ input streams, and read and buffer your input yourself. On Linux, for example, the classical approach is to set fd 0 to non-blocking mode, so that read() does not block, and to determine whether or not there's available input, just try to read() it. If you did read something, put it into a buffer that you can look at later. Once you've consumed all previously-read buffered input, and you truly need to wait for more input to be read, poll() the file descriptor, until it's there.

Optimal way of reading a complete file to a string using fstream?

Many other posts, like " Read whole ASCII file into C++ std::string " explain what some of the options are but do not describe the pros and cons of the various methods in any depth. I want to know why one method is preferable over another.
All of these use std::fstream to read the file into a std::string. I am unsure what the costs and benefits of each method are. Let's assume this is for the common case where the files being read are known to be of some smallish size that memory can easily accommodate; clearly, reading a multi-terabyte file into memory is a bad idea no matter how you do it.
The most common way, after a few Google searches, to read a whole file into a std::string involves using std::getline and appending a newline character after each line. This seems needless to me, but is there some performance or compatibility reason that this is ideal?
std::string Results;
std::string Line;
std::ifstream ResultReader("file.txt");
// getline replaces its target on every call, so read into Line and append
while (std::getline(ResultReader, Line))
{
    Results += Line;
    Results.push_back('\n');
}
Another way I pieced together is to change the getline delimiter so it is something not in the file. The EOF char seems unlikely to be in the middle of the file, so it seems a likely candidate. This includes a cast, so there is at least one reason not to do it, but this does read a file at once with no string concatenation. Presumably there is still some cost for the delimiter checks. Are there any other good reasons not to do this?
std::string Results;
std::ifstream ResultReader("file.txt");
std::getline(ResultReader, Results, (char)std::char_traits<char>::eof());
The cast means that systems that define std::char_traits<char>::eof() as something other than -1 might have problems. Is this a practical reason not to choose this over other methods that use std::getline and string::push_back('\n')?
How do these compare to other ways of reading the file at once, like in this question: Read whole ASCII file into C++ std::string
std::ifstream ResultReader("file.txt");
std::string Results((std::istreambuf_iterator<char>(ResultReader)),
std::istreambuf_iterator<char>());
It would seem this would be best. It offloads almost all the work onto the standard library, which ought to be heavily optimized for the given platform. I see no reason for checks other than stream validity and the end of the file. Is this ideal, or are there problems with this that are unseen?
Does the standard or do details of some implementation provide reasons to prefer some method over another? Have I missed some method that might prove ideal in a wide variety of circumstances?
What is a simplest, most idiomatic, best performing and standard compliant way of reading a whole file into an std::string?
EDIT - 2
This question has prompted me to write a small suite of benchmarks. They are MIT license and available on github at: https://github.com/Sqeaky/CppFileToStringExperiments
Fastest - TellSeekRead and CTellSeekRead - These have the system provide an easy way to get the size and read the file in one go.
Faster - Getline Appending and Eof - The checking of chars does not seem to impose any cost.
Fast - RdbufMove and Rdbuf - The std::move seems to make no difference in release.
Slow - Iterator, BackInsertIterator and AssignIterator - Something is wrong with iterators and input streams. They work great in memory, but not here. That said, some of these are faster than others.
I have added every method suggested so far, including those in links. I would appreciate it if someone could run this on Windows and with other compilers. I currently do not have access to a machine with NTFS, and it has been noted that this and compiler details could be important.
As for measuring simplicity and idiomatic-ness, how do we measure these objectively? Simplicity seems doable, perhaps using something like LOC and cyclomatic complexity, but how idiomatic something is seems purely subjective.
What is a simplest, most idiomatic, best performing and standard compliant way of reading a whole file into an std::string?
Those are pretty much contradicting requests; one will most likely lessen the other. Simpler code won't be the fastest, or more idiomatic.
After exploring this area for a while I've come to some conclusions:
1) The biggest performance penalty is the IO action itself: the fewer IO actions taken, the faster the code.
2) Memory allocations are also quite expensive, but not as expensive as the IO.
3) Reading as binary is faster than reading as text.
4) Using the OS API will probably be faster than C++ streams.
5) std::ios_base::sync_with_stdio doesn't really affect the performance; it's an urban legend.
Using std::getline is probably not the best choice if performance is needed, because it will make N IO actions and N allocations for N lines.
A compromise which is fast, standard and elegant is to get the file size, allocate all the memory at once, then read the file in one go:
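std::ifstream fileReader(<your path here>, std::ios::binary | std::ios::ate);
if (fileReader) {
    // Opened at the end (std::ios::ate), so tellg() reports the file size.
    const auto fileSize = static_cast<std::size_t>(fileReader.tellg());
    fileReader.seekg(0, std::ios::beg);  // rewind to the start before reading
    std::string content(fileSize, '\0');
    fileReader.read(&content[0], static_cast<std::streamsize>(fileSize));
}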
Move the content around to prevent unneeded copies.
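Wrapped up as a function, the same idea might look like the sketch below. The name read_file and the choice to return an empty string on failure are assumptions made for this example, not part of the answer above.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Reads the whole file at `path` in one read() call.
// Returns an empty string if the file cannot be opened or sized.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamoff size = in.tellg();  // position at end == size
    if (size <= 0) return {};
    in.seekg(0, std::ios::beg);
    std::string content(static_cast<std::size_t>(size), '\0');
    in.read(&content[0], size);
    // Keep only what was actually read, in case the read came up short.
    content.resize(static_cast<std::size_t>(in.gcount()));
    return content;  // returned by value: moved or elided, not copied
}
```

Because the string is returned by value, a caller writing std::string text = read_file("file.txt"); gets the buffer without an extra copy.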
This website has a good comparison on several different methods for doing that. The one I currently use is:
std::string read_sequence() {
std::ifstream f("sequence.fasta");
std::ostringstream ss;
ss << f.rdbuf();
return ss.str();
}
If the lines in your text files are separated by newlines, this will keep them. If you want to remove them (which is my case most of the time), you can just add a call to something such as
auto s = ss.str();
s.erase(std::remove_if(s.begin(), s.end(),
[](char c) { return c == '\n'; }), s.end());
There are two big difficulties with your question. First, the Standard doesn't mandate any particular implementation (yes, nearly everybody started with the same implementation; but they've been modifying it over time, and the optimal I/O code for NTFS, say, will be different than the optimal I/O code for ext4), so it is possible (although somewhat unlikely) for a particular approach to be fastest on one platform, but not another. Second, there's a little difficulty in defining "optimal"; I assume you mean "fastest," but that's not necessarily the case.
There are approaches that are idiomatic, and perfectly fine C++, but unlikely to give wonderful performance. If your goal is to end up with a single std::string, using std::getline(std::istream&, std::string&) is very likely to be slower than necessary. The std::getline() call has to look for the '\n', and you'll occasionally reallocate and copy the destination std::string. Even so, it's ridiculously simple, and easy to understand. That could be optimal from a maintenance perspective, assuming you don't need the absolute fastest performance possible. This will also be a good approach if you don't need the whole file in one giant std::string at one time. You'll be very frugal with memory.
An approach that is likely more efficient is to manipulate the read buffer:
std::string read_the_whole_file(std::istream& istr)
{
    std::ostringstream sstr;
    sstr << istr.rdbuf();  // copy the input stream's buffer in one go
    return sstr.str();
}
Personally, I'm just as likely to use std::fopen() and std::fread() (and a std::unique_ptr<FILE> with a custom deleter that calls std::fclose()) because, on Windows at least, you'll get a better error message when std::fopen() fails than when constructing a file stream object fails. I consider the better error message an important factor when deciding which approach is optimal.

scanf on an istream object

NOTE: I saw the post What is the cin analougus of scanf formatted input? before asking this question, and it doesn't solve my problem. That post asks for a C++ way to do it, but as I mention below, it is sometimes inconvenient to use only the C++ way, and I have clear examples of that.
I am trying to read data from an istream object, and sometimes it is inconvenient to just use C++-style ways such as operator>>, e.g. the data are in a special form such as 123:456, so you have to imbue a locale that treats ':' as whitespace (which is very hacky, as opposed to %d:%d in scanf), or 00123, where you want to read it as a string and convert it as decimal instead of octal (as opposed to %d in scanf), and possibly many other cases.
The reason I chose istream as the interface is that it can be derived from and is therefore more flexible. For example, we can create in-memory streams, or some customized streams that are generated on the fly, etc. C-style FILE*, on the other hand, is very limited, at least in a standard-compliant way, when it comes to creating customized streams.
So my question is: is there a way to do scanf-like data extraction on an istream object? I think fscanf internally reads character by character from FILE* using fgetc, while istream also provides such an interface. So it is possible by just copying and pasting the code of fscanf and replacing the FILE* with the istream object, but that's very hacky. Is there a smarter and cleaner way, or is there some existing work on this?
Thanks.
You should never, under any circumstances, use scanf or its relatives for anything, for three reasons:
Many format strings, including for instance all the simple uses of %s, are just as dangerous as gets.
It is almost impossible to recover from malformed input, because scanf does not tell you how far in characters into the input it got when it hit something unexpected.
Numeric overflow triggers undefined behavior: yes, that means scanf is allowed to crash the entire program if a numeric field in the input has too many digits.
Prior to C++11, the C++ specification defined istream formatted input of numbers in terms of scanf, which means that last objection is very likely to apply to them as well! (In C++11 the specification is changed to use strto* instead and to do something predictable if that detects overflow.)
What you should do instead is: read entire lines of input into std::string objects with getline, hand-code logic to split them up into fields (I don't remember off the top of my head what the C++-string equivalent of strsep is, but I'm sure it exists) and then convert numeric strings to machine numbers with the strtol/strtod family of functions.
I cannot emphasize this enough: THE ONLY 100% RELIABLE WAY TO CONVERT STRINGS TO NUMBERS IN C OR C++, unless you are lucky enough to have a C++ runtime that is already C++11-conformant in this regard, IS WITH THE strto* FUNCTIONS, and you must use them correctly:
errno = 0;
result = strtoX(s, &ends, 10); // omit 10 for floats
if (s == ends || *ends || errno)
parse_error();
(The OpenBSD manpages, linked above, explain why you have to do this fairly convoluted thing.)
(If you're clever, you can use ends and some manual logic to skip that colon, instead of strsep.)
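Applied to the 123:456 format from the question, the pattern above might look like the following sketch. parse_pair is a hypothetical helper, and it uses strtol with base 10, so 00123 is read as decimal 123 rather than octal.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Parses "A:B" (two base-10 integers) into a and b.
// Returns false on malformed input or overflow and leaves a and b untouched.
bool parse_pair(const std::string& text, long& a, long& b) {
    const char* s = text.c_str();
    char* ends = nullptr;

    errno = 0;
    const long first = std::strtol(s, &ends, 10);
    // No digits consumed, wrong separator, or out of range.
    if (s == ends || *ends != ':' || errno != 0) return false;

    const char* rest = ends + 1;  // skip the colon
    errno = 0;
    const long second = std::strtol(rest, &ends, 10);
    // No digits consumed, trailing junk, or out of range.
    if (rest == ends || *ends != '\0' || errno != 0) return false;

    a = first;
    b = second;
    return true;
}
```

Combined with std::getline on the istream, this gives scanf-like extraction of one line at a time while keeping the overflow and trailing-garbage checks explicit.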
I do not recommend mixing C++ input/output and C input/output. Not that they are really incompatible, but they could just plain interoperate wrong.
For example Oracle docs recommend not to mix it http://www.oracle.com/technetwork/articles/servers-storage-dev/mixingcandcpluspluscode-305840.html
But no one stops you from reading data into the buffer and parsing it with standard c functions like sscanf.
...
string curString;
int a, b;
...
std::getline(inputStream, curString);
int sscanfResult = sscanf(curString.c_str(), "%d:%d", &a, &b);
if (2 != sscanfResult)
throw "error";
...
But it won't help in some situations, when your stream is just one long contiguous sequence of symbols (like some string turned into a memory stream).
Making your own fscanf from scratch or porting(?) the original CRT function actually isn't the worst possible idea. Just make sure you have tested it thoroughly (low-level custom char manipulation was always a source of pain in C).
I've never really tried Boost.Spirit, and such parsing infrastructure could really be overkill for your project. But Boost libraries are usually well tested and designed. You could at least try to use it.
Based on @tmyklebu's comment, I implemented streamScanf which wraps istream as FILE* via fopencookie: https://github.com/likan999/codejam/blob/master/Common/StreamScanf.cpp

Is there a 'catch' with FastFormat?

I just read about the FastFormat C++ i/o formatting library, and it seems too good to be true: Faster even than printf, typesafe, and with what I consider a pleasing interface:
// prints: "This formats the remaining arguments based on their order - in this case we put 1 before zero, followed by 1 again"
fastformat::fmt(std::cout, "This formats the remaining arguments based on their order - in this case we put {1} before {0}, followed by {1} again", "zero", 1);
// prints: "This writes each argument in the order, so first zero followed by 1"
fastformat::write(std::cout, "This writes each argument in the order, so first ", "zero", " followed by ", 1);
This looks almost too good to be true. Is there a catch? Have you had good, bad or indifferent experiences with it?
Is there a 'catch' with FastFormat?
Last time I checked, there was one annoying catch:
You can only use either the narrow string version or the wide string version of this library. (The functions for wchar_t and char are the same -- which type is used is a compile time switch.)
With iostreams, stdio or Boost.Format you can use both.
Found one "catch", though for most people it will never manifest. From the project page:
Atomic operation. It doesn't write out statement elements one at a time, like the IOStreams, so has no atomicity issues
The only way I can see this happening is if it buffers the whole write() call's output itself, then writes it out to the ostream in one step. This means it needs to allocate memory, and if an object passed into the write() call produces a lot of output (several megabytes or more), it can consume up to twice that much memory in internal buffers (assuming it uses the grow-a-buffer-by-doubling-its-size-each-time trick).
If you're just using it for logging, and not, say, dumping huge amounts of XML, you'll never see this problem.
The only other "catch" I'm seeing is:
Highly portable. It will work with all good modern C++ compilers; it even works with Visual C++ 6!
So it won't work with an old C++ compiler, like cfront, whereas iostreams is backward compatible to the late 80's. Again, I'd be surprised if anyone ever had a problem with this.
Although FastFormat is a good library there are a number of issues with it:
Limited formatting support, in particular the following features are not supported:
Leading zeros (or any other non-space padding)
Octal/hexadecimal encoding
Runtime width/alignment specification
The library is quite big for a relatively small task of formatting and has even bigger dependency (STLSoft).
It looks pretty interesting indeed! Good tip regardless, and +1 for that!
I've been playing with it for a bit. The main drawback I see is that FastFormat supports fewer formatting options for the output. This is, I think, a direct consequence of the way the higher type safety is achieved, and a good tradeoff depending on your circumstances.
If you look in detail at his performance benchmark page, you'll notice that good old C printf-family functions are still winning on Linux. In fact, the only test case where they perform poorly is the test case that should be static string concatenations, where I would expect printf to be wasteful. Moreover, GCC provides static type-checking on printf-style function calls, so the benefit of type-safety is reduced. So: if you are running on Linux and if you need the absolute best performance, FastFormat is probably not the optimal solution.
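The GCC type-checking mentioned above also extends to your own printf-style wrappers through the format attribute, which Clang supports as well. The sketch below is only an illustration: format_string, the PRINTF_LIKE macro and the 256-byte buffer are assumptions made for this example.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Compiler extension: asks GCC/Clang to check the variadic arguments
// against the format string. It expands to nothing on other compilers.
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

// Argument 1 is the format string; the checked arguments start at 2.
std::string format_string(const char* fmt, ...) PRINTF_LIKE(1, 2);

std::string format_string(const char* fmt, ...) {
    char buffer[256];  // arbitrary size; longer results are truncated
    std::va_list args;
    va_start(args, fmt);
    const int written = std::vsnprintf(buffer, sizeof buffer, fmt, args);
    va_end(args);
    if (written < 0) return {};
    return std::string(buffer);
}
```

With warnings enabled (-Wall or -Wformat), a call such as format_string("%d", "oops") is flagged at compile time, which narrows the type-safety gap with libraries like FastFormat, at least on those compilers.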
The library depends on a couple of environment variables, as mentioned in the docs.
That might be no biggie to some people, but I'd prefer my code to be as self-contained as possible. If I check it out from source control, it should work and compile. It won't, if it requires you to set environment variables.