is it good to use sscanf for parsing string - c++

I have been using sscanf() in my parser to get some css like tokens such as color code some variations below;
#FDC69A
#ff0
orange
Example code will be;
int r g b;
cosnt char* s = "#FAFAFA";
if(sscanf(s, "#%02x%02x%02x", &r, &g, &b) == 3){
// color code ok
}
My preferred language for current project is c++, I think sscanf can be faster than regular character by character parsing and overall code will be bug free & minimal still it may have portability issues across different compilers.
A thing I noticed is, popular of open source project do not use sscanf for tokenizing input buffers instead they do it char by char, it is a bad programming practice to use sscanf in parsing that i am following?

The biggest problem with sscanf (as well as scanf and fscanf) is that numeric overflow causes undefined behavior. For example:
const char *s = "999999999999999999999999999999";
int n;
sscanf(s, "%d", &n);
The C standard says exactly nothing about how this code behaves. It might set n to some arbitrary value, it might report an error, or it might crash.
(In practice, existing implementations are likely to behave sensibly, for some value of "sensibly".)

if(sscanf(s, "#%02x%02x%02x", &r, &g, &b) == 3) is robust... nothing to worry about there.
Historically, the big concern with those functions was that someone might specify a format flag that doesn't match the argument (e.g. %d not given an int*)... many modern compilers have enough validation to avoid accidents like that.
Still, C++ has iostreams, and people tend to use those for many I/O and parsing operations as the stream destructors automatically flush and close files and release descriptors, they're type safe, extensible to user-defined types, you can generally reuse parsing/output code for any type of stream, and they're often convenient. They'd be significantly more tedious for your specific test above though.
If you've noticed lots of OSS programs scanning character by character, it may be because:
They're doing more complex parsing - where they want to branch to different parsing logic after reading individual characters, or
In your code you have a firm expectation of what to expect, so it's reasonable to do a sscanf to test that, but if you were writing say a compiler it'd be too slow to try a huge if/else list of hundreds sscanf attempts to recognise tokens.
Relevant for scanf, fscanf but not sscanf - avoid scanning too far so they can ungetc, which (from memory) is only portably guaranteed to work for 1 character.

Related

scanf on an istream object

NOTE: I've seen the post What is the cin analougus of scanf formatted input? before asking the question and the post doesn't solve my problem here. The post seeks for C++-way to do it, but as I mentioned already, it is inconvenient to just use C++-way to do it sometimes and I have clear examples for that.
I am trying to read data from an istream object, and sometimes it is inconvenient to just use C++-style ways such as operator>>, e.g. the data are in special form 123:456 so you have to imbue to make ':' as space (which is very hacky, as opposed to %d:%d in scanf), or 00123 where you want to read as string and convert decimal instead of octal (as opposed to %d in scanf), and possibly many other cases.
The reason I chose istream as interface is because it can be derived and therefore more flexible. For example, we can create in-memory streams, or some customized streams that generated on the fly, etc. C-style FILE*, on the other hand, is very limited, at least in a standard-compliant way, on creating customized streams.
So my questions is, is there a way to do scanf-like data extraction on istream object? I think fscanf internally read character by character from FILE* using fgetc, while istream also provides such interface. So it is possible by just copying and pasting the code of fscanf and replace the FILE* with the istream object, but that's very hacky. Is there a smarter and cleaner way, or is there some existing work on this?
Thanks.
You should never, under any circumstances, use scanf or its relatives for anything, for three reasons:
Many format strings, including for instance all the simple uses of %s, are just as dangerous as gets.
It is almost impossible to recover from malformed input, because scanf does not tell you how far in characters into the input it got when it hit something unexpected.
Numeric overflow triggers undefined behavior: yes, that means scanf is allowed to crash the entire program if a numeric field in the input has too many digits.
Prior to C++11, the C++ specification defined istream formatted input of numbers in terms of scanf, which means that last objection is very likely to apply to them as well! (In C++11 the specification is changed to use strto* instead and to do something predictable if that detects overflow.)
What you should do instead is: read entire lines of input into std::string objects with getline, hand-code logic to split them up into fields (I don't remember off the top of my head what the C++-string equivalent of strsep is, but I'm sure it exists) and then convert numeric strings to machine numbers with the strtol/strtod family of functions.
I cannot emphasize this enough: THE ONLY 100% RELIABLE WAY TO CONVERT STRINGS TO NUMBERS IN C OR C++, unless you are lucky enough to have a C++ runtime that is already C++11-conformant in this regard, IS WITH THE strto* FUNCTIONS, and you must use them correctly:
errno = 0;
result = strtoX(s, &ends, 10); // omit 10 for floats
if (s == ends || *ends || errno)
parse_error();
(The OpenBSD manpages, linked above, explain why you have to do this fairly convoluted thing.)
(If you're clever, you can use ends and some manual logic to skip that colon, instead of strsep.)
I do not recommend you to mix C++ input output and C input output. No that they are really incompatible but they could just plain interoperate wrong.
For example Oracle docs recommend not to mix it http://www.oracle.com/technetwork/articles/servers-storage-dev/mixingcandcpluspluscode-305840.html
But no one stops you from reading data into the buffer and parsing it with standard c functions like sscanf.
...
string curString;
int a, b;
...
std::getline(inputStream, curString);
int sscanfResult == sscanf(curString.cstr(), "%d:%d", &a, &b);
if (2 != sscanfResult)
throw "error";
...
But it won't help in some situations when your stream is just one long contiguous sequence of symbols(like some string turned into memory stream).
Making your own fscanf from scratch or porting(?) the original CRT function actually isn't the worst possible idea. Just make sure you have tested it thoroughly(low level custom char manipulation was always a source of pain in C).
I've never really tried the boost\spirit and such parsing infrastructure could really be an overkill for your project. But boost libraries are usually well tested and designed. You could at least try to use it.
Based on #tmyklebu's comment, I implemented streamScanf which wraps istream as FILE* via fopencookie: https://github.com/likan999/codejam/blob/master/Common/StreamScanf.cpp

How to fix fprintf vulnerability?

In my code I used fprintf. I used flawfinder to check the code for vulnerabilities and I got that:
358: [4] (format) fprintf: If format strings can be influenced by
an attacker, they can be exploited. Use a constant for the format
specification.
Can someone explain to me what Use a constant for the format specification actually means? Is there any safe version of fprintf?
The problem is that fprintf determines how many arguments it should get by examining the format string. If the format string doesn't agree with the actual arguments, you have undefined behavior which can manifest as a security vulnerability.
The problem is particularly bad if the string supplied can be influenced by the user of your program, because he can then specifically design the string to make your program do bad things.
There is no safe version of fprintf in the C standard. C++ streams avoid the problem, at the cost of not having format strings and using a far more verbose syntax for specifying formatting options.
A constant string, as in a string literal.
Like in
fprintf(someFile, "%s", someStringVariable);
and not like
fprintf(someFile, someStringVariable);
It means it wants you to write:
fprintf(out, "foo %s", some_string);
instead of what you have, which I guess is something like:
const char *format = "foo %s";
/* some time later */
fprintf(out, format, some_string);
The reason is that it's worried format might come from user input or something, and a malicious user could supply a format foo %s%s%s in order to provoke undefined behavior that they may be able to exploit.
Obviously if you're choosing between n different format strings, all of which are string literals in your code and all use the same format specifiers, but you choose which one at runtime, then following this advice is a bit awkward and wouldn't make your code any safer. But you could have n functions instead of n strings, and each function calls fprintf with a different string literal.
If you're reading the format string out of a config file (which is one fairly crude way of implementing internationalization from scratch) then you're basically out of luck. The linter doesn't trust your translator to use the right format codes for the arguments supplied to the call. And arguably neither should you :-)

Refactoring C-style pretty-printing into C++-style pretty-printing

I want to refactor some printf/sprintf/fprintf statements into ostream/sstream/fstream statements. The code in question pretty-prints a series of integers and floating-point numbers, using whitespace padding and fixed numbers of decimal points.
It seems to me that this would be a good candidate for a Martin Fowler style writeup of a safe, step-by-step refactorings, with important gotchas noted. The first step, of course, is to get the legacy code into a test harness, which I have done.
What slow and careful steps can I take to perform this refactoring?
If refactoring is not the goal in itself, you can avoid it altogether (well, almost) by using a formatting library such as tinyformat which provides an interface similar to printf but is type safe and uses IOStreams internally.
Basic mechanics of the conversion:
Convert each printf-style clause %w.pf or %w.pe, where w is the field width and p is the number of digits of precision, into << setw(w) << setprecision(p) << fixed.
Convert each printf-style clause %wd or %wi, where w is the field width, into << setw(w).
Convert "\n" to endl where appropriate.
Process for printf:
Create a char[] (let's call it text) with enough total width.
Convert the printf(...) to sprintf(text, ...), and use cout << text to actually print the text.
Complete using the common instructions.
Process for fprintf:
Same as printf, but use the appropriate fstream instead of cout.
If you already have an opened C-style FILE object that you do not want to refactor at this time, it gets a little sticky (but can be done).
Complete using the common instructions.
Process for sprintf:
If the string being written to is only used to output to a stream in the current context, refer to one of the two refactorings above.
Otherwise, begin by creating a stringstream and streaming the contents of the char[] you are writing to into that. If you are still intending to extract a char* from it, you can do std::stringstream::str().c_str().
Complete using the common instructions.
Common instructions:
Convert each clause one by one into C++-style.
Remove *printf and char[] declarations as necessary when finished.
Apply other refactorings, particularly "Extract Method" (Fowler, Refactoring) as necessary.

How do you cope with signed char -> int issues with standard library?

This is a really long-standing issue in my work, that I realize I still don't have a good solution to...
C naively defined all of its character test functions for an int:
int isspace(int ch);
But char's are often signed, and a full character often doesn't fit in an int, or in any single storage-unit that used for strings******.
And these functions have been the logical template for current C++ functions and methods, and have set the stage for the current standard library. In fact, they're still supported, afaict.
So if you hand isspace(*pchar) you can end up with sign extension problems. They're hard to see, and thence they're hard to guard against in my experience.
Similarly, because isspace() and it's ilk all take ints, and because the actual width of a character is often unknown w/o string-analysis - meaning that any modern character library should essentially never be carting around char's or wchar_t's but only pointers/iterators, since only by analyzing the character stream can you know how much of it composes a single logical character, I am at a bit of a loss as to how best to approach the issues?
I keep expecting a genuinely robust library based around abstracting away the size-factor of any character, and working only with strings (providing such things as isspace, etc.), but either I've missed it, or there's another simpler solution staring me in the face that all of you (who know what you're doing) use...
** These issues don't come up for fixed-sized character-encodings that can wholly contain a full character - UTF-32 apparently is about the only option that has these characteristics (or specialized environments that restrict themselves to ASCII or some such).
So, my question is:
"How do you test for whitespace, isprintable, etc., in a way that doesn't suffer from two issues:
1) Sign expansion, and
2) variable-width character issues
After all, most character encodings are variable-width: UTF-7, UTF-8, UTF-16, as well as older standards such as Shift-JIS. Even extended ASCII can have the simple sign-extension problem if the compiler treats char as a signed 8 bit unit.
Please note:
No matter what size your char_type is, it's wrong for most character encoding schemes.
This problem is in the standard C library, as well as in the C++ standard libraries; which still tries to pass around char and wchar_t, rather than string-iterators in the various isspace, isprint, etc. implementations.
Actually, it's precisely those type of functions that break the genericity of std::string. If it only worked in storage-units, and didn't try to pretend to understand the meaning of the storage-units as logical characters (such as isspace), then the abstraction would be much more honest, and would force us programmers to look elsewhere for valid solutions...
Thank You
Everyone who participated. Between this discussion and WChars, Encodings, Standards and Portability I have a much better handle on the issues. Although there are no easy answers, every bit of understanding helps.
How do you test for whitespace, isprintable, etc., in a way that doesn't suffer from two issues:
1) Sign expansion
2) variable-width character issues
After all, all commonly used Unicode encodings are variable-width, whether programmers realize it or not: UTF-7, UTF-8, UTF-16, as well as older standards such as Shift-JIS...
Obviously, you have to use a Unicode-aware library, since you've demonstrated (correctly) that C++03 standard library is not. The C++11 library is improved, but still not quite good enough for most usages. Yes, some OS' have a 32-bit wchar_t which makes them able to correctly handle UTF32, but that's an implementation, and is not guaranteed by C++, and is not remotely sufficient for many unicode tasks, such as iterating over Graphemes (letters).
IBMICU
Libiconv
microUTF-8
UTF-8 CPP, version 1.0
utfproc
and many more at http://unicode.org/resources/libraries.html.
If the question is less about specific character testing and more about code practices in general: Do whatever your framework does. If you're coding for linux/QT/networking, keep everything internally in UTF-8. If you're coding with Windows, keep everything internally in UTF-16. If you need to mess with code points, keep everything internally in UTF-32. Otherwise (for portable, generic code), do whatever you want, since no matter what, you have to translate for some OS or other anyway.
I think you are confounding a whole host of unrelated concepts.
First off, char is simply a data type. Its first and foremost meaning is "the system's basic storage unit", i.e. "one byte". Its signedness is intentionally left up to the implementation so that each implementation can pick the most appropriate (i.e. hardware-supported) version. It's name, suggesting "character", is quite possibly the single worst decision in the design of the C programming language.
The next concept is that of a text string. At the foundation, text is a sequence of units, which are often called "characters", but it can be more involved than that. To that end, the Unicode standard coins the term "code point" to designate the most basic unit of text. For now, and for us programmers, "text" is a sequence of code points.
The problem is that there are more codepoints than possible byte values. This problem can be overcome in two different ways: 1) use a multi-byte encoding to represent code point sequences as byte sequences; or 2) use a different basic data type. C and C++ actually offer both solutions: The native host interface (command line args, file contents, environment variables) are provided as byte sequences; but the language also provides an opaque type wchar_t for "the system's character set", as well as translation functions between them (mbstowcs/wcstombs).
Unfortunately, there is nothing specific about "the system's character set" and "the systems multibyte encoding", so you, like so many SO users before you, are left puzzling what to do with those mysterious wide characters. What people want nowadays is a definite encoding that they can share across platforms. The one and only useful encoding that we have for this purpose is Unicode, which assigns a textual meaning to a large number of code points (up to 221 at the moment). Along with the text encoding comes a family of byte-string encodings, UTF-8, UTF-16 and UTF-32.
The first step to examining the content of a given text string is thus to transform it from whatever input you have into a string of definite (Unicode) encoding. This Unicode string may itself be encoded in any of the transformation formats, but the simplest is just as a sequence of raw codepoints (typically UTF-32, since we don't have a useful 21-bit data type).
Performing this transformation is already outside the scope of the C++ standard (even the new one), so we need a library to do this. Since we don't know anything about our "system's character set", we also need the library to handle that.
One popular library of choice is iconv(); the typical sequence goes from input multibyte char* via mbstowcs() to a std::wstring or wchar_t* wide string, and then via iconv()'s WCHAR_T-to-UTF32 conversion to a std::u32string or uint32_t* raw Unicode codepoint sequence.
At this point our journey ends. We can now either examine the text codepoint by codepoint (which might be enough to tell if something is a space); or we can invoke a heavier text-processing library to perform intricate textual operations on our Unicode codepoint stream (such as normalization, canonicalization, presentational transformation, etc.). This is far beyond the scope of a general-purpose programmer, and the realm of text processing specialists.
It is in any case invalid to pass a negative value other than EOF to isspace and the other character macros. If you have a char c, and you want to test whether it is a space or not, do isspace((unsigned char)c). This deals with the extension (by zero-extending). isspace(*pchar) is flat wrong -- don't write it, don't let it stand when you see it. If you train yourself to panic when you do see it, then it's less hard to see.
fgetc (for example) already returns either EOF or a character read as an unsigned char and then converted to int, so there's no sign-extension issue for values from that.
That's trivia really, though, since the standard character macros don't cover Unicode, or multi-byte encodings. If you want to handle Unicode properly then you need a Unicode library. I haven't looked into what C++11 or C1X provide in this regard, other than that C++11 has std::u32string which sounds promising. Prior to that the answer is to use something implementation-specific or third-party. (Un)fortunately there are a lot of libraries to choose from.
It may be (I speculate) that a "complete" Unicode classification database is so large and so subject to change that it would be impractical for the C++ standard to mandate "full" support anyway. It depends to an extent what operations should be supported, but you can't get away from the problem that Unicode has been through 6 major versions in 20 years (since the first standard version), while C++ has had 2 major versions in 13 years. As far as C++ is concerned, the set of Unicode characters is a rapidly-moving target, so it's always going to be implementation-defined what code points the system knows about.
In general, there are three correct ways to handle Unicode text:
At all I/O (including system calls that return or accept strings), convert everything between an externally-used character encoding, and an internal fixed-width encoding. You can think of this as "deserialization" on input and "serialization" on output. If you had some object type with functions to convert it to/from a byte stream, then you wouldn't mix up byte stream with the objects, or examine sections of byte stream for snippets of serialized data that you think you recognize. It needn't be any different for this internal unicode string class. Note that the class cannot be std::string, and might not be std::wstring either, depending on implementation. Just pretend the standard library doesn't provide strings, if it helps, or use a std::basic_string of something big as the container but a Unicode-aware library to do anything sophisticated. You may also need to understand Unicode normalization, to deal with combining marks and such like, since even in a fixed-width Unicode encoding, there may be more than one code point per glyph.
Mess about with some ad-hoc mixture of byte sequences and Unicode sequences, carefully tracking which is which. It's like (1), but usually harder, and hence although it's potentially correct, in practice it might just as easily come out wrong.
(Special purposes only): use UTF-8 for everything. Sometimes this is good enough, for example if all you do is parse input based on ASCII punctuation marks, and concatenate strings for output. Basically it works for programs where you don't need to understand anything with the top bit set, just pass it on unchanged. It doesn't work so well if you need to actually render text, or otherwise do things to it that a human would consider "obvious" but actually are complex. Like collation.
One comment up front: the old C functions like isspace took int for
a reason: they support EOF as input as well, so they need to be able
to support one more value than will fit in a char. The
“naïve” decision was allowing char to be signed—but
making it unsigned would have had severe performance implications on a
PDP-11.
Now to your questions:
1) Sign expansion
The C++ functions don't have this problem. In C++, the
“correct” way of testing things like whether a character is
a space is to grap the std::ctype facet from whatever locale you want,
and to use it. Of course, the C++ localization, in <locale>, has
been carefully designed to make it as hard as possible to use, but if
you're doing any significant text processing, you'll soon come up with
your own convenience wrappers: a functional object which takes a locale
and mask specifying which characteristic you want to test isn't hard.
Making it a template on the mask, and giving its locale argument a
default to the global locale isn't rocket science either. Throw in a
few typedef's, and you can pass things like IsSpace() to std::find.
The only subtility is managing the lifetime of the std::ctype object
you're dealing with. Something like the following should work, however:
template<std::ctype_base::mask mask>
class Is // Must find a better name.
{
std::locale myLocale;
//< Needed to ensure no premature destruction of facet
std::ctype<char> const* myCType;
public:
Is( std::locale const& l = std::locale() )
: myLocale( l )
, myCType( std::use_facet<std::ctype<char> >( l ) )
{
}
bool operator()( char ch ) const
{
return myCType->is( mask, ch );
}
};
typedef Is<std::ctype_base::space> IsSpace;
// ...
(Given the influence of the STL, it's somewhat surprising that the
standard didn't define something like the above as standard.)
2) Variable width character issues.
There is no real answer. It all depends on what you need. For some
applications, just looking for a few specific single byte characters is
sufficient, and keeping everything in UTF-8, and ignoring the multi-byte
issues, is a viable (and simple) solution. Beyond that, it's often
useful to convert to UTF-32 (or depending on the type of text you're
dealing with, UTF-16), and use each element as a single code point. For
full text handling, on the other hand, you have to deal with
multi-code-point characters even if you're using UTF-32: the sequence
\u006D\u0302 is a single character (a small m with a circumflex over
it).
I haven't been testing internationalization capabilities of Qt library so much, but from what i know, QString is fully unicode-aware, and is using QChar's which are unicode-chars. I don't know internal implementation of those, but I expect that this implies QChar's to be varaible size characters.
It would be weird to bind yourself to such big framework as Qt just to use strings though.
You seem to be confusing a function defined on 7-bit ascii with a universal space-recognition function. Character functions in standard C use int not to deal with different encodings, but to allow EOF to be an out-of-band indicator. There are no issues with sign-extension, because the numbers these functions are defined on have no 8th bit. Providing a byte with this possibility is a mistake on your part.
Plan 9 attempts to solve this with a UTF library, and the assumption that all input data is UTF-8. This allows some measure of backwards compatibility with ASCII, so non-compliant programs don't all die, but allows new programs to be written correctly.
The common notion in C, even still is that a char* represents an array of letters. It should instead be seen as a block of input data. To get the letters from this stream, you use chartorune(). Each Rune is a representation of a letter(/symbol/codepoint), so one can finally define a function isspacerune(), which would finally tell you which letters are spaces.
Work with arrays of Rune as you would with char arrays, to do string manipulation, then call runetochar() to re-encode your letters into UTF-8 before you write it out.
The sign extension issue is easy to deal with. You can either use:
isspace((unsigned char) ch)
isspace(ch & 0xFF)
the compiler option that makes char an unsigned type
As far the variable-length character issue (I'm assuming UTF-8), it depends on your needs.
If you just to deal with the ASCII whitespace characters \t\n\v\f\r, then isspace will work fine; the non-ASCII UTF-8 code units will simply be treated as non-spaces.
But if you need to recognize the extra Unicode space characters \x85\xa0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000, it's a bit more work. You could write a function along the lines of
bool isspace_utf8(const char* pChar)
{
uint32_t codePoint = decode_char(*pChar);
return is_unicode_space(codePoint);
}
Where decode_char converts a UTF-8 sequence to the corresponding Unicode code point, and is_unicode_space returns true for characters with category Z or for the Cc characters that are spaces. iswspace may or may not help with the latter, depending on how well your C++ library supports Unicode. It's best to use a dedicated Unicode library for the job.
most strings in practice use a multibyte encoding such as UTF-7,
UTF-8, UTF-16, SHIFT-JIS, etc.
No programmer would use UTF-7 or Shift-JIS as an internal representation unless they enjoy pain. Stick with ŬTF-8, -16, or -32, and only convert as needed.
Your preamble argument is somewhat inacurate, and arguably unfair, it is simply not in the library design to support Unicode encodings - certainly not multiple Unicode encodings.
Development of the C and C++ languages and much of the libraries pre-date the development of Unicode. Also as system's level languages they require a data type that corresponds to the smallest addressable word size of the execution environment. Unfortunately perhaps the char type has become overloaded to represent both the character set of the execution environment and the minimum addressable word. It is history that has shown this to be flawed perhaps, but changing the language definition and indeed the library would break a large amount of legacy code, so such things are left to newer languages such as C# that has an 8-bit byte and distinct char type.
Moreover the variable encoding of Unicode representations makes it unsuited to a built-in data type as such. You are obviously aware of this since you suggest that Unicode character operations should be performed on strings rather than machine word types. This would require library support and as you point out this is not provided by the standard library. There are a number of reasons for that, but primarily it is not within the domain of the standard library, just as there is no standard library support for networking or graphics. The library intrinsically does not address anything that is not generally universally supported by all target platforms from the deeply embedded to the super-computer. All such things must be provided by either system or third-party libraries.
Support for multiple character encodings is about system/environment interoperability, and the library is not intended to support that either. Data exchange between incompatible encoding systems is an application issue not a system issue.
"How do you test for whitespace, isprintable, etc., in a way that
doesn't suffer from two issues:
1) Sign expansion, and
2) variable-width character issues
isspace() considers only the lower 8-bits. Its definition explicitly states that if you pass an argument that is not representable as an unsigned char or equal to the value of the macro EOF, the results are undefined. The problem does not arise if it is used as it was intended. The problem is that it is inappropriate for the purpose you appear to be applying it to.
After all, all commonly used Unicode encodings are variable-width,
whether programmers realize it or not: UTF-7, UTF-8, UTF-16, as well
as older standards such as Shift-JIS
isspace() is not defined for Unicode. You'll need a library designed to use any specific encoding you are using. This question What is the best Unicode library for C? may be relevant.

Is there a 'catch' with FastFormat?

I just read about the FastFormat C++ i/o formatting library, and it seems too good to be true: Faster even than printf, typesafe, and with what I consider a pleasing interface:
// prints: "This formats the remaining arguments based on their order - in this case we put 1 before zero, followed by 1 again"
fastformat::fmt(std::cout, "This formats the remaining arguments based on their order - in this case we put {1} before {0}, followed by {1} again", "zero", 1);
// prints: "This writes each argument in the order, so first zero followed by 1"
fastformat::write(std::cout, "This writes each argument in the order, so first ", "zero", " followed by ", 1);
This looks almost too good to be true. Is there a catch? Have you had good, bad or indifferent experiences with it?
Is there a 'catch' with FastFormat?
Last time I checked, there was one annoying catch:
You can only use either the narrow string version or the wide string version of this library. (The functions for wchar_t and char are the same -- which type is used is a compile time switch.)
With iostreams, stdio or Boost.Format you can use both.
Found one "catch", though for most people it will never manifest. From the project page:
Atomic operation. It doesn't write out statement elements one at a time, like the IOStreams, so has no atomicity issues
The only way I can see this happening is if it buffers the whole write() call's output itself, then writes it out to the ostream in one step. This means it needs to allocate memory, and if an object passed into the write() call produces a lot of output (several megabytes or more), it can consume up to twice that much memory in internal buffers (assuming it uses the grow-a-buffer-by-doubling-its-size-each-time trick).
If you're just using it for logging, and not, say, dumping huge amounts of XML, you'll never see this problem.
The only other "catch" I'm seeing is:
Highly portable. It will work with all good modern C++ compilers; it even works with Visual C++ 6!
So it won't work with an old C++ compiler, like cfront, whereas iostreams is backward compatible to the late 80's. Again, I'd be surprised if anyone ever had a problem with this.
Although FastFormat is a good library there are a number of issues with it:
Limited formatting support, in particular the following features are not supported:
Leading zeros (or any other non-space padding)
Octal/hexadecimal encoding
Runtime width/alignment specification
The library is quite big for a relatively small task of formatting and has even bigger dependency (STLSoft).
It looks pretty interesting indeed! Good tip regardless, and +1 for that!
I've been playing with it for a bit. The main drawback I see is that FastFormat supports less formatting options for the output. This is I think a direct consequence of the way the higher typesafety is achieved, and a good tradeoff depending on your circumstances.
If you look in detail at his performance benchmark page, you'll notice that good old C printf-family functions are still winning on Linux. In fact, the only test case where they perform poorly is the test case that should be static string concatenations, where I would expect printf to be wasteful. Moreover, GCC provides static type-checking on printf-style function calls, so the benefit of type-safety is reduced. So: if you are running on Linux and if you need the absolute best performance, FastFormat is probably not the optimal solution.
The library depends on a couple of environment variables, as mentioned in the docs.
That might be no biggie to some people, but I'd prefer my code to be as self-contained as possible. If I check it out from source control, it should work and compile. It won't, if it requires you to set environment variables.