How to implement a cross-platform snprintf in C++?

I wonder if it is possible, and how, to implement a cross-platform (C99- and C++0x-independent) snprintf in C++? Is there such a thing in Boost? (In other words, what is the C++ idiom to replace snprintf?)

std::ostringstream would be an alternative to using snprintf:
char buf[1024];
snprintf(buf, 1024, "%d%s", 4, "hello");
Equivalent:
#include <sstream>
std::ostringstream s;
s << 4 << "hello";
// s.str().c_str(); // This returns `const char*` to constructed string.
There is also boost::lexical_cast:
std::string s = boost::lexical_cast<std::string>(4) +
                boost::lexical_cast<std::string>("hello");

Yes, there is the Boost Format library, which supports formatting strings.

You might want to look at the Qt QString class, which provides a format function which does about what you want in a very OO sort of way. You could certainly copy and learn from it.
Yes, it might be taboo to mention Qt in a question that was tagged boost, but the question seemed more generic than that.
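For illustration only, a rough sketch of what that looks like with QString::arg, mirroring the snprintf example above (this usage example is mine, not from the original answer):

#include <QString>

// Builds "4hello", roughly equivalent to snprintf(buf, 1024, "%d%s", 4, "hello").
QString s = QString("%1%2").arg(4).arg("hello");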

Since Boost was mentioned, is there anything wrong with Boost.Format?

Once I needed snprintf on Windows/Linux/HP-UX. I defined snprintf_safe: on Linux/HP-UX it used snprintf, and on Windows it used _snprintf. I remember that _snprintf has a slightly different approach to writing '\0' when the number of bytes required to store the data exceeds the maximum allowed size, so that needed handling. Anyway, it was this kind of macro:
#ifdef _WIN32
int snprintf_safe(char *buf, size_t size, const char *fmt, ...)
{
    // make use of _vsnprintf and force null termination
}
#else
#define snprintf_safe snprintf
#endif

std::ostringstream or std::to_string (C++11) work as alternatives, but if you need better performance without extra copies, or you only have C rather than C++, you may need something else:
Older MSVC versions do not support C99 and therefore have no snprintf function, only their own _snprintf.
Differences between MSVC's _snprintf and the official C99 (gcc, clang) snprintf:
Return value:
MSVC: returns -1 if the buffer size is not enough to write everything (not including the terminating null!)
GCC: returns the number of characters that would have been written had the buffer been large enough
Written bytes:
MSVC: writes as much as possible, but does not write a terminating NUL if no space is left
GCC: writes as much as possible, and always writes a terminating NUL (exception: buffer_size = 0)
Interesting %n subtlety:
If you use %n in your format, MSVC will leave the target uninitialized if it stops writing because the buffer is too small; GCC will always store the number of bytes that would have been written had the buffer been large enough.
So my proposal would be to write your own wrapper function, say mysnprintf, using vsnprintf / _vsnprintf, which gives the same return values and writes the same bytes on both platforms (be careful: %n is more difficult to fix).
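A minimal sketch of such a wrapper, assuming _vsnprintf/_vscprintf semantics on MSVC (the name mysnprintf comes from the answer above; everything else here is illustrative, not a definitive implementation):

#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Sketch: give snprintf C99-like behaviour on both platforms - always
// null-terminate (when size > 0) and return the number of characters the
// full output would have had. The MSVC branch is an assumption based on
// _vsnprintf/_vscprintf semantics.
int mysnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
#ifdef _MSC_VER
    va_list args2;
    va_copy(args2, args);
    int written = _vsnprintf(buf, size, fmt, args);
    if (size > 0)
        buf[size - 1] = '\0';             // _vsnprintf may omit the terminator
    if (written < 0)
        written = _vscprintf(fmt, args2); // length the full output would have had
    va_end(args2);
#else
    int written = std::vsnprintf(buf, size, fmt, args); // C99 already behaves this way
#endif
    va_end(args);
    return written;
}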

Related

How do I avoid using a constant for filename size?

It seems like standard programming practice and the POSIX standard are at odds with each other. I'm working with a program and I noticed that I see a lot of stuff like:
char buf[NAME_MAX + 1]
And I'm also seeing that a lot of operating systems don't define NAME_MAX and say that they technically don't have to according to POSIX, because you're supposed to use pathconf to get the value it's configured to at runtime rather than hard-coding it as a constant anyway.
The problem is that the compiler won't let me use pathconf this way with arrays. Even if I try storing the result of pathconf in a const int, it still throws a fit and says it has to be a constant. So it looks like in order to actually use pathconf, I would have to avoid using an array of chars for the buffer here because that apparently isn't good enough. So I'm caught between a rock and a hard place, because the C++ standard seemingly won't allow me to do what POSIX says I must do, that is determine the size of a character buffer for a filename at runtime rather than compile time.
The only information I've been able to find on this suggests that I would need to replace the array with a vector, but it's not clear how I would do it. When I test using a simple program, I can get this to work:
std::vector<char> buf((pathconf("/", _PC_NAME_MAX) + 1));
And then I can figure out the size by calling buf.size() or something. But I'm not sure if this is the right approach at all. Does anyone have any experience with trying to get a program to stop depending on constants like NAME_MAX or MAXNAMLEN being defined in the system headers and getting the implementation to use pathconf at runtime instead?
Halfway measures do tend to result in conflicts of some sort.
const unsigned NAME_MAX = /* get the value at runtime */;
char buf[NAME_MAX + 1];
The second line declares a C-style array (presumably) intended to hold a C-style string. In C, this is fine. In C++, there is an issue because the value of NAME_MAX is not known at compile time. That's why I called this a halfway measure: it mixes C-style code with C++ compilation. (Some compilers allow this in C++ as a variable-length array extension. Apparently yours does not.)
The C++ approach would use C++-style strings, as in:
std::string buf;
That's it. The size does not need to be specified since memory will be allocated as needed, provided you avoid C-style interfaces. Use streaming (>>) when reasonable. If the buffer is being filled by user or file input, this should be all you need.
If you need to use C-style strings (perhaps this buffer is being filled by a system call written for C?), there are a few options for allocating the needed space. The simplest is probably a vector, much like you were thinking.
std::vector<char> buf(NAME_MAX + 1); // parentheses: NAME_MAX + 1 elements, not a one-element list
system_call(buf.data()); // Send a char* to the system call.
Alternatively, you could use a C++-style string, which could make manipulating the data more convenient.
std::string buf(NAME_MAX + 1, '\0');
system_call(&buf[0]); // Send a char* to the system call (string::data() is const before C++17).
There is also a smart pointer option, but the vector approach might play nicer with existing code written for a C-style array.
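Putting those pieces together, here is a minimal sketch of the runtime-sized approach (the fallback of 255 and the use of readdir/snprintf are my illustrative assumptions, not part of the original answer):

#include <dirent.h>   // opendir, readdir, closedir
#include <unistd.h>   // pathconf
#include <cstdio>
#include <vector>

int main()
{
    // Size the filename buffer from pathconf at runtime instead of NAME_MAX.
    long name_max = pathconf(".", _PC_NAME_MAX);
    if (name_max < 0)
        name_max = 255;                   // assumed fallback when there is no limit

    std::vector<char> buf(name_max + 1);  // parentheses: name_max + 1 elements

    DIR *dir = opendir(".");
    if (!dir)
        return 1;
    while (dirent *entry = readdir(dir)) {
        std::snprintf(buf.data(), buf.size(), "%s", entry->d_name);
        std::printf("%s\n", buf.data());
    }
    closedir(dir);
}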

Safe counterparts of itoa()?

I am converting an old C program to a more secure version. The following functions are used heavily; could anyone tell me their secure counterparts? Either Windows functions or C runtime library functions. Thanks.
itoa()
getchar()
strcat()
memset()
itoa() is safe as long as the destination buffer is big enough to receive the largest possible representation (i.e. of INT_MIN with trailing NUL). So, you can simply check the buffer size. Still, it's not a very good function to use, because if you change your data type to a larger integral type, you need to switch to whatever conversion function matches that type (ltoa etc., where available). If you want a dynamic buffer that handles whatever type you throw at it with fewer maintenance issues, consider a std::ostringstream (from the <sstream> header).
getchar() has no "secure counterpart" - it's not insecure to begin with and has no buffer overrun potential.
Re memset(): it's dangerous in that it accepts the programmer's judgement that memory should be overwritten without any confirmation of the content/address/length, but when used properly it leaves no issue, and sometimes it's the best tool for the job even in modern C++ programming. To check security issues with this, you need to inspect the code and ensure it's aimed at a suitable buffer or object to be zeroed, and that the length is computed properly (hint: use sizeof where possible).
strcat() can be dangerous if the strings being concatenated aren't known to fit into the destination buffer. For example: char buf[16]; strcpy(buf, "one,"); strcat(buf, "two"); is totally safe (but fragile, as further operations or changing either string might require more than 16 chars and the compiler won't warn you), whereas strcat(buf, argv[0]) is not. The best replacement tends to be a std::ostringstream, although that can require significant reworking of the code. You may get away with strncat(), or even - if you have it - asprintf(&result, "%s%s", first, second), which will allocate the required amount of memory on the heap (do remember to free() it). You could also consider std::string and use operator+ to concatenate strings.
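For illustration only, a minimal sketch of the ostringstream / std::string replacements mentioned above (the variable names are made up):

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    int value = -12345;

    // itoa() replacement: ostringstream works for any streamable type.
    std::ostringstream oss;
    oss << value;
    std::string number = oss.str();

    // strcat() replacement: std::string grows as needed, no fixed-size buffer.
    std::string joined = std::string("one,") + "two," + number;

    std::cout << joined << '\n';   // prints: one,two,-12345
}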
None of these functions are "insecure" provided you understand the behaviour and limitations. itoa is not standard C and should be replaced with sprintf(buf, "%d", ...) if that's a concern to you.
The others are all fine to the experienced practitioner. If you have specific cases which you think may be unsafe, you should post them.
I'd replace itoa(), because it's not standard, with sprintf or, better, snprintf if your goal is code security. I'd also replace strcat() with strncat(), but since you specified the C++ language too, an even better idea would be to use the std::string class.
As for the other two functions, I can't see how you could make the code more secure without seeing your code.

How do I safely format floats/doubles with C's sprintf()?

I'm porting one of my C++ libraries to a somewhat wonky compiler -- it doesn't support stringstreams, or C99 features like snprintf(). I need to format int, float, etc values as char*, and the only options available seem to be 1) use sprintf() 2) hand-roll formatting procedures.
Given this, how do I determine (at either compile- or run-time) how many bytes are required for a formatted floating-point value? My library might be used for fuzz-testing, so it needs to handle even unusual or extreme values.
Alternatively, is there a small (100-200 lines preferred), portable implementation of snprintf() available that I could simply bundle with my library?
Ideally, I would end up with either normal snprintf()-based code, or something like this:
static const size_t FLOAT_BUFFER_SIZE = /* calculate max buffer somehow */;
char *fmt_double(double x)
{
    char *buf = new char[FLOAT_BUFFER_SIZE + 1];
    sprintf(buf, "%f", x);
    return buf;
}
Related questions:
Maximum sprintf() buffer size for integers
Maximum sprintf() buffer size for %g-formatted floats
Does the compiler support any of ecvt, fcvt or gcvt? They are a bit freakish, and hard to use, but they have their own buffer (ecvt, fcvt) and/or you may get lucky and find the system headers have, as in VC++, a definition of the maximum number of chars gcvt will produce. And you can take it from there.
Failing that, I'd consider the following quite acceptable, along the lines of the code provided. 500 chars is pretty conservative for a double; finite values range roughly from 10^-308 to 10^308, so even if the implementation is determined to be annoying and prints out all the digits, there should be no overflow.
#include <cassert>
#include <cstdio>
#include <cstring>   // strdup (POSIX)

char *fmt_double(double d) {
    static char buf[500];              // static, so zero-initialized
    sprintf(buf, "%f", d);
    assert(buf[sizeof buf - 1] == 0);  // if this fails, increase buffer size!
    return strdup(buf);                // caller must free()
}
This doesn't exactly provide any amazing guarantees, but it should be pretty safe(tm). I think that's as good as it gets with this sort of approach, unfortunately. But if you're in the habit of regularly running debug builds, you should at least get early warning of any problems...
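If you want the compile-time constant from the question, one common upper-bound estimate for "%f" looks like this (a sketch; the bound is my digit-counting reasoning, not something any standard guarantees):

#include <cfloat>    // DBL_MAX_10_EXP
#include <cstddef>
#include <cstdio>

// Rough upper bound for "%f" output of a finite double:
// sign + up to (DBL_MAX_10_EXP + 1) integer digits + '.' + 6 default
// fraction digits. "inf"/"nan" strings are far shorter.
static const size_t FLOAT_BUFFER_SIZE = 1 + (DBL_MAX_10_EXP + 1) + 1 + 6;

char *fmt_double(double x)
{
    char *buf = new char[FLOAT_BUFFER_SIZE + 1];  // +1 for the terminating NUL
    std::sprintf(buf, "%f", x);
    return buf;                                   // caller must delete[]
}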
I think GNU Libiberty is what you want. You can just include the implementation of snprintf.
vasprintf.c - 152 LOC.

Why is snprintf faster than ostringstream or is it?

I read somewhere that snprintf is faster than ostringstream. Has anyone had any experience with this? If yes, why is it faster?
std::ostringstream is not required to be slower, but implementations generally are slower in practice. FastFormat's website has some benchmarks.
The Standard library design for streams supports much more than snprintf does. The design is meant to be extensible, and includes protected virtual methods that are called by the publicly exposed methods. This allows you to derive from one of the stream classes, with the assurance that if you overload the protected method you will get the behavior you want. I believe that a compiler could avoid the overhead of the virtual function call, but I'm not aware of any compilers that do.
Additionally, stream operations often use growable buffers internally, which implies relatively slow memory allocations.
We replaced some stringstreams in inner loops with sprintf (using statically allocated buffers), and this made a big difference, both in msvc and gcc. I imagine that the dynamic memory management of this code:
{
    char buf[100];
    int i = 100;
    sprintf(buf, "%d", i);
    // do something with buf
}
is much simpler than
{
    std::stringstream ss;
    int i = 100;
    ss << i;
    std::string s = ss.str();
    // do something with s
}
but I am very happy with the overall performance of stringstreams.
Some people will tell you that the functions themselves can't be faster or slower than each other, only their implementations can. I think that's right, and I would agree.
You are unlikely to ever notice a difference outside of benchmarks. The reason C++ streams generally tend to be slower is that they are much more flexible. Flexibility most often comes at the cost of either time or code growth.
In this case, C++ streams are based on stream buffers. In themselves, streams are just the shell that keeps formatting and error flags in place and calls the right I/O facets of the C++ standard library (for example, num_put to print numbers), which print the values, well formatted, into the underlying stream buffer connected to the C++ stream.
All these mechanisms - the facets and the buffers - are implemented with virtual functions. While there is indeed no mandate that those functions be slower than their C stdio counterparts, the indirection will normally make them somewhat slower than using the C stdio functions directly (I benchmarked that some time ago with gcc/libstdc++ and did notice a slowdown - but one you hardly notice in day-to-day usage).
Absolutely this is implementation-specific.
But if you really want to know, write two small programs, and compare them. You would need to include typical usage for what you have in mind, the two programs would need to generate the same string, and you would use a profiler to look at the timing information.
Then you would know.
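A minimal timing sketch along those lines (the iteration count and the "format one int" workload are assumptions; adapt them to your typical usage):

#include <chrono>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <sstream>

int main()
{
    const int iterations = 1000000;   // assumed workload size
    using clock = std::chrono::steady_clock;
    size_t checksum = 0;              // keep results "used" so the loops aren't optimized away

    auto t0 = clock::now();
    for (int i = 0; i < iterations; ++i) {
        char buf[32];
        std::snprintf(buf, sizeof buf, "%d", i);
        checksum += std::strlen(buf);
    }
    auto t1 = clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::ostringstream ss;
        ss << i;
        checksum += ss.str().size();
    }
    auto t2 = clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "snprintf:      " << ms(t1 - t0).count() << " ms\n"
              << "ostringstream: " << ms(t2 - t1).count() << " ms\n"
              << "(checksum " << checksum << ")\n";
}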
One issue would probably be that the type safety added by ostringstream carries extra overhead. I've not done any measurements, though.
As litb said, standard streams support many things we don't always need.
Some stream implementations get rid of this never-used flexibility; see FAStream for instance.
It's quite possible, because sprintf is part of the CRT, which may be written in hand-tuned assembly. ostringstream is part of the standard library, which is probably written more generically and has OOP code/overhead to deal with.
Yes, if you run the function below on a few million numbers with Visual C++ 5.0, the first version takes about twice as long as the second and produces the same output.
Compiling tight loops into a .exe and running the Windows "timethis something.exe" or the Linux "time something" is how I investigate most of my performance curiosities. (timethis is available on the web somewhere.)
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;

void Hex32Bit(unsigned int n, string &result)
{
#if 0
    stringstream ss;
    ss
        << hex
        << setfill('0')
        << "0x" << setw(8) << n
        ;
    result = ss.str();
#else
    const size_t len = 11;              // "0x" + 8 hex digits + NUL
    char temp[len];
    _snprintf(temp, len, "0x%08x", n);  // MSVC spelling; plain snprintf elsewhere
    temp[len - 1] = '\0';
    result = temp;
#endif
}
One reason the printf family of functions can be faster than the corresponding C++ streams (cout, cin, and the others) is that the latter do their type handling through overloaded operators and virtual calls, which can take some time.
In fact, in programming competitions it is often recommended that you use printf et al rather than cout/cin for precisely this reason.

Cross platform format string for variables of type size_t? [duplicate]

This question already has answers here:
What's the correct way to use printf to print a size_t?
On a cross-platform C/C++ project (Win32, Linux, OS X), I need to use the *printf functions to print some variables of type size_t. In some environments size_t is 8 bytes and in others it is 4. On glibc I have %zd, and on Win32 I can use %Id. Is there an elegant way to handle this?
The PRIuPTR macro (from <inttypes.h>) defines a decimal format for uintptr_t, which should always be large enough that you can cast a size_t to it without truncating, e.g.
fprintf(stream, "Your size_t var has value %" PRIuPTR ".", (uintptr_t) your_var);
There are really two questions here. The first question is what the correct printf specifier string for the three platforms is. Note that size_t is an unsigned type.
On Windows, use "%Iu".
On Linux and OSX, use "%zu".
The second question is how to support multiple platforms, given that things like format strings might be different on each platform. As other people have pointed out, using #ifdef gets ugly quickly.
Instead, write a separate makefile or project file for each target platform. Then refer to the specifier by some macro name in your source files, defining the macro appropriately in each makefile. In particular, both GCC (-D) and Visual Studio (/D) accept a switch to define macros on the command line.
If your build system is very complicated (multiple build options, generated sources, etc.), maintaining 3 separate makefiles might get unwieldy, and you are going to have to use some kind of advanced build system like CMake or the GNU autotools. But the basic principle is the same: use the build system to define platform-specific macros instead of putting platform-detection logic in your source files.
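A sketch of that approach (the macro name SIZE_T_FMT and the exact command-line quoting are my assumptions and depend on your shell/build tool):

// Defined by the build system, e.g.
//   GCC/Clang:      -DSIZE_T_FMT='"%zu"'
//   Visual Studio:  /DSIZE_T_FMT="\"%Iu\""
#include <cstdio>

#ifndef SIZE_T_FMT
#define SIZE_T_FMT "%zu"   // fallback for this sketch only
#endif

int main()
{
    size_t n = 42;
    std::printf("Your size_t var has value " SIZE_T_FMT ".\n", n);
}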
The only thing I can think of is the typical:
#ifdef _WIN32 // or whatever
#define SSIZET_FMT "%Id"
#else
#define SSIZET_FMT "%zd"
#endif
and then taking advantage of constant folding:
fprintf(stream, "Your size_t var has value " SSIZET_FMT ".", your_var);
Dan Saks wrote an article in Embedded Systems Design which covered this matter. According to Dan, %zu is the standard way, but few compilers supported this. As an alternative, he recommended using %lu together with an explicit cast of the argument to unsigned long:
size_t n;
...
printf("%lu", (unsigned long)n);
Use boost::format. It's typesafe, so it'll print size_t correctly with %d; also, you don't need to remember to put c_str() on std::strings when using it, and even if you pass a number to %s or vice versa, it'll work.
I don't know of any satisfying solution, but you might consider a specialized function to format size_t items to a string, and print the string.
(Alternatively, if you can get away with it, boost::format handles this kind of thing with ease.)
You just have to find an integer type with the largest width, cast the value to it, and then use the appropriate format string for the larger type. Note this solution will work for any type (ptrdiff_t, etc.), not just size_t.
What you want to use is uintmax_t and the format macro PRIuMAX. For Visual C++, you are going to need to download c99-compatible stdint.h and inttypes.h headers, because Microsoft doesn't provide them.
Also see
http://www.embedded.com/columns/technicalinsights/204700432
This article corrects the mistakes in the article quoted by Frederico.
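A minimal sketch of the uintmax_t / PRIuMAX suggestion above (assuming <inttypes.h> is available, either natively or via the drop-in headers mentioned; the example value is mine):

#include <inttypes.h>  // uintmax_t, PRIuMAX
#include <stdio.h>

int main(void)
{
    size_t n = sizeof(long double);                // just an example value
    printf("size: %" PRIuMAX "\n", (uintmax_t)n);  // cast up to the widest unsigned type
    return 0;
}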
My choice for that problem is to simply cast the size_t argument to unsigned long and use %lu everywhere - this of course works only where values are not expected to exceed 2^32-1. If this is too short for you, you can always cast to unsigned long long and format it with %llu.
Either way, your strings will never be awkward.
Option 1:
Since on most (if not all?) systems, the PRIuPTR printf format string from <inttypes.h> is also wide enough to hold a size_t type, I recommend using the following defines for size_t printf format strings.
However, it is important that you verify this will work for your particular architecture (compiler, hardware, etc), as the standard does not enforce this.
#include <inttypes.h>
// Printf format strings for `size_t` variable types.
#define PRIdSZT PRIdPTR
#define PRIiSZT PRIiPTR
#define PRIoSZT PRIoPTR
#define PRIuSZT PRIuPTR
#define PRIxSZT PRIxPTR
#define PRIXSZT PRIXPTR
Example usage:
size_t my_variable;
printf("%" PRIuSZT "\n", my_variable);
Option 2:
Where possible, however, just use the %zu "z" length specifier, as shown here, for size_t types:
Example usage:
size_t my_variable;
printf("%zu\n", my_variable);
On some systems, however, such as STM32 microcontrollers using gcc as the compiler, the %z length specifier isn't necessarily implemented, and doing something like printf("%zu\n", my_size_t_num); may simply end up printing out a literal "%zu" (I personally tested this and found it to be true) instead of the value of your size_t variable.
Option 3:
Where you need it to be absolutely guaranteed to work, however, or where you aren't sure about your particular architecture, just cast to uint64_t, print that, and be done with it; this works everywhere but requires the extra step of casting.
Example usage:
#include <stdint.h> // for uint64_t
#include <inttypes.h> // for PRIu64
size_t my_variable;
printf("%" PRIu64 "\n", (uint64_t)my_variable);
Sources Cited:
http://www.cplusplus.com/reference/cstdio/printf/
http://www.cplusplus.com/reference/cinttypes/
http://www.cplusplus.com/reference/cstdint/
size_t is an unsigned type of at least 16 bits. Widths of 32 and 64 are often seen.
printf("%zu\n", some_size_t_object); // Standard since C99
Above is the best way going forward, yet if code needs to also port to pre-C99 platforms, convert the value to some wide type. unsigned long is a reasonable candidate, yet may be insufficient.
// OK, yet insufficient with large sizes > ULONG_MAX
printf("%lu\n", (unsigned long) some_size_t_object);
or with conditional code
#ifdef ULLONG_MAX
printf("%llu\n", (unsigned long long) some_size_t_object);
#else
printf("%lu\n", (unsigned long) some_size_t_object);
#endif
Lastly, consider double. It is a bit inefficient, yet should handle all ancient and new platforms until about 2030-2040 when, considering Moore's law, object sizes may exceed what a double can represent exactly.
printf("%.0f\n", (double) some_size_t_object);