Calculating the size of an sprintf() buffer - c++

A (very long) while ago I regularly used the following code - then on MSVC 6 - to determine the memory needed to format a string for a function with variadic arguments:
void LogPrint(const char *pszFormat, ...)
{
int nBytes;
char *pszBuffer;
va_list args;
va_start(args, pszFormat);
nBytes = vsnprintf(0, 0, pszFormat, args);
va_end(args);
// error checking omitted for brevity
pszBuffer = new char[nBytes + 1];
va_start(args, pszFormat);
vsnprintf(pszBuffer, nBytes + 1, pszFormat, args);
va_end(args);
// ...
}
The obvious error you're getting in a more recent version of MSVC (I'm using 2010 now) is:
warning C4996: 'vsnprintf': This function or variable may be unsafe. Consider using vsnprintf_s instead. To disable deprecation use _CRT_SECURE_NO_WARNINGS. See online help for details.
I'm a big fan of the "treat warnings as errors" option for any C(++)-compiler, and obviously my build fails. It feels like cheating to me to simply employ #pragma warning (disable:4996) and get on with it.
The suggested "safer" alternative, vsnprintf_s(), is doomed to return -1 when given the same inputs as its "unsafe" predecessor (a null buffer and a size of zero).
TL/DR: Is there a way to implement the expected behavior of vsnprintf() to return the memory needed to fulfil its task using the new, safer variants of it?
EDIT: simply defining _CRT_SECURE_NO_WARNINGS won't cut it; there's a lot of strcpy() flying around, too. The new variant of which isn't broken, so I'd like to still see these.

The function you want to look at is _vscprintf, which "returns the number of characters that would be generated if the string pointed to by the list of arguments was printed or sent to a file or buffer using the specified formatting codes". There's a widechar variant (_vscwprintf) as well.
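As a rough sketch of how that could be wired up (FormatToString is a made-up name, and the non-MSVC branch is my own assumption so the same code also builds elsewhere): measure first, then format into a buffer that is one byte larger than the measured length.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): formats into a
// std::string after measuring the required length first.
std::string FormatToString(const char* pszFormat, ...)
{
    va_list args;

    // Pass 1: measure. The result excludes the terminating NUL.
    va_start(args, pszFormat);
#ifdef _MSC_VER
    int nChars = _vscprintf(pszFormat, args);
#else
    // Assumption: a C99-conforming vsnprintf that reports the needed length.
    int nChars = std::vsnprintf(nullptr, 0, pszFormat, args);
#endif
    va_end(args);
    if (nChars < 0)
        return std::string();   // format or encoding error

    // Pass 2: format into a buffer with room for the terminator.
    std::vector<char> buffer(static_cast<size_t>(nChars) + 1);
    va_start(args, pszFormat);
#ifdef _MSC_VER
    vsnprintf_s(&buffer[0], buffer.size(), _TRUNCATE, pszFormat, args);
#else
    std::vsnprintf(&buffer[0], buffer.size(), pszFormat, args);
#endif
    va_end(args);
    return std::string(&buffer[0], static_cast<size_t>(nChars));
}
```

Restarting the argument list with a second va_start, as the question's code does, avoids needing va_copy, which older MSVC versions may not provide.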

Related

gcc-8 -Wstringop-truncation what is the good practice?

GCC 8 added a -Wstringop-truncation warning. From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82944 :
The -Wstringop-truncation warning added in GCC 8.0 via r254630 for bug 81117 is specifically intended to highlight likely unintended uses of the strncpy function that truncate the terminating NUL character from the source string. An example of such a misuse given in the request is the following:
char buf[2];
void test (const char* str)
{
strncpy (buf, str, strlen (str));
}
I get the same warning with this code.
strncpy(this->name, name, 32);
warning: 'char* strncpy(char*, const char*, size_t)' specified bound 32 equals destination size [-Wstringop-truncation]
Here this->name is char name[32] and name is a char* whose length may be greater than 32. I would like to copy name into this->name and truncate it if it is longer than 32. Should the size argument be 31 instead of 32? I'm confused. It is not mandatory for this->name to be NUL-terminated.
This message is trying to warn you that you're doing exactly what you're doing. A lot of the time, that's not what the programmer intended. If it is what you intended (meaning, your code will correctly handle the case where the character array will not end up containing any null character), turn off the warning.
If you do not want to or cannot turn it off globally, you can turn it off locally as pointed out by @doron:
#include <string.h>
char d[32];
void f(const char *s) {
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-truncation"
strncpy(d, s, 32);
#pragma GCC diagnostic pop
}
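If the array really is meant to be a fixed-width field without a guaranteed terminator, GCC 8 and later also document a nonstring variable attribute for exactly that case. The sketch below is only an illustration: Record, set_name, get_name and the NONSTRING macro are made-up names, and the macro guard is my assumption for keeping other compilers quiet.

```cpp
#include <string.h>
#include <string>

// The attribute is GCC-specific (8 and later), so hide it elsewhere.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#define NONSTRING __attribute__((nonstring))
#else
#define NONSTRING
#endif

// Hypothetical record with a fixed-width, possibly unterminated name.
struct Record {
    char name[32] NONSTRING;
};

void set_name(Record& r, const char* name)
{
    // Copies at most 32 bytes and zero-pads when name is shorter.
    strncpy(r.name, name, sizeof r.name);
}

std::string get_name(const Record& r)
{
    // Never assume a terminator: bound the length by the field width.
    return std::string(r.name, strnlen(r.name, sizeof r.name));
}
```

Every reader of the field then has to bound its access by the field width, as get_name does, instead of treating it as a C string.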
This new GCC warning renders strncpy() mostly unusable in many projects: code review will not accept code that produces warnings. But if strncpy() is used only with strings short enough that it can write the terminating zero byte, then zeroing out the destination buffer at the beginning and then using plain strcpy() would do the same job.
Actually, strncpy() is one of the functions that would have been better left out of the C library. There are legitimate use cases for it, sure. But the library designers forgot to put fixed-size-string-aware counterparts to strncpy() into the standard, too. The most important such functions, strnlen() and strndup(), were only added to POSIX.1 in 2008, decades after strncpy() was created! And there is still no function that copies a strncpy()-generated fixed-length string into a preallocated buffer with correct C semantics, i.e. always writing the 0-termination byte. One such function could be:
// Copy string "in" with at most "insz" chars to buffer "out", which
// is "outsz" bytes long. The output is always 0-terminated. Unlike
// strncpy(), strncpy_t() does not zero fill remaining space in the
// output buffer:
char* strncpy_t(char* out, size_t outsz, const char* in, size_t insz){
assert(outsz > 0);
while(--outsz > 0 && insz > 0 && *in) { *out++ = *in++; insz--; }
*out = 0;
return out;
}
I recommend using two length inputs for strncpy_t() to avoid confusion: if there were only a single size argument, it would be unclear whether it is the size of the output buffer or the maximum length of the input string (which is usually one less).
There are very few justified cases for using strncpy. It is quite a dangerous function. If the source string length (without the null character) is equal to or greater than the destination buffer size, then strncpy will not add the null character at the end of the destination buffer. So the destination buffer will not be null terminated.
We should write this kind of code on Linux:
lenSrc = strnlen(pSrc, destSize);
if (lenSrc < destSize)
memcpy(pDest, pSrc, lenSrc + 1);
else {
/* Handle error... */
}
In your case, if you want to truncate the source on copy, but still want a null terminated destination buffer, then you could write this kind of code:
destSize = 32;
sizeCp = strnlen(pSrc, destSize - 1);
memcpy(pDest, pSrc, sizeCp);
pDest[sizeCp] = '\0';
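Wrapped up as a function, the truncating variant might look like the sketch below. copy_truncate is a made-up name, and the zero-size guard is my addition, since destSize - 1 would wrap around for an empty buffer.

```cpp
#include <stddef.h>
#include <string.h>

// Hypothetical helper: copies src into dest (destSize bytes), truncating
// if needed, and always terminates. Returns the number of bytes copied,
// not counting the terminator.
size_t copy_truncate(char* dest, size_t destSize, const char* src)
{
    if (destSize == 0)
        return 0;                           // no room even for the terminator
    size_t n = strnlen(src, destSize - 1);  // never reads past destSize - 1
    memcpy(dest, src, n);
    dest[n] = '\0';
    return n;
}
```

Comparing the return value with strlen(src) (or checking src[n]) tells the caller whether truncation happened.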
Edit: Oh... If it is not mandatory for the destination to be NUL-terminated, strncpy is the right function to use. And yes, you need to call it with 32 and not 31.
I think you need to ignore this warning by disabling it... Honestly, I do not have a good answer for that...
Edit2: In order to mimic the strncpy function, you could write this code:
destSize = 32;
sizeCp = strnlen(pSrc, destSize - 1);
memcpy(pDest, pSrc, sizeCp + 1);
TL;DR: handle the truncation case and the warning will disappear.
This warning happened to be really useful for me, as it uncovered an issue in my code. Consider this listing:
#include <string.h>
#include <stdio.h>
int main() {
const char long_string[] = "It is a very long string";
char short_string[8];
strncpy(short_string, long_string, sizeof(short_string));
/* This line is extremely important, it handles string truncation */
short_string[7] = '\0';
printf("short_string = \"%s\"\n", short_string);
return 0;
}
As the comment says short_string[7] = '\0'; is necessary here. From the strncpy man:
Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
If we remove this line, it invokes UB. For example, for me, the program starts printing:
short_string = "It is a It is a very long string"
Basically, GCC wants you to fix the UB. I added such handling to my code and the warning is gone.
The responses from others led me to just write a simple version of strncpy.
#include<string.h>
char* mystrncpy(char* dest, const char*src, size_t n) {
memset(dest, 0, n);
memcpy(dest, src, strnlen(src, n-1));
return dest;
}
It avoids the warnings and guarantees dest is null terminated. I'm using the g++ compiler and wanted to avoid pragma entries.
I found this while looking for a near-perfect solution to this problem. Most of the answers here describe ways to handle the truncation without suppressing the warning. The accepted answer suggests the following wrapper, which results in another set of warnings and is frustrating and not desirable.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-truncation"
...
#pragma GCC diagnostic pop
Instead, I found this working solution, can't say if there are any pitfalls, but it does the work nicely.
_Pragma("GCC diagnostic push")
_Pragma("GCC diagnostic ignored \"-Wstringop-truncation\"")
strncpy(d, s, 32);
_Pragma("GCC diagnostic pop")
I found the best way to suppress the warning is to put the expression in parentheses like this gRPC patch:
(strncpy(req->initial_request.name, lb_service_name,
GRPC_GRPCLB_SERVICE_NAME_MAX_LENGTH));
The problem with the #pragma diagnostic suppression solution is that the #pragma itself will cause a warning when the compiler does not recognize either the pragma or the particular warning; it is also too verbose.
What the warning says is that only len - 1 characters can be copied, because the last one should be '\0'. Copying len - 1 characters and writing the terminator yourself seems to clear the warning.
For example:
strncpy(this->name, name, 31);
or
#include <string.h>
char d[32];
void f(const char *s) {
strncpy(d, s, 31);
d[31] = '\0';
}

Why does this code with TCHAR and variadic arguments behave this way?

I have the following helper function:
inline void DebugMessage(const TCHAR* fmtstr, ...)
{
va_list args;
va_start(args, fmtstr);
TCHAR buffer[256];
StringCbVPrintf(buffer, sizeof(buffer), fmtstr, args);
OutputDebugString(buffer);
va_end(args);
}
And I call it twice like so:
DebugMessage(_T("Test %d\n", 1)); // incorrectly closed _T()
DebugMessage(_T("Test %d\n"), 1); // correctly closed _T()
I get the following output:
Test 0
Test 1
The 2nd case works as expected. I am confused why the first case functions at all, rather than being an error?
_T is not a function, it's a macro that (in a Unicode build) expands to L ## x. The misplaced bracket doesn't cause a compile error, it simply changes which parts of the line get consumed by the macro.
The macro only takes one parameter (x), so in the first case, with the incorrect closure, the second parameter (1) is simply discarded. DebugMessage then receives a format string with a %d but no matching argument, and the number you get in your output is whatever data happens to be on the stack.
Note that by default, VS 2012 will issue a C4002 warning about this (too many actual parameters for macro) so you may want to check that you have warnings enabled properly.

How to determine if va_list is empty

I have been reading that some compilers support va_list with macros and users were able to overload the functionality with other macros in order to count the va_list.
With visual studio, is there a way to determine if the va_list is empty (aka count==0)? Basically I would like to know this condition:
extern void Foo(const char* psz, ...);
void Test()
{
Foo("My String"); // No params were passed
}
My initial thought was to do something like this:
va_list vaStart;
va_list vaEnd;
va_start(vaStart, psz);
va_end(vaEnd);
if (vaStart == vaEnd) ...
The problem is that va_end only sets the param to null.
#define _crt_va_start(ap,v) ( ap = (va_list)_ADDRESSOF(v) + _INTSIZEOF(v) )
#define _crt_va_arg(ap,t) ( *(t *)((ap += _INTSIZEOF(t)) - _INTSIZEOF(t)) )
#define _crt_va_end(ap) ( ap = (va_list)0 )
I was thinking of maybe incorporating a terminator, but I would want it to be hidden from the caller so that existing code doesn't need to be changed.
There is no way to tell how many arguments are passed through ..., nor what type they are. Variadic function parameters can only be used if you have some other way (e.g. a printf-style format string) to tell the function what to expect; and even then there is no way to validate the arguments.
C++11 provides type-safe variadic templates. I don't know whether your compiler supports these, or whether they would be appropriate for your problem.
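For what it's worth, here is a minimal sketch of how a variadic template can tell whether anything was passed. CountArgs is a made-up name standing in for Foo, and it assumes a compiler with C++11 variadic template support.

```cpp
#include <cstddef>

// Hypothetical replacement for Foo: the parameter pack carries its own
// length, so the function can tell when nothing was passed.
template <typename... Args>
std::size_t CountArgs(const char* psz, Args... args)
{
    (void)psz;                 // the named parameter is not needed here
    return sizeof...(args);    // number of extra arguments, known at compile time
}
```

A call such as CountArgs("My String") yields 0, and the argument types are also available to the template, unlike with a va_list.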
I realize this question is fairly old, but I thought it might be helpful to flesh it out a bit. As Mike Seymour answered quite correctly, there is no absolutely reliable way to determine the number of arguments in a va_list. That's why the conventional way to define a variadic function is to include a parameter that has that information, like so: void func(const char *str, int count, ...);, which defines a contract your callers are supposed to abide by.
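A small sketch of that count-parameter contract, with made-up names (sum_ints and label are only for illustration):

```cpp
#include <cstdarg>

// Hypothetical example of the count-parameter contract: the caller
// promises that exactly `count` int arguments follow.
int sum_ints(const char* label, int count, ...)
{
    (void)label;               // unused in this sketch
    va_list vl;
    va_start(vl, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(vl, int);   // read only as many as were promised
    va_end(vl);
    return total;
}
```

With count == 0 the loop never calls va_arg, so the empty case is handled without inspecting the va_list at all.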
EDIT: The standard (C11 7.16.1.1, paragraph 2) says the behavior is undefined when va_arg(vl, type) is called and there is no actual next argument. Some implementations happen to hand back a typed-zero value in that case. However, CAVEAT EMPTOR - it doesn't have to be.
On such an implementation, the value returned from va_arg(vl, type) when there are no more arguments is a typed-zero value: for numeric types it is 0, for pointers it is a NULL pointer, and for structs it is a struct with all fields zeroed. If you elect to copy the va_list and try to count the copy, like so:
void func(const char *str, ...) {
va_list vl;
va_list vc;
int x;
int count;
count = 0;
va_start(vl, str);
va_copy(vc, vl);
do {
x = va_arg(vc, int);
if (x == 0) break;
count++;
} while (1);
va_end(vc);
.
.
. // do something, or something else,
. // based on the number of args in the list
.
va_end(vl);
}
You would have to make the assumption that the caller would abide by the contract not to pass a NULL or zero value in the list. Either way, you have to understand that the caller of a variadic function is responsible for abiding by the stated contract. So write your function, publish the contract, and sleep easy.
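I am aware that my answer isn't an "orthodox" answer; however, due to the limitations of the va_list macros, I found the following to work for me. On MSVC, va_list is a typedef for char *, so once va_start has run it can be inspected as raw bytes. This only tests whether the first byte after the last named parameter is zero, which nothing guarantees for an empty argument list, so treat it as a compiler-specific hack. You can try
va_list argptr;
va_start(argptr, psz); // psz is the last named parameter, as in Foo above
if(strcmp(argptr,""))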
{
// not empty
}
else
{
// empty
}
I tried that and it worked for me.
I used Visual Studio 2013 Ultimate. The project type is Win32 Console Application. No compilation errors.

How to create a va_list on GCC?

I'm trying to convert some code so that it compiles on gcc too (right now, it compiles only on MSVC).
The code I'm stuck at is in a pseudo-formatting function that accepts as input a format string and zero or more arguments (const char *format, ...). It will then process some of the placeholders consuming some of the arguments, and pass the rest to vsprintf along with a new va_list dynamically generated.
This is the actual code for generating the new va_list:
char *new_args = (char *) malloc(sum);
char *n = new_args;
for(int i = 0; i < nArgs; i++)
{
int j = order[i];
int len = _getlen(types[j]);
memcpy(n, args + cumulOffsets[j], len);
n += len;
}
vsprintf(buffer, sFormat.c_str(), new_args);
In my defense, I didn't and would never write this code. In fact, I think it's one of the hackiest things I've seen in my whole life.
However, this function is very complex, very old, and very important. It also hasn't been modified in years (well, except now), so while I'd like to rewrite it from scratch, I can't justify the time it would take plus the bugs it would introduce.
So, I need a way to do this same thing on GCC. But there a va_list is not a char *, so I'm getting:
error: ISO C++ forbids casting to an array type '__va_list_tag [1]'
I'm a bit lost. Why do you need a new dynamically-generated va_list? Why not just reuse the old one?
I believe vsnprintf() uses a current va_list object (if you can call it that). So you are free to va_start(), use the arguments you want via va_arg(), then pass the remaining arguments via the va_list to vsnprintf(), and then call va_end().
Am I missing something? Why the deep copy?
And if you do need a deep copy, why not va_start() fresh, remove the arguments you want via va_arg(), and then pass the resulting va_list object to vsnprintf().
(Each call to va_arg modifies the va_list object so that the next call returns the next argument.)
Alternatively, you could just use va_copy(). (Though be sure to follow it with a corresponding va_end().)
Addendum: Also note that these va_ macros are based on C89 & C99 standards. GNU g++ will support them. Microsoft is somewhat more limited.
Following up on TonyK's comment:
What I said above works if you are pulling the first N items off the va_list. If you are pulling items out of the middle, that's harder.
There is no portable way to construct a va_list.
However, you could pull apart the format string, use it to determine the object types (double, float, int, etc.), and print each one out individually with its own format string (a subsection of the original format string). The multiple snprintf() calls will cause some overhead. But if this routine isn't called too often, it should be viable.
You could also print out subsections of the original format string with a suitably crafted va_list. In other words, the first vsnprintf() call prints elements 1..3, the second elements 5..7, the third 10..13, etc. (vsnprintf() will ignore extra elements on the va_list beyond what it needs, so you just need a series of corresponding format-string fragments, popping items off the va_list with va_arg() as needed for each vsnprintf() call.)
There's not enough context to figure out what you're trying to do here, but if you need to COPY a va_list, you may be able to use the C99 standard function va_copy, which gcc supports (but I have no idea if MS supports it).
There is a way to do this, it isn't pretty:
union {
char *pa;
va_list al;
} au;
....
au.pa = new_args;
vsprintf(buffer, sFormat.c_str(), au.al);
Using a union instead of a cast is ugly, but you can't cast if va_list is an array type.

C++: how to get fprintf results as a std::string w/o sprintf

I am working with an open-source UNIX tool that is implemented in C++, and I need to change some code to get it to do what I want. I would like to make the smallest possible change in hopes of getting my patch accepted upstream. Solutions that are implementable in standard C++ and do not create more external dependencies are preferred.
Here is my problem. I have a C++ class -- let's call it "A" -- that currently uses fprintf() to print its heavily formatted data structures to a file pointer. In its print function, it also recursively calls the identically defined print functions of several member classes ("B" is an example). There is another class C that has a member std::string "foo" that needs to be set to the print() results of an instance of A. Think of it as a to_str() member function for A.
In pseudocode:
class A {
public:
...
void print(FILE* f);
B b;
...
};
...
void A::print(FILE *f)
{
std::string s = "stuff";
fprintf(f, "some %s", s.c_str());
b.print(f);
}
class C {
...
std::string foo;
bool set_foo(std::string);
...
};
...
A a;
C c;
...
// wish i knew how to write A's to_str()
c.set_foo(a.to_str());
I should mention that C is fairly stable, but A and B (and the rest of A's dependents) are in a state of flux, so the fewer code changes necessary the better. The current print(FILE* F) interface also needs to be preserved. I have considered several approaches to implementing A::to_str(), each with advantages and disadvantages:
1: Change the calls to fprintf() to sprintf()
   - I wouldn't have to rewrite any format strings
   - print() could be reimplemented as: fprintf(f, "%s", this->to_str().c_str());
   - But I would need to manually allocate char[]s, merge a lot of C strings, and finally convert the character array to a std::string
2: Try to catch the results of a.print() in a string stream
   - I would have to convert all of the format strings to << output format. There are hundreds of fprintf()s to convert :-{
   - print() would have to be rewritten because there is no standard way that I know of to create an output stream from a UNIX file handle (though this guy says it may be possible).
3: Use Boost's string format library
   - More external dependencies. Yuck.
   - Format's syntax is different enough from printf() to be annoying:
     printf(format_str, args) -> cout << boost::format(format_str) % arg1 % arg2 % etc
4: Use Qt's QString::asprintf()
   - A different external dependency.
So, have I exhausted all possible options? If so, which do you think is my best bet? If not, what have I overlooked?
Thanks.
Here's the idiom I like for making functionality identical to 'sprintf', but returning a std::string, and immune to buffer overflow problems. This code is part of an open source project that I'm writing (BSD license), so everybody feel free to use this as you wish.
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>
std::string vformat (const char *fmt, va_list ap);
std::string
format (const char *fmt, ...)
{
va_list ap;
va_start (ap, fmt);
std::string buf = vformat (fmt, ap);
va_end (ap);
return buf;
}
std::string
vformat (const char *fmt, va_list ap)
{
// Allocate a buffer on the stack that's big enough for us almost
// all the time.
const size_t size = 1024;
char buf[size];
// Keep a copy of the argument list for a possible second pass, then
// try to vsnprintf into our buffer.
va_list apcopy;
va_copy (apcopy, ap);
int needed = vsnprintf (&buf[0], size, fmt, ap);
// NB. On Windows, vsnprintf returns -1 if the string didn't fit the
// buffer. On Linux & OSX, it returns the length it would have needed.
// A negative result is not handled here and yields an empty string.
std::string result;
if (needed >= 0 && (size_t) needed < size) {
// It fit fine the first time, we're done.
result.assign (&buf[0], (size_t) needed);
} else if (needed >= 0) {
// vsnprintf reported that it wanted to write more characters
// than we allotted. So allocate a buffer of the right size (plus
// the terminator) and try again. This doesn't happen very often
// if we chose our initial size well.
std::vector <char> dynbuf ((size_t) needed + 1);
vsnprintf (&dynbuf[0], dynbuf.size (), fmt, apcopy);
result.assign (&dynbuf[0], (size_t) needed);
}
va_end (apcopy);
return result;
}
EDIT: when I wrote this code, I had no idea that this required C99 conformance and that Windows (as well as older glibc) had different vsnprintf behavior, in which it returns -1 for failure, rather than a definitive measure of how much space is needed. Here is my revised code; could everybody look it over, and if you think it's OK, I will edit again to make that the only code listed:
std::string
Strutil::vformat (const char *fmt, va_list ap)
{
// Allocate a buffer on the stack that's big enough for us almost
// all the time. Be prepared to allocate dynamically if it doesn't fit.
size_t size = 1024;
char stackbuf[1024];
std::vector<char> dynamicbuf;
char *buf = &stackbuf[0];
va_list ap_copy;
while (1) {
// Try to vsnprintf into our buffer.
va_copy(ap_copy, ap);
int needed = vsnprintf (buf, size, fmt, ap_copy);
va_end(ap_copy);
// NB. C99 (which modern Linux and OS X follow) says vsnprintf
// failure returns the length it would have needed. But older
// glibc and current Windows return -1 for failure, i.e., not
// telling us how much was needed.
if (needed < (int)size && needed >= 0) {
// It fit fine so we're done.
return std::string (buf, (size_t) needed);
}
// vsnprintf reported that it wanted to write more characters
// than we allotted. So try again using a dynamic buffer. This
// doesn't happen very often if we chose our initial size well.
size = (needed > 0) ? (needed+1) : (size*2);
dynamicbuf.resize (size);
buf = &dynamicbuf[0];
}
}
I am using #3: the boost string format library - but I have to admit that I've never had any problem with the differences in format specifications.
Works like a charm for me - and the external dependencies could be worse (a very stable library)
Edited: adding an example how to use boost::format instead of printf:
sprintf(buffer, "This is a string with some %s and %d numbers", "strings", 42);
would be something like this with the boost::format library:
std::string s = boost::str(boost::format("This is a string with some %s and %d numbers") % "strings" % 42);
Hope this helps clarify the usage of boost::format
I've used boost::format as a sprintf / printf replacement in 4 or 5 applications (writing formatted strings to files, or custom output to logfiles) and never had problems with format differences. There may be some (more or less obscure) format specifiers that behave differently - but I never had a problem.
In contrast I had some format specifications I couldn't really do with streams (as much as I remember)
You can use std::string and iostreams with formatting, such as the setw() call and others in iomanip
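A small sketch of what that can look like; format_row is a made-up function that produces roughly what a "%-8s|%6.2f" format string would:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical row formatter, roughly what "%-8s|%6.2f" would produce.
std::string format_row(const std::string& name, double value)
{
    std::ostringstream os;
    os << std::left << std::setw(8) << name << '|'      // setw applies to the next item only
       << std::right << std::setw(6) << std::fixed << std::setprecision(2)
       << value;
    return os.str();
}
```

Note that std::left, std::fixed and std::setprecision stay in effect on the stream until changed, while std::setw has to be repeated for each field.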
The {fmt} library provides fmt::sprintf function that performs printf-compatible formatting (including positional arguments according to POSIX specification) and returns the result as std::string:
std::string s = fmt::sprintf("The answer is %d.", 42);
Disclaimer: I'm the author of this library.
The following might be an alternative solution:
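void A::printto(ostream &outputstream) {
char buffer[100];
string s = "stuff";
sprintf(buffer, "some %s", s.c_str());
outputstream << buffer << endl;
b.printto(outputstream);
}
(B::printto similar), and define
void A::print(FILE *f) {
// Constructing an ofstream from a FILE* is not standard C++; this
// assumes an implementation that offers it as an extension.
ofstream os(f);
printto(os);
}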
string A::to_str() {
ostringstream os;
printto(os);
return os.str();
}
Of course, you should really use snprintf instead of sprintf to avoid buffer overflows. You could also selectively change the more risky sprintfs to << format, to be safer and yet change as little as possible.
You should try the Loki library's SafeFormat header file (http://loki-lib.sourceforge.net/index.php?n=Idioms.Printf). It's similar to boost's string format library, but keeps the syntax of the printf(...) functions.
I hope this helps!
Is this about serialization? Or printing proper?
If the former, consider boost::serialization as well. It's all about "recursive" serialization of objects and sub-object.
Very very late to the party, but here's how I'd attack this problem.
1: Use pipe(2) to open a pipe.
2: Use fdopen(3) to convert the write fd from the pipe to a FILE *.
3: Hand that FILE * to A::print().
4: Use read(2) to pull bufferloads of data, e.g. 1K or more at a time from the read fd.
5: Append each bufferload of data to the target std::string
6: Repeat steps 4 and 5 as needed to complete the task.
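The steps above could be sketched roughly as follows on a POSIX system. capture and print_demo are made-up names, with print_demo standing in for A::print(). One caveat I'm assuming matters in practice: this single-threaded version only reads after printing has finished, so it would block if the output exceeded the pipe's capacity; larger outputs would need the reading to happen concurrently.

```cpp
#include <cstdio>
#include <string>
#include <unistd.h>

// Hypothetical stand-in for A::print(FILE*).
static void print_demo(FILE* f) { std::fprintf(f, "some %s", "stuff"); }

// Runs print_fn against the write end of a pipe and returns what it wrote.
std::string capture(void (*print_fn)(FILE*))
{
    int fds[2];
    if (pipe(fds) != 0)
        return std::string();
    FILE* w = fdopen(fds[1], "w");      // step 2: FILE* over the write fd
    if (!w) {
        close(fds[0]);
        close(fds[1]);
        return std::string();
    }
    print_fn(w);                        // step 3
    std::fclose(w);                     // flushes and closes the write end,
                                        // so read() below eventually sees EOF
    std::string out;
    char chunk[1024];
    ssize_t n;
    while ((n = read(fds[0], chunk, sizeof chunk)) > 0)   // steps 4-6
        out.append(chunk, static_cast<size_t>(n));
    close(fds[0]);
    return out;
}
```

A to_str() member could then be a thin wrapper that passes its own print function to such a helper.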