How to fix fprintf vulnerability? - c++

In my code I used fprintf. I used flawfinder to check the code for vulnerabilities and I got that:
358: [4] (format) fprintf: If format strings can be influenced by
an attacker, they can be exploited. Use a constant for the format
specification.
Can someone explain to me what Use a constant for the format specification actually means? Is there any safe version of fprintf?

The problem is that fprintf determines how many arguments it should get by examining the format string. If the format string doesn't agree with the actual arguments, you have undefined behavior which can manifest as a security vulnerability.
The problem is particularly bad if the string supplied can be influenced by the user of your program, because he can then specifically design the string to make your program do bad things.
There is no safe version of fprintf in the C standard. C++ streams avoid the problem, at the cost of not having format strings and using a far more verbose syntax for specifying formatting options.

A constant string, as in a string literal.
Like in
fprintf(someFile, "%s", someStringVariable);
and not like
fprintf(someFile, someStringVariable);

It means it wants you to write:
fprintf(out, "foo %s", some_string);
instead of what you have, which I guess is something like:
const char *format = "foo %s";
/* some time later */
fprintf(out, format, some_string);
The reason is that it's worried format might come from user input or something, and a malicious user could supply a format foo %s%s%s in order to provoke undefined behavior that they may be able to exploit.
Obviously if you're choosing between n different format strings, all of which are string literals in your code and all use the same format specifiers, but you choose which one at runtime, then following this advice is a bit awkward and wouldn't make your code any safer. But you could have n functions instead of n strings, and each function calls fprintf with a different string literal.
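As a rough sketch of the "n functions instead of n strings" idea, a single function with a switch does the same job: every fprintf call still receives a string literal, and only the choice between them happens at runtime. The names Level and log_message are made up for illustration and are not from the question.

```cpp
#include <cstdio>

// Hypothetical set of message kinds; the question does not show its own.
enum class Level { Info, Warning, Error };

// Every branch passes a string literal as the format, so the compiler and
// tools like flawfinder can check it against the arguments.
// Returns whatever fprintf returns (characters written, or negative on error).
int log_message(std::FILE* out, Level level, const char* text) {
    switch (level) {
    case Level::Info:    return std::fprintf(out, "info: %s\n", text);
    case Level::Warning: return std::fprintf(out, "warning: %s\n", text);
    case Level::Error:   return std::fprintf(out, "error: %s\n", text);
    }
    return -1;  // unreachable for valid enum values
}
```

Because text is only ever an argument to %s, a value such as "%s%s%s" is printed verbatim instead of being interpreted as a format.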
If you're reading the format string out of a config file (which is one fairly crude way of implementing internationalization from scratch) then you're basically out of luck. The linter doesn't trust your translator to use the right format codes for the arguments supplied to the call. And arguably neither should you :-)
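If you really are stuck with format strings from a file, one possible mitigation is to compare each loaded string against the literal it replaces and refuse it when the conversions differ. The following is only a sketch under simplifying assumptions: format_signature and same_arguments are hypothetical helpers, '*' widths, positional arguments such as %1$s, and %n are rejected outright, and the check is deliberately conservative (for example, %x and %X count as different). The linter will still flag the fprintf call itself, since the format is not a literal.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Writes the argument "signature" of a printf-style format into sig,
// e.g. "foo %s, %5ld" -> "s;ld;". Returns false for anything this sketch
// does not understand ('*' widths, "%1$s", "%n", a dangling '%').
bool format_signature(const char* fmt, std::string& sig) {
    sig.clear();
    for (const char* p = fmt; *p != '\0'; ++p) {
        if (*p != '%') continue;                                     // ordinary character
        ++p;
        if (*p == '%') continue;                                     // "%%" is a literal percent sign
        while (*p != '\0' && std::strchr("-+ #0", *p)) ++p;          // flags
        while (std::isdigit(static_cast<unsigned char>(*p))) ++p;    // width
        if (*p == '.') {                                             // precision
            ++p;
            while (std::isdigit(static_cast<unsigned char>(*p))) ++p;
        }
        std::string spec;
        while (*p != '\0' && std::strchr("hlLjzt", *p)) spec += *p++;  // length modifiers
        if (*p == '\0' || !std::strchr("diouxXeEfFgGaAcsp", *p)) return false;
        spec += *p;
        sig += spec + ';';
    }
    return true;
}

// True when the untrusted format consumes exactly the same arguments, in the
// same order, as the trusted literal it is meant to replace.
bool same_arguments(const char* trusted, const char* untrusted) {
    std::string a, b;
    return format_signature(trusted, a) && format_signature(untrusted, b) && a == b;
}
```

With this, same_arguments("foo %s", translated) would be checked once at load time, and the built-in literal used as a fallback when it returns false.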

Related

How can I replicate compile time hex string interpretation at run time!? c++

In my code the following line gives me data that performs the task it's meant for:
const char *key = "\xf1`\xf8\a\\\x9cT\x82z\x18\x5\xb9\xbc\x80\xca\x15";
The problem is that it gets converted at compile time according to rules that I don't fully understand. How does "\x" work in a String?
What I'd like to do is to get the same result but from a string exactly like that fed in at run time. I have tried a lot of things and looked for answers but none that match closely enough for me to be able to apply.
I understand that \x denotes a hex number. But I don't know in which form that gets 'baked out' by the compiler (gcc).
What does that ` translate into?
Does the "\a" do something similar to "\x"?
This is indeed provided by the compiler, but it is not part of the standard library. That leaves you with 3 ways:
dynamically write a C++ source file containing the string, which writes it to its standard output. Compile it and (provided popen is available) execute it from your main program and read its output. Pretty ugly, isn't it...
use the source of an existing compiler, or directly its internal libraries. Clang is probably a good starting point because it has been designed to be modular. But it could require a good amount of work to find where that damned specific point is coded and how to use that...
just mimic what the compiler does, and write your own parser by hand. It is not that hard, and will teach you why tests are useful...
If it was not clear until here, I strongly urge you to use the third way ;-)
If you want to translate "escape" codes in strings that you get as input at run-time then you need to do it yourself, explicitly.
One way is to read the input into one string. Then copy the characters from that source string into a new destination string, one by one. If you see a backslash, discard it and fetch the next character; if it's an x, you can use e.g. std::stoi (with base 16) to convert the hex digits that follow into the corresponding integer value, and append that value to the destination string as a single char (for example with push_back and a cast to char). Appending it with std::to_string, or with the stream operator << on an int, would add the decimal digits of the number rather than the byte itself.
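To make the hand-written approach concrete, here is a minimal sketch of such a decoder. The function name unescape is made up, and it covers only part of what the compiler does: \x followed by hex digits (consumed greedily, as the compiler does), the single-letter escapes such as \a and \n, and \\. Octal escapes like \0 are not handled, and values above 0xFF are truncated to one byte where a compiler would report an error. As for the other two questions: the backtick is an ordinary character with no special meaning, and \a is the "alert" (bell) character, value 7.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Decodes a subset of C escape sequences in text that was read at run time.
std::string unescape(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '\\' || i + 1 == in.size()) {   // ordinary char, or a trailing backslash
            out += in[i];
            continue;
        }
        const char c = in[++i];                      // character after the backslash
        if (c == 'x') {
            unsigned value = 0;
            bool any = false;
            // Consume every hex digit that follows, like the compiler does.
            while (i + 1 < in.size() && std::isxdigit(static_cast<unsigned char>(in[i + 1]))) {
                const unsigned char h = static_cast<unsigned char>(in[++i]);
                value = value * 16 + (std::isdigit(h) ? h - '0' : std::tolower(h) - 'a' + 10);
                any = true;
            }
            if (any) out += static_cast<char>(value & 0xFF);  // one byte, not its decimal text
            else out += "\\x";                                // "\x" without digits: keep verbatim
            continue;
        }
        switch (c) {
        case 'a': out += '\a'; break;
        case 'b': out += '\b'; break;
        case 'f': out += '\f'; break;
        case 'n': out += '\n'; break;
        case 'r': out += '\r'; break;
        case 't': out += '\t'; break;
        case 'v': out += '\v'; break;
        default:  out += c;    break;   // \\ \" \' and unknown escapes yield the character itself
        }
    }
    return out;
}
```

Fed the text of the key from the question, this should produce the same 16 bytes as the compiled literal.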

is it good to use sscanf for parsing string

I have been using sscanf() in my parser to get some css like tokens such as color code some variations below;
#FDC69A
#ff0
orange
Example code will be;
unsigned int r, g, b; // %x stores into unsigned int
const char* s = "#FAFAFA";
if (sscanf(s, "#%02x%02x%02x", &r, &g, &b) == 3) {
// color code ok
}
My preferred language for the current project is C++. I think sscanf can be faster than regular character-by-character parsing, and the overall code will be minimal and less bug-prone, though it may have portability issues across different compilers.
One thing I noticed is that popular open source projects do not use sscanf for tokenizing input buffers; instead they do it char by char. Is it bad programming practice to use sscanf for parsing, as I am doing?
The biggest problem with sscanf (as well as scanf and fscanf) is that numeric overflow causes undefined behavior. For example:
const char *s = "999999999999999999999999999999";
int n;
sscanf(s, "%d", &n);
The C standard says exactly nothing about how this code behaves. It might set n to some arbitrary value, it might report an error, or it might crash.
(In practice, existing implementations are likely to behave sensibly, for some value of "sensibly".)
if(sscanf(s, "#%02x%02x%02x", &r, &g, &b) == 3) is robust... nothing to worry about there.
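One caveat if strict validation matters: each %x skips leading whitespace, and sscanf ignores whatever follows the last conversion, so inputs like "#FA FA FA" or "#FAFAFAxyz" also pass that test. A character-by-character check is short enough to sketch. Rgb, hex_value and parse_hex_color are made-up names, and the three-digit form is expanded the way CSS does it ("f" means "ff").

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

struct Rgb { unsigned r, g, b; };

// Value of one hex digit; the caller has already checked isxdigit.
static unsigned hex_value(char c) {
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    return static_cast<unsigned>(std::tolower(static_cast<unsigned char>(c)) - 'a' + 10);
}

// Accepts exactly "#RRGGBB" or "#RGB"; signs, spaces and trailing
// characters are rejected. Leaves out untouched on failure.
bool parse_hex_color(const char* s, Rgb& out) {
    if (s[0] != '#') return false;
    const char* digits = s + 1;
    const std::size_t n = std::strlen(digits);
    if (n != 3 && n != 6) return false;
    for (std::size_t i = 0; i < n; ++i)
        if (!std::isxdigit(static_cast<unsigned char>(digits[i]))) return false;

    unsigned v[3];
    for (std::size_t i = 0; i < 3; ++i) {
        if (n == 6) v[i] = hex_value(digits[2 * i]) * 16 + hex_value(digits[2 * i + 1]);
        else        v[i] = hex_value(digits[i]) * 17;   // shorthand: 0xf -> 0xff
    }
    out.r = v[0];
    out.g = v[1];
    out.b = v[2];
    return true;
}
```

Named colors such as orange would still need a separate lookup; this only covers the two hex forms.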
Historically, the big concern with those functions was that someone might specify a format flag that doesn't match the argument (e.g. %d not given an int*)... many modern compilers have enough validation to avoid accidents like that.
Still, C++ has iostreams, and people tend to use those for many I/O and parsing operations as the stream destructors automatically flush and close files and release descriptors, they're type safe, extensible to user-defined types, you can generally reuse parsing/output code for any type of stream, and they're often convenient. They'd be significantly more tedious for your specific test above though.
If you've noticed lots of OSS programs scanning character by character, it may be because:
They're doing more complex parsing - where they want to branch to different parsing logic after reading individual characters, or
In your code you have a firm expectation of what to expect, so it's reasonable to do a sscanf to test that, but if you were writing say a compiler it'd be too slow to try a huge if/else list of hundreds sscanf attempts to recognise tokens.
Relevant for scanf, fscanf but not sscanf - avoid scanning too far so they can ungetc, which (from memory) is only portably guaranteed to work for 1 character.

scanf on an istream object

NOTE: I've seen the post What is the cin analougus of scanf formatted input? before asking the question and the post doesn't solve my problem here. The post seeks for C++-way to do it, but as I mentioned already, it is inconvenient to just use C++-way to do it sometimes and I have clear examples for that.
I am trying to read data from an istream object, and sometimes it is inconvenient to just use C++-style ways such as operator>>, e.g. the data are in special form 123:456 so you have to imbue to make ':' as space (which is very hacky, as opposed to %d:%d in scanf), or 00123 where you want to read as string and convert decimal instead of octal (as opposed to %d in scanf), and possibly many other cases.
The reason I chose istream as interface is because it can be derived and therefore more flexible. For example, we can create in-memory streams, or some customized streams that generated on the fly, etc. C-style FILE*, on the other hand, is very limited, at least in a standard-compliant way, on creating customized streams.
So my question is: is there a way to do scanf-like data extraction on an istream object? I think fscanf internally reads character by character from the FILE* using fgetc, and istream also provides such an interface. So it would be possible to copy the code of fscanf and replace the FILE* with the istream object, but that's very hacky. Is there a smarter and cleaner way, or is there some existing work on this?
Thanks.
You should never, under any circumstances, use scanf or its relatives for anything, for three reasons:
Many format strings, including for instance all the simple uses of %s, are just as dangerous as gets.
It is almost impossible to recover from malformed input, because scanf does not tell you how far in characters into the input it got when it hit something unexpected.
Numeric overflow triggers undefined behavior: yes, that means scanf is allowed to crash the entire program if a numeric field in the input has too many digits.
Prior to C++11, the C++ specification defined istream formatted input of numbers in terms of scanf, which means that last objection is very likely to apply to them as well! (In C++11 the specification is changed to use strto* instead and to do something predictable if that detects overflow.)
What you should do instead is: read entire lines of input into std::string objects with getline, hand-code logic to split them up into fields (I don't remember off the top of my head what the C++-string equivalent of strsep is, but I'm sure it exists) and then convert numeric strings to machine numbers with the strtol/strtod family of functions.
I cannot emphasize this enough: THE ONLY 100% RELIABLE WAY TO CONVERT STRINGS TO NUMBERS IN C OR C++, unless you are lucky enough to have a C++ runtime that is already C++11-conformant in this regard, IS WITH THE strto* FUNCTIONS, and you must use them correctly:
errno = 0;
result = strtoX(s, &ends, 10); // omit 10 for floats
if (s == ends || *ends || errno)
parse_error();
(The OpenBSD manpages, linked above, explain why you have to do this fairly convoluted thing.)
(If you're clever, you can use ends and some manual logic to skip that colon, instead of strsep.)
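For the 123:456 case from the question, the pattern above might look roughly like this. parse_pair is a made-up name, and the sketch assumes the line has already been read with std::getline. Note that strtol itself skips leading whitespace and accepts a sign, so "123: -4" would be accepted; tighten that if it matters.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Parses "A:B" (for example "123:456") into two ints without scanf.
// Returns false on missing digits, a missing colon, trailing characters,
// or a value that does not fit in an int. a and b are only written on success.
bool parse_pair(const std::string& line, int& a, int& b) {
    const char* s = line.c_str();
    char* end = nullptr;

    errno = 0;
    const long first = std::strtol(s, &end, 10);   // base 10, so "00123" is 123, not octal
    if (end == s || *end != ':' || errno == ERANGE || first < INT_MIN || first > INT_MAX)
        return false;

    const char* t = end + 1;                       // skip the colon
    errno = 0;
    const long second = std::strtol(t, &end, 10);
    if (end == t || *end != '\0' || errno == ERANGE || second < INT_MIN || second > INT_MAX)
        return false;

    a = static_cast<int>(first);
    b = static_cast<int>(second);
    return true;
}
```

Since it only needs a std::string, it works the same whether the line came from a file stream, a string stream, or a custom istream.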
I do not recommend mixing C++ input/output and C input/output. Not that they are really incompatible, but they could just plain interoperate wrong.
For example Oracle docs recommend not to mix it http://www.oracle.com/technetwork/articles/servers-storage-dev/mixingcandcpluspluscode-305840.html
But no one stops you from reading data into the buffer and parsing it with standard c functions like sscanf.
...
std::string curString;
int a, b;
...
std::getline(inputStream, curString);
int sscanfResult = sscanf(curString.c_str(), "%d:%d", &a, &b);
if (2 != sscanfResult)
throw "error";
...
But it won't help in some situations, when your stream is just one long contiguous sequence of symbols (like some string turned into a memory stream).
Making your own fscanf from scratch or porting(?) the original CRT function actually isn't the worst possible idea. Just make sure you have tested it thoroughly (low-level custom char manipulation has always been a source of pain in C).
I've never really tried Boost.Spirit, and such parsing infrastructure could really be overkill for your project. But boost libraries are usually well tested and designed. You could at least try to use it.
Based on @tmyklebu's comment, I implemented streamScanf which wraps istream as FILE* via fopencookie: https://github.com/likan999/codejam/blob/master/Common/StreamScanf.cpp

C++: Format not a string literal and no format arguments [duplicate]

I've been trying to print a string literal but seems like I'm doing it wrong, since I'm getting a warning on compilation. It's probably due to wrong formatting or my misunderstanding of c_str() function, which I assume should return a string.
parser.y: In function ‘void setVal(int)’:
parser.y:617:41: warning: format not a string literal and no format arguments [-Wformat-security]
Line 617:
sprintf(temp, constStack.top().c_str());
Having those declarations
#include <stack>
const int LENGTH = 15;
char *temp = new char[LENGTH];
stack<string> constStack;
How can I provide proper formatting for the string?
Simple - provide a format string:
sprintf(temp, "%s", constStack.top().c_str());
But much, much better:
string temp = constStack.top();
You are telling me in your comment that the problem is not so much the warning as the fact that your code doesn't do what you expect it to.
The solution to this and other, similar problems is to get rid of the strong C influence in your C++ code. Specifically, don't use raw dynamically allocated char arrays or sprintf. Use std::string instead.
In this case, you are using sprintf very incorrectly. Have you ever seen its signature? It goes like this:
int sprintf(char *str, char const *format, ...);
str is the output of the operation. format describes what the output should be. The rest are the format arguments, which must by pure convention match what's described in format.
Now this "rest", written as ..., means that you can pass any number of arguments, even zero. And this is why your code even compiles (delivering a nice example for why ... is a dangerous feature, by the way).
In your code, the output string is, possibly incorrectly, your temp string. And the format to describe the output is, almost certainly incorrectly, what happens to sit on top of your stack.
Is this just about assigning one string to another, using sprintf simply because it more or less can do that as a very special case of what its feature set offers? There's no need for such hacks, as C++ has string assignment out of the box with std::string:
std::string temp = constStack.top();
Notice that this also eliminates the need to know the length of the string in advance.
If, for some reason, you really need formatting (but your question doesn't really show any need for it), then learn more about string streams as an alternative solution to format strings.
As the warning indicates it is issued as a result of the -Wformat-security option; you could simply disable the warning by removing the option; but it would be perhaps unwise.
The security issue is perhaps theoretical unless your code is to be widely distributed. Of perhaps more immediate concern is the possibility of your code crashing or behaving abnormally.
The problem is that the string is variable, and may at runtime contain formatting characters that cause it to attempt to read non-existent arguments. If for example the string is received from user input and the user entered "%s", it would attempt to read a string from somewhere on the stack. That would at best place junk in temp, but worse, if the memory read happened not to contain a nul character in the first 15 bytes, it would overrun temp and corrupt the heap (in this case). Heap corruptions are probably worse than stack corruptions - the latent bug can remain unnoticed in your code for a long time only to start crashing after some unrelated change; and if it does crash, it is unlikely to be in any proximity to the cause.
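If a fixed-size char buffer really has to stay, a bounded copy with a constant format avoids both problems described here: the text is never interpreted as a format, and the write cannot run past the buffer. A minimal sketch, with copy_bounded as a made-up helper name:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Copies text into a fixed-size buffer, treating it purely as data.
// Returns true if the whole string fit; otherwise dest holds a truncated,
// still NUL-terminated copy (assuming size > 0).
bool copy_bounded(char* dest, std::size_t size, const std::string& text) {
    const int n = std::snprintf(dest, size, "%s", text.c_str());
    return n >= 0 && static_cast<std::size_t>(n) < size;
}
```

With the 15-byte temp from the question, copy_bounded(temp, LENGTH, constStack.top()) would report truncation instead of corrupting the heap.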

printf vs. std::cout [duplicate]

If I just want to print a string on screen, I can do that using those two ways:
printf("abc");
std::cout << "abc" << std::endl;
In cases like the examples shown above, is there an advantage to using printf over std::cout, or vice versa?
While cout is the proper C++ way, I believe that some people and companies (including Google) continue to use printf in C++ code because it is much easier to do formatted output with printf than with cout.
Here's an interesting example that I found here.
Compare:
printf( "%-20s %-20s %5s\n" , "Name" , "Surname" , "Id" );
and
cout << left << setw( 20 ) << "Name" << ' ' << setw( 20 ) << "Surname" << ' ' << right << setw( 5 ) << "Id" << endl;
printf and its associated friends are C functions. They work in C++, but do not have the type safety of C++ std::ostreams. Problems can arise in programs that use printf functions to format output based on user input (or even input from a file). For example:
int main()
{
char a[] = {'1', '2', '3', '4'}; // a char array that isn't 0-terminated
int i = 50;
printf("%s", a); // will continue printing characters until a 0 is found in memory
printf("%s", i); // will attempt to print a string, but this is actually an integer
}
C++ has much stronger type safety (and a std::string class) to help prevent problems like these.
I struggle with this very question myself. printf is in general easier to use for formatted printing, but the iostreams facility in C++ has the big advantage that you can create custom formatters for objects. I end up using both of them in my code as necessary.
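The problem with using both and intermixing them shows up once std::ios::sync_with_stdio(false) has been called: the output buffers used by printf and cout are then no longer shared, so unless you run unbuffered or explicitly flush output, the output can appear out of order. With the default setting (synchronization enabled) the two can be mixed, typically at some cost in speed.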
My main objection to C++ is that there is no fast output formatting facility similar to printf, so there is no way to easily control output for integer, hex, and floating point formatting.
Java had this same problem; the language ended up getting printf.
Wikipedia has a good discussion of this issue at http://en.wikipedia.org/wiki/Printf#C.2B.2B_alternatives_to_sprintf_for_numeric_conversion.
Actually for your particular example, you should have asked which is preferable, puts or cout. printf prints formatted text but you are just outputting plain text to the console.
For general use, streams (iostream, of which cout is a part) are more extensible (you can print your own types with them), and are more generic in that you can write functions that print to any type of stream, not just the console (or redirected output). You can get generic stream behaviour with printf too by using fprintf, which takes a FILE*, since a FILE* is often not a real file, but this is trickier.
Streams are "typesafe" in that you overload with the type you are printing. printf is not typesafe with its use of ellipses so you could get undefined results if you put the wrong parameter types in that do not match the format string, but the compiler will not complain. You may even get a seg-fault / undefined behaviour (but you could with cout if used incorrectly) if you miss a parameter or pass in a bad one (eg a number for %s and it treats it as a pointer anyway).
printf does have some advantages though: you can template a format string then reuse that format string for different data, even if that data is not in a struct, and using formatting manipulations for one variable does not "stick" that format for further use because you specify the format for each variable. printf is also known to be threadsafe whereas cout actually is not.
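The "sticking" is easy to demonstrate with a string stream. In this small sketch (sticky_demo is a made-up name), std::hex stays in effect for every later integer, while std::setw applies to the next item only:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Shows which stream formatting settings persist and which do not.
std::string sticky_demo() {
    std::ostringstream os;
    os << std::hex << 255 << ' ' << 16;          // std::hex persists: 16 is printed as "10"
    os << ' ' << std::setw(4) << 1 << ' ' << 2;  // setw(4) pads the 1 but not the 2
    return os.str();
}
```

The returned string is "ff 10    1 2". With printf, each conversion carries its own format, so nothing leaks from one value to the next.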
boost has combined the advantages of each with their boost::format library.
printf has been borrowed from C and has some limitations. The most commonly mentioned limitation of printf is type safety, as it relies on the programmer to correctly match the format string with the arguments. The second limitation, which again comes from the varargs environment, is that you cannot extend the behavior with user defined types. printf knows how to print a set of types, and that's all that you will get out of it. Still, for the few things that it can be used for, it is faster and simpler to format strings with printf than with C++ streams.
While most modern compilers are able to address the type safety limitation and at least provide warnings (the compiler can parse the format string and check the arguments provided in the call), the second limitation cannot be overcome. Even in the first case, there are things that the compiler cannot really help with, such as checking for null termination --but then again, the same problem goes with std::cout if you use it to print the same array.
On the other end, streams (including std::cout) can be extended to handle user defined types by means of overloaded std::ostream& operator<<( std::ostream&, type const & ) for any given user defined type type. They are type safe by themselves --if you pass in a type that has no overloaded operator<< the compiler will complain. They are, on the other hand, more cumbersome to produce formatted output.
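As a minimal sketch of that extension mechanism, here is an overload for a made-up Point type, plus a small helper (to_text, also hypothetical) showing that the same overload serves any ostream, not just std::cout:

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct Point { int x; int y; };

// Teach every std::ostream (cout, file streams, string streams) how to print a Point.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

// Works for any type that has an operator<<, built-in or user-defined.
// A type without one is rejected at compile time.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream os;
    os << value;
    return os.str();
}
```

After this, std::cout << Point{1, 2} prints (1, 2), and there is no printf equivalent short of formatting the members by hand at each call site.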
So what should you use? In general I prefer using streams, as overloading operator<< for my own types is simple and they can be used uniformly with all types.
Those two examples do different things. The latter will add a newline character and flush output (result of std::endl). std::cout is also slower. Other than that, printf and std::cout achieve the same thing and you can choose whichever you prefer. As a matter of preference, I'd use std::cout in C++ code. It's more readable and safer.
See this article if you need to format output using std::cout.
In general, you should prefer cout because it's much more type-safe and more generic. printf isn't type-safe, nor is it generic at all. The only reason you might favour printf is speed: from memory, printf is many times faster than cout.