Why is printf(inputString) a security hole?

Why is printf(inputString) a security hole? - c++

I was reading an answer on Quora where I encountered that something as simple as:
char* inputString;
printf(inputString);
is a security hole.
I assume that the inputString is not simply uninitialized, but initialized with some external input between the two statements.
How exactly is this a security hole?
The original answer on Quora was here:
If C and C++ give the best performance, why do we still code in other languages?
but it provides no additional context for this claim.

I assume that the input string is a string you got from the user, and not just an uninitialized value.
The problem is that the user can
crash the program: printf ("%s%s%s%s%s%s%s%s%s%s%s%s")
view the stack: printf ("%08x %08x %08x %08x %08x\n");
view memory on any location,
or even write an integer to nearly any location in the process memory.
This leads to an attacker being able to:
Overwrite important program flags that control access privileges
Overwrite return addresses on the stack, function pointers, etc
It is all explained quite well here.

It's not just a security problem, but it won't work at all, because the pointer is not initialized. In this context, making the program crash = not running anymore could be a (security) problem, depending what the program does and in what context it runs.
I assume you mean you have a proper string. In this case, if the string is provided by some external input (user etc.), there can be (unexpected) placeholders like %s etc. while the rest of the printf expects eg. a %d. For this example (%s instead of %d), instead of printing an integer number, it will start printing all memory content until some 0 byte then, possibly giving out some secret information stored after the int bytes.
Something similar, ie. giving out too much bytes because of wrong unchecked user input, happened eg. in the known "Heartbleed" bug not too long ago, which was/is a pretty big global problem. ... The first printf parameter should be fixed, not coming from any variable.
Other placeholder combinations are possible too, leading to a wide range of possible effects (including generation of wrong floating point signals in the CPU, which could lead to more serious problems depending on the architecture, etc.etc.)

char* inputSting;
printf(inputSting);
Printing out an uninitialized string is undefined behavior. Effects from undefined behavior can range from printing garbage values to segfaults and other nastyness. Such unpredictable patterns can be exploited and thus compromise security.
But more importantly, nobody would write this as it doesn't do anything meaningful besides risk a segfault.

Related

Is it legal to read into an array that doesn't have space for the null terminator?

Every C string is terminated with the null character when it is stored in a C-string variable, and this always consumes one array position.
So why is this legal:
char name[3];
std::cin >> name;
when a 3 letter word "cow" is input to std::cin?
Why is it allowing this?

The compiler cannot know what input you will be giving at runtime, so it cannot say that you will be doing something illegal. If the input fits in the array, then it's legal.
Prior C++20:
The behaviour of the program is undefined. Never do this.
Since C++20:
The input will be truncated to fit into the array, so the array will contain the string "co".

It is allowed because the memory management is left to the programmer.
In this case the error occurs at run-time and it's impossible for the compiler to spot it at compile time.
What you are getting here is an undefined behavior, where the program can crash, can go on smoothly, or present some odd behavior in the future.
Most likely here you are overflowing the allocated memory by one byte which will be stored in the first byte of the subsequent declared variable, or in other addresses which may or may not be already filled with relevant information.
You eventually may experience an abort of the execution if - for example - some read-only memory area is overwritten, or if you point your Instruction Pointer to some invalid area (again, by overwriting its value in memory)

Depends what you mean by "legal". In context of your question, there is no precise meaning.
According to the standard (at least before C++20) your code (if supplied with that input) has undefined behaviour. When behaviour is undefined, any observable outcome is permitted, and no diagnostics are required.
In particular, there is no requirement that an implementation (the compiler, the host system, or anything else) diagnose a problem (e.g. issue an error message), take preventative action (e.g. electrocute the programmer for entering "cow" before hitting the enter key), or take recovery action (e.g. truncate the input to "co").
The reason the standard allows such things is a combination of;
allowing implementation freedoms. For example, the standard permits but does not require your host system to electrocute the user for entering bad input to your program - but doesn't prevent an evil-minded system developer providing hardware and driver support to do exactly that;
technical infeasibility or technical difficulty. In a simple case like your code might be easy to detect. But, in more complicated programs (e.g. your code is buried away among other code) it might be impossible to identify a problem. Or, it might be possible, but take much longer than a programmer is willing/able to spend waiting.
In short, the C++ standard does not coddle the programmer. The programmer - not the standard, not the compiler - is responsible for avoiding undefined behaviour, and responsible if a program with undefined behaviour does undesirable things.

Using garbage value(not initialized value) as id

Before ask I'm not the native speaker pardon for my poor english
Reading some book about c++ and the book mentioned variables should be initialize after declared. Otherwise variable having a garbage value and this may cause some problem. This garbage values are random and not predictable, what if using this garbage value as a ID?
I tried to find relational topic an google and couldn't find any result
Please leave a comment of idea of references

When you initialize a non-static(local) variable and use/read/print it before assigning a variable to it, the action is undefined behavior.
When you encounter undefined behavior:
You cannot assume that there's any value in that memory space to begin with
You cannot assume that the value won't change throughout the code execution, therefore, it's not ideal to use it as an ID
You cannot even assume that the program will run, or it will crash/freeze/etc... as everything action from that point on is undefined
Theoretically, an undefined behavior permits anything to happen (obviously in reality there's many restrictions to try and keep it from doing everything and probably sending everyone your pictures, corrupt all your files and frying your computer afterward).
For example, with this piece of code:
int num;
int a = num;
int b = num;
When you print it out, a and b is not guaranteed to have identical value, or the program will run at all. It's different with each systems/compilers.
As such, one must never rely on an undefined behavior to act normally.
Quote from #AnT:
So in general, the popular answer that "it is initialized with
whatever garbage was in memory" is not even remotely correct.
Uninitialized variable's behavior is different from that of a variable
initialized with garbage.
However, never mistakes undefined for inconsistency/randomness. For a (pretty bad) real life example, a broken pipe may be unexpected behavior, but the stream of water that hits you isn't inconsistent or random (until the pipe is fixed, of course).
An example when undefined behavior isn't simply taking some garbage value and results in some bizarre stuff: https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633
Related :
Uninitialized variable behaviour in C++
What happens to a declared, uninitialized variable in C? Does it have a value?

Nothing guarantees that an uninitialized variable contains garbage. It may be garbage, but in fact, any access to such variable yields undefined behavior, which means that literally anything may happen. It may be garbage, it may be constant, it may crash, it may contain previously used data, it may work like expected, but only with some compilers, etc. One is not supposed to rely on anything that comes from such variable and build any logic based on that. And, certainly, it cannot be a source of uniqueness or randomness.

First of all you should not use garbage value as ID.
Since not detail about your question, I am Considering your id is an unique value also by using your ID you are going to retrieve some value.
if that is the case then,
you cannot because on each and every run your garbage value may change.
And Garbage value in the sense it may be a numerical alphanumerical even not a proper value may come.

Why am I not getting a segmentation fault with this code? (Bus error)

I had a bug in my code that went like this.
char desc[25];
char name[20];
char address[20];
sprintf (desc, "%s %s", name, address);
Ideally this should give a segfault. However, I saw this give a bus error.
Wikipedia says something to the order of 'Bus error is when the program tries to access an unaligned memory location or when you try to access a physical (not virtual) memory location that does not exist or is not allowed. '
The second part of the above statement sounds similar to a seg fault. So my question is, when do you get a SIGBUS and when a SIGSEGV?
EDIT:-
Quite a few people have mentioned the context. I'm not sure what context would be needed but this was a buffer overflow lying inside a static class function that get's called from a number of other class functions. If there's something more specific that I can give which will help, do ask.
Anyways, someone had commented that I should simply write better code. I guess the point of asking this question was "can an application developer infer anything from a SIGBUS versus a SIGSEGV?" (picked from that blog post below)

As you probably realize, the base cause is undefined behavior in your
program. In this case, it leads to an error detected by the hardware,
which is caught by the OS and mapped to a signal. The exact mapping
isn't really specified (and I've seen integral division by zero result
in a SIGFPE), but generally: SIGSEGV occurs when you access out of
bounds, SIGBUS for other accessing errors, and SIGILL for an illegal
instruction. In this case, the most likely explination is that your
bounds error has overwritten the return address on the stack. If the
return address isn't correctly aligned, you'll probably get a SIGBUS,
and if it is, you'll start executing whatever is there, which could
result in a SIGILL. (But the possibility of executing random bytes as
code is what the standards committee had in mind when they defined
“undefined behavior”. Especially on machines with no memory
protection, where you could end up jumping directly into the OS.)

A segmentation fault is never guaranteed when you're doing fishy stuff with memory. It all depends on a lot of factors (how the compiler lays out the program in memory, optimizations etc).
What may be illegal for a C++ program may not be illegal for a program in general. For instance the OS doesn't care if you step outside an array. It doesn't even know what an array is. However it does care if you touch memory that doesn't belong to you.

A segmentation fault occurs if you try to do a data access a virtual address that is not mapped to your process. On most operating systems, memory is mapped in pages of a few kilobytes; this means that you often won't get a fault if you write off the end of an array, since there is other valid data following it in the memory page.
A bus error indicates a more low-level error; a wrongly-aligned access or a missing physical address are two reasons, as you say. However, the first is not happening here, since you're dealing with bytes, which have no alignment restriction; and I think the second can only happen on data accesses when memory is completely exhausted, which probably isn't happening.
However, I think you might also get a bus error if you try to execute code from an invalid virtual address. This could well be what is happening here - by writing off the end of a local array, you will overwrite important parts of the stack frame, such as the function's return address. This will cause the function to return to an invalid address, which (I think) will give a bus error. That's my best guess at what particular flavour of undefined behaviour you are experiencing here.
In general, you can't rely on segmentation faults to catch buffer overruns; the best tool I know of is valgrind, although that will still fail to catch some kinds of overrun. The best way to avoid overruns when working with strings is to use std::string, rather than pretending that you're writing C.

In this particular case, you don't know what kind of garbage you have in the format string. That garbage could potentially result in treating the remaining arguments as those of an "aligned" data type (e.g. int or double). Treating an unaligned area as an aligned argument definitely causes SIGBUS on some systems.

Given that your string is made up of two other strings each being a max of 20 characters long, yet you are putting it into a field that is 25 characters, that is where your first issue lies. You are have a good potential to overstep your bounds.
The variable desc should be at least 41 characters long (20 + 20 + 1 [for the space you insert]).
Use valgrind or gdb to figure out why you are getting a seg fault.

char desc[25];
char name[20];
char address[20];
sprintf (desc, "%s %s", name, address);
Just by looking at this code, I can assume that name and address each can be 20 chars long. If that is so, then does it not imply that desc should be minimum 20+20+1 chars long? (1 char for the space between name and address, as specified in the sprintf).
That can be the one reason of segfault. There could be other reasons as well. For example, what if name is longer than 20 chars?
So better you use std::string:
std::string name;
std::string address;
std::string desc = name + " " + address;
char const *char_desc = desc.str(); //if at all you need this

Struggling with sprintf... something stupid?

Sorry to pester everyone, but this has been causing me some pain. Here's the code:
char buf[500];
sprintf(buf,"D:\\Important\\Calibration\\Results\\model_%i.xml",mEstimatingModelID);
mEstimatingModelID is an integer, currently holding value 0.
Simple enough, but debugging shows this is happening:
0x0795f630 "n\Results\model_0.xml"
I.e. it's missing the start of the string.
Any ideas? This is simple stuff, but I can't figure it out.
Thanks!

In an effort to make this an actual general answer: Here's a checklist for similar errors:
Never trust what you see in release mode, especially local variables that have been allocated from stack memory. Static variables that exist in heap data are about the only thing that will generally be correct but even then, don't trust it. (Which was the case for the user above)
It's been my experience that the more recent versions of VS have less reliable release mode data (probably b/c they optimize much more in release, or maybe it's 64bitness or whatever)
Always verify that you are examining the variable in the correct function. It is very easy to have a variable named "buf" in a higher function that has some uninitialized garbage in it. This would be easily confused with the same named variable in the lower subroutine/function.
It's always a good idea to double check for buffer overruns. If you ever use a %s in your sprintf, you could get a buffer overrun.
Check your types. sprintf is pretty adaptable and you can easily get a non-crashing but strange result by passing in a string pointer when an int is expected etc.

c++ what happens if you print more characters with sprintf, than the char pointer has allocated?

I assume this is a common way to use sprintf:
char pText[x];
sprintf(pText, "helloworld %d", Count );
but what exactly happens, if the char pointer has less memory allocated, than it will be print to?
i.e. what if x is smaller than the length of the second parameter of sprintf?
i am asking, since i get some strange behaviour in the code that follows the sprintf statement.

It's not possible to answer in general "exactly" what will happen. Doing this invokes what is called Undefined behavior, which basically means that anything might happen.
It's a good idea to simply avoid such cases, and use safe functions where available:
char pText[12];
snprintf(pText, sizeof pText, "helloworld %d", count);
Note how snprintf() takes an additional argument that is the buffer size, and won't write more than there is room for.

This is a common error and leads to memory after the char array being overwritten. So, for example, there could be some ints or another array in the memory after the char array and those would get overwritten with the text.
See a nice detailed description about the whole problem (buffer overflows) here. There's also a comment that some architectures provide a snprintf routine that has a fourth parameter that defines the maximum length (in your case x). If your compiler doesn't know it, you can also write it yourself to make sure you can't get such errors (or just check that you always have enough space allocated).
Note that the behaviour after such an error is undefined and can lead to very strange errors. Variables are usually aligned at memory locations divisible by 4, so you sometimes won't notice the error in most cases where you have written one or two bytes too much (i.e. forget to make place for a NUL), but get strange errors in other cases. These errors are hard to debug because other variables get changed and errors will often occur in a completely different part of the code.

This is called a buffer overrun.
sprintf will overwrite the memory that happens to follow pText address-wise. Since pText is on the stack, sprintf can overwrite local variables, function arguments and the return address, leading to all sorts of bugs. Many security vulnerabilities result from this kind of code — e.g. an attacker uses the buffer overrun to write a new return address pointing to his own code.

The behaviour in this situation is undefined. Normally, you will crash, but you might also see no ill effects, strange values appearing in unrelated variables and that kind of thing. Your code might also call into the wrong functions, format your hard-drive and kill other running programs. It is best to resolve this by allocating more memory for your buffer.

I have done this many times, you will receive memory corruption error. AFAIK, I remember i have done some thing like this:-
vector<char> vecMyObj(10);
vecMyObj.resize(10);
sprintf(&vecMyObj[0],"helloworld %d", count);
But when destructor of vector is called, my program receive memory corruption error, if size is less then 10, it will work successfully.

Can you spell Buffer Overflow ? One possible result will be stack corruption, and make your app vulnerable to Stack-based exploitation.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js