My boss told me to look at the following code and tell him what the potential security vulnerabilities were. I'm not very good at this kind of thing, since I don't think in the way of trying to hack code. All I see is that nothing is declared private, but other than that I just don't know.
#define NAME_SIZE (unsigned char) 255
// user input should contain the user’s name (first name space
// middle initial space last name and a null
// character), and was entered directly by the user.
// Returns the first character in the user input, or -1 if the method failed.
char poor_method(char* user_input, char* first, char *middle, char* last)
{
char*buffer;
char length;
// find first name
buffer = strtok(user_input, " ");
if(buffer==0)
{
return -1;
}
length = strlen(buffer);
if(length <= NAME_SIZE)
{
strcpy(first, buffer);
}
// find middle name
buffer = strtok(NULL, " ");
if(buffer==0)
{
return-1;
}
if(middle)
*middle = buffer[0];
// find last name
buffer = strtok(NULL, "\0");
length = strlen(buffer);
if(length <= NAME_SIZE)
{
strcpy(last, buffer);
}
// Check to make sure that all of the user input was used
buffer = strtok(NULL, "\0");
if(buffer != NULL)
{
return-1;
}
return first[0];
}
What security vulnerabilities are there?
Get good at writing secure code
You most likely don't want systems that you are responsible for finding their way onto bugtraq or cve. If you don't understand it, be honest with your boss. Tell him you don't understand and you want to work on it. Pick up Writing Secure Code. Read it, learn it, love it. Asking this question on SO and giving your boss the answer definitely doesn't help you in the long run.
Then look at the sample code again :)
What I saw (by no means a complete list):
There's no guarantees you're going to get a char pointer which points to a null-terminating string (unless you're allowed to make that assumption, not really a safe one to make).
strtok and strcpy are the C way of doing things and come with the fun stuff of programming C code. If you must use them, so be it (just make sure you can guarantee you're inputs to these functions will indeed be valid). Otherwise, try switching your code to use std::string and the "C++ way" (as Cat Plus Plus put it)
I'm assuming this is a typo:
charpoor_method(
You're missing a space between char and poor_method(
You're not checking if first or last are indeed valid pointers (unfortunately, the best you can do is to check them against NULL).
There's no guarantee that the buffers first or last can indeed hold whatever you're copying to them.
Another typo:
returnfirst[0];
missing space between return and first[0]
Learning to write secure code is something that's very important to do. Follow Brecht's advice and get good at it.
Ok strtok assumes user_input is NULL terminated, this might not be true.
charlength = strlen(buffer);
if(length <= NAME_SIZE)
{
strcpy(first, buffer);
}
charlenght here is undeclared, so is length, they should be declared as unsigned int.
strlen wont count the '\0' as a part of the length, so later strcpy will copy the '\0' to whatever is after First if the len of buffer is 255 + 1('\0')
Also is unknown if char *first size is, it should be NAME_SIZE but the comparisson should be
length <= NAME_SIZE - 1
or allocate char *first to NAME_SIZE + 1
I'd probably rewrite the whole thing, is quite ugly.
Rather than using strcpy(), use strncpy() with a specific length parameter, as that function, like strtok(), assumes a NULL-terminated buffer for the source, and that may not be the case, giving you a buffer overflow for the data copied into the buffer pointed to by either first or last. Additionally, you have no idea how long the buffers are that have been allocated for first and last ... Don't assume that the user of your function has properly allocated enough memory to copy into unless they've passed you a parameter telling you there are enough memory slots in the buffers. Otherwise again, you could (and most likely will) end-up with buffer overflows.
Also you may want to use the restrict keyword if you're using C99 in order to prevent the caller of your function from aliasing the same memory location for buffer, first, and last.
Related
I'm fairly new to c and I'm reading a book regarding Software Vulnerabilities and I came across this buffer overflow sample, it mentions that this can cause a buffer overflow. I am trying to determine how this is the case.
int handle_query_string(char *query_string)
{
struct keyval *qstring_values, *ent;
char buf[1024];
if(!query_string) {
return 0;
}
qstring_values = split_keyvalue_pairs(query_string);
if((ent = find_entry(qstring_values, "mode")) != NULL) {
sprintf(buf, "MODE=%s", ent->value);
putenv(buf);
}
}
I am paying close attention to this block of code because this appears to be where the buffer overflow is caused.
if((ent = find_entry(qstring_values, "mode")) != NULL)
{
sprintf(buf, "MODE=%s", ent->value);
putenv(buf);
}
I think here is it, because your buf is only 1024 and because ent->value can have more than 1024, then this may overflow.
sprintf(buf, "MODE=%s", ent->value);
But depends of implementations of split_keyvalue_pairs(query_string). If this function already checks the value and threat it (which I doubt).
klutt provided a good fix for the problem in a previous answer, so I'll try to go a bit more specific and in-depth on the exact nature of the overflow in your code.
char buf[1024];
This line allocates 1024 bytes on the stack, addressed by the pointer named buf. The big problem here is that it is on the stack. If you dynamically allocate using malloc (or my favorite: calloc), it will be on the heap. The location doesn't necessarily prevent or fix an overflow. But it can change the effect. Right above (give or take some bytes) this space on the stack would be the return address from the function, and an overflow can change that causing the program to redirect when it returns.
sprintf(buf, "MODE=%s", ent->value);
This line is what actually performs the overflow. sprintf = "string print format." This means that the destination is a string (char *), and you are printing a formatted string. It doesn't care about the length, it will just take the starting memory address of the destination string, and keep writing until it has finished. If there's more than 1024 characters to be written (in this case), then it will go past the end of your buffer and overflow into other parts of memory. The solution is to use the function snprint instead. The "n" tells you that it will limit the amount to be written to the destination, and avoid an overflow.
The ultimate thing to understand is that a "buffer" does not actually exist. It's simply not a thing. It is a concept we use to order the area in memory, but the computer has no idea what a buffer is, where it starts, or where it ends. So in writing, the computer doesn't really care if it is inside or outside of the buffer, and doesn't know where to stop writing. So, we need to tell it very explicitly how far it is allowed to write, or it will just keep writing.
A very big thing here is that you passed a pointer to a local variable to putenv. The buffer will cease to exist when handle_query_string returns. After that it will contain garbage variables. Note that what putenv does require that the string passed to it remains unchanged for the rest of the program. From the documentation for putenv (emphasis mine):
int putenv(char *string);
The putenv() function adds or changes the value of environment variables. The argument string is of the form name=value. If name does not already exist in the environment, then string is added to the environment. If name does exist, then the value of name in the environment is changed to value. The string pointed to by string becomes part of the environment, so altering the string changes the environment.
This can be corrected by using dynamic allocation. char *buf = malloc(1024) instead of char buf[1024]
Another thing is that sprintf(buf, "MODE=%s", ent->value); might overflow. That would happen if the string ent->value is too long. A solution there is to use snprintf instead.
snprintf(buf, sizeof buf, "MODE=%s", ent->value);
This prevents overflow, but it might still cause problems, because if ent->value is too big to fit in buf, then buf will for obvious reasons not contain the full string.
Here is a way to rectify both issues:
int handle_query_string(char *query_string)
{
struct keyval *qstring_values, *ent;
char *buf = NULL;
if(!query_string)
return 0;
qstring_values = split_keyvalue_pairs(query_string);
if((ent = find_entry(qstring_values, "mode")) != NULL)
{
// Make sure that the buffer is big enough instead of using
// a fixed size. The +5 on size is for "MODE=" and +1 is
// for the string terminator
const char[] format_string = "MODE=%s";
const size_t size = strlen(ent->value) + 5 + 1;
buf = malloc(size);
// Always check malloc for failure or chase nasty bugs
if(!buf) exit(EXIT_FAILURE);
sprintf(buf, format_string, ent->value);
putenv(buf);
}
}
Since we're using malloc the allocation will remain after the function exits. And for the same reason, we make sure that the buffer is big enough beforehand, and thus, using snprintf instead of sprintf is not necessary.
Theoretically, this has a memory leak unless you use free on all strings you have allocated, but in reality, not freeing before exiting main is very rarely a problem. Might be good to know though.
It can also be good to know that even though this code now is fairly protected, it's still not thread safe. The content of query_string and thus also ent->value may be altered. Your code does not show it, but it seems highly likely that find_entry returns a pointer that points somewhere in query_string. This can of course also be solved, but it can get complicated.
I'm trying to get the length of a character array in a second function. I've looked at a few questions on here (1 2) but they don't answer my particular question (although I'm sure something does, I just can't find it). My code is below, but I get the error "invalid conversion from 'char' to 'const char*'". I don't know how to convert my array to what is needed.
#include <cstring>
#include <iostream>
int ValidInput(char, char);
int main() {
char user_input; // user input character
char character_array[26];
int valid_guess;
valid_guess = ValidGuess(user_input, character_array);
// another function to do stuff with valid_guess output
return 0;
}
int ValidGuess (char user_guess, char previous_guesses) {
for (int index = 0; index < strlen(previous_guesses); index++) {
if (user_guess == previous_guesses[index]) {
return 0; // invalid guess
}
}
return 1; // valid guess, reaches this if for loop is complete
}
Based on what I've done so far, I feel like I'm going to have a problem with previous_guesses[index] as well.
char user_input;
defines a single character
char character_array[26];
defines an array of 26 characters.
valid_guess = ValidGuess(user_input, character_array);
calls the function
int ValidGuess (char user_guess, char previous_guesses)
where char user_guess accepts a single character, lining up correctly with the user_input argument, and char previous_guesses accepts a single character, not the 26 characters of character_array. previous_guesses needs a different type to accommodate character_array. This be the cause of the reported error.
Where this gets tricky is character_array will decay to a pointer, so
int ValidGuess (char user_guess, char previous_guesses)
could be changed to
int ValidGuess (char user_guess, char * previous_guesses)
or
int ValidGuess (char user_guess, char previous_guesses[])
both ultimately mean the same thing.
Now for where things get REALLY tricky. When an array decays to a pointer it loses how big it is. The asker has gotten around this problem, kudos, with strlen which computes the length, but this needs a bit of extra help. strlen zips through an array, counting until it finds a null terminator, and there are no signs of character_array being null terminated. This is bad. Without knowing where to stop strlen will probably keep going1. A quick solution to this is go back up to the definition of character_array and change it to
char character_array[26] = {};
to force all of the slots in the array to 0, which just happens to be the null character.
That gets the program back on its feet, but it could be better. Every call to strlen may recount (compilers are smart and could compute once per loop and store the value if it can prove the contents won't change) the characters in the string, but this is still at least one scan through every entry in character_array to see if it's null when what you really want to do is scan for user_input. Basically the program looks at every item in the array twice.
Instead, look for both the null terminator and user_input in the same loop.
int index = 0;
while (previous_guesses[index] != '\0' ) {
if (user_guess == previous_guesses[index]) {
return 0; // prefer returning false here. The intent is clearer
}
index++;
}
You can also wow your friends by using pointers and eliminating the need for the index variable.
while (*previous_guesses != '\0' ) {
if (user_guess == *previous_guesses) {
return false;
}
previous_guesses++;
}
The compiler knows and uses this trick too, so use the one that's easier for you to understand.
For 26 entries it probably doesn't matter, but if you really want to get fancy, or have a lot more than 26 possibilities, use a std::set or a std::unordered_set. They allow only one of an item and have much faster look-up than scanning a list one by one, so long as the list is large enough to get over the added complexity of a set and take advantage of its smarter logic. ValidGuess is replaced with something like
if (used.find(user_input) != used.end())
Side note: Don't forget to make the user read a value into user_input before the program uses it. I've also left out how to store the previous inputs because the question does as well.
1 I say probably because the Standard doesn't say what to do. This is called Undefined Behaviour. C++ is littered with the stuff. Undefined Behaviour can do anything -- work, not work, visibly not work, look like it works until it doesn't, melt your computer, anything -- but what it usually does is the easiest and fastest thing. In this case that's just keep going until the program crashes or finds a null.
I have found myself writing code which looks like this
// Treat the following as pseudocode - just an example
iofile.seekg(0, std::ios::end); // iofile is a file opened for read/write
uint64_t f_len = iofile.tellg();
if(f_len >= some_min_length)
{
// Focus on the following code here
char *buf = new char[7];
char buf2[]{"MYFILET"}; // just some random string
// if we see this it's a good indication
// the rest of the file will be in the
// expected format (unlikely to see this
// sequence in a "random file", but don't
// worry too much about this)
iofile.read(buf, 7);
if(memcmp(buf, buf2, 7) == 0) // I am confident this works
{
// carry on processing file ...
// ...
// ...
}
}
else
cout << "invalid file format" << endl;
This code is probably an okay sketch of what we might want to do when opening a file, which has some specified format (which I've dictated). We do some initial check to make sure the string "MYFILET" is at the start of the file - because I've decided all my files for the job I'm doing are going to start with this sequence of characters.
I think this code would be better if we didn't have to play around with "c-style" character arrays, but used strings everywhere instead. This would be advantageous because we could do things like if(buf == buf2) if buf and buf2 where std::strings.
A possible alternative could be,
// Focus on the following code here
std::string buf;
std::string buf2("MYFILET"); // very nice
buf.resize(7); // okay, but not great
iofile.read(buf.data(), 7); // pretty awful - error prone if wrong length argument given
// also we have to resize buf to 7 in the previous step
// lots of potential for mistakes here,
// and the length was used twice which is never good
if(buf == buf2) then do something
What are the problems with this?
We had to use the length variable 7 (or constant in this case) twice. Which is somewhere between "not ideal" and "potentially error prone".
We had to access the contents of buf using .data() which I shall assume here is implemented to return a raw pointer of some sort. I don't personally mind this too much, but others may prefer a more memory-safe solution, perhaps hinting we should use an iterator of some sort? I think in Visual Studio (for Windows users which I am not) then this may return an iterator anyway, which will give [?] warnings/errors [?] - not sure on this.
We had to have an additional resize statement for buf. It would be better if the size of buf could be automatically set somehow.
It is undefined behavior to write into the const char* returned by std::string::data(). However, you are free to use std::vector::data() in this way.
If you want to use std::string, and dislike setting the size yourself, you may consider whether you can use std::getline(). This is the free function, not std::istream::getline(). The std::string version will read up to a specified delimiter, so if you have a text format you can tell it to read until '\0' or some other character which will never occur, and it will automatically resize the given string to hold the contents.
If your file is binary in nature, rather than text, I think most people would find std::vector<char> to be a more natural fit than std::string anyway.
We had to use the length variable 7 (or constant in this case) twice.
Which is somewhere between "not ideal" and "potentially error prone".
The second time you can use buf.size()
iofile.read(buf.data(), buf.size());
We had to access the contents of buf using .data() which I shall
assume here is implemented to return a raw pointer of some sort.
And pointed by John Zwinck, .data() return a pointer to const.
I suppose you could define buf as std::vector<char>; for vector (if I'm not wrong) .data() return a pointer to char (in this case), not to const char.
size() and resize() are working in the same way.
We had to have an additional resize statement for buf. It would be
better if the size of buf could be automatically set somehow.
I don't think read() permit this.
p.s.: sorry for my bad English.
We can validate a signature without double buffering (rdbuf and a string) and allocating from the heap...
// terminating null not included
constexpr char sig[] = { 'M', 'Y', 'F', 'I', 'L', 'E', 'T' };
auto ok = all_of(begin(sig), end(sig), [&fs](char c) { return fs.get() == (int)c; });
if (ok) {}
template<class Src>
std::string read_string( Src& src, std::size_t count){
std::string buf;
buf.resize(count);
src.read(&buf.front(), 7); // in C++17 make it buf.data()
return buf;
}
Now auto read = read_string( iofile, 7 ); is clean at point of use.
buf2 is a bad plan. I'd do:
if(read=="MYFILET")
directly, or use a const char myfile_magic[] = "MYFILET";.
I liked many of the ideas from the examples above, however I wasn't completely satisfied that there was an answer which would produce undefined-behaviour-free code for C++11 and C++17. I currently write most of my code in C++11 - because I don't anticipate using it on a machine in the future which doesn't have a C++11 compiler.
If one doesn't, then I add a new compiler or change machines.
However it does seem to me to be a bad idea to write code which I know may not work under C++17... That's just my personal opinion. I don't anticipate using this code again, but I don't want to create a potential problem for myself in the future.
Therefore I have come up with the following code. I hope other users will give feedback to help improve this. (For example there is no error checking yet.)
std::string
fstream_read_string(std::fstream& src, std::size_t n)
{
char *const buffer = new char[n + 1];
src.read(buffer, n);
buffer[n] = '\0';
std::string ret(buffer);
delete [] buffer;
return ret;
}
This seems like a basic, probably fool-proof method... It's a shame there seems to be no way to get std::string to use the same memory as allocated by the call to new.
Note we had to add an extra trailing null character in the C-style string, which is sliced off in the C++-style std::string.
Ok, strncpy is not designed to work with NULL terminated strings - it's not designed for NULL terminated strings (if dest is too short it won't be terminated by NULL and if dest is longer it will be padded by zero's).
So, here is a trivial code:
const char *src = ....; // NULL terminated string of unknown length
char dest[30];
How to src to dest? strcpy is not safe, strncpy is bad choice too. So I left with strlen followed by memcpy?
I suppose solution will differ a bit whenever I care care dest won't be truncated (dest is smaller than length of src) or not.
Some limitations:
Legacy code, so I don't want and can not to change it to std::string
I don't have strlcpy - gcc doesn't supply it.
Code may be used in parts of application where performance is critical (e.g. I don't want to wast CPU time padding with zeros dest as strncpy does). However, I'm not talking about premature optimization, but rather the idiotic way to perform string copying in C-way.
Edit
Oopps, I meant strncpy and not snprintf. My mistake
With strncpy:
strncpy(dest, src, sizeof(dest) - 1);
dest[sizeof(dest) - 1] = '\0';
This pads with zeros, but does much less formatting work than snprintf. If you really must have the computer do as little as possible, describe it yourself:
char* last = dest + sizeof(dest) - 1;
char* curr = dest; /* assuming we must not alter 'dest' */
while (curr != last && *src) { *curr++ = *src++; }
*last = '\0'; /* avoids a branch, but always writes.
If branch prediction is working well and the text normally fits:
if (curr == last) { *curr = '\0'; } */
I think you're talking about strncpy(), which might not terminate the string and will fill the remainder of the buffer with zeros.
snprintf() always terminates the destination string (as long as thebuffer has a size of at least 1) and doesn't pad the remainder of the buffer with zeros.
In summary, snprintf() is what you want, except you're very concerned about performance. Since snprintf() needs to interpret the format string (even if all it ends up doing is copying a string), you might be better off with something like strlcpy() for bounded string copy operations.
(and if you want strlcpy() but don't have it, you can get the rather simple source here. For completeness, strlcat() is here)
If you don't care about truncation, you can use strncat():
dest[0] = 0;
strncat(dest, src, sizeof dest - 1);
I'd just roll my own:
for (int i = 0; i < (sizeof(dest) - 1) && src[i] != NULL; i++)
{
dest[i] = src[i];
}
dest[i] = NULL;
This ensures that dest is null-terminated, but never adds more nulls than necessary. If you're really performance-sensitive, you can declare this as a macro or an inline function in a common header.
Use snprintf. It always null-terminates and does not do any null padding. Don't know where you got the misconceptions about it...
std::copy(src, src+strlen(src)+1, dest)
I'm not sure I understand your question entirely, but if you're concerned about zero-padding it can often be done pretty efficiently if you initialize your array like this.
char dest[30] = { 0 };
If you initialize it like that you don't have to care about extra logic to add '\0' to the end of the string and it might even turn out faster.
But if you're going to optimize remember to measure the performance before and after the optimization. Otherwise always code with readability and maintainability in mind.
OWASP says:
"C library functions such as strcpy
(), strcat (), sprintf () and vsprintf
() operate on null terminated strings
and perform no bounds checking."
sprintf writes formatted data to string
int sprintf ( char * str, const char * format, ... );
Example:
sprintf(str, "%s", message); // assume declaration and
// initialization of variables
If I understand OWASP's comment, then the dangers of using sprintf are that
1) if message's length > str's length, there's a buffer overflow
and
2) if message does not null-terminate with \0, then message could get copied into str beyond the memory address of message, causing a buffer overflow
Please confirm/deny. Thanks
You're correct on both problems, though they're really both the same problem (which is accessing data beyond the boundaries of an array).
A solution to your first problem is to instead use std::snprintf, which accepts a buffer size as an argument.
A solution to your second problem is to give a maximum length argument to snprintf. For example:
char buffer[128];
std::snprintf(buffer, sizeof(buffer), "This is a %.4s\n", "testGARBAGE DATA");
// std::strcmp(buffer, "This is a test\n") == 0
If you want to store the entire string (e.g. in the case sizeof(buffer) is too small), run snprintf twice:
int length = std::snprintf(nullptr, 0, "This is a %.4s\n", "testGARBAGE DATA");
++length; // +1 for null terminator
char *buffer = new char[length];
std::snprintf(buffer, length, "This is a %.4s\n", "testGARBAGE DATA");
(You can probably fit this into a function using va or variadic templates.)
Both of your assertions are correct.
There's an additional problem not mentioned. There is no type checking on the parameters. If you mismatch the format string and the parameters, undefined and undesirable behavior could result. For example:
char buf[1024] = {0};
float f = 42.0f;
sprintf(buf, "%s", f); // `f` isn't a string. the sun may explode here
This can be particularly nasty to debug.
All of the above lead many C++ developers to the conclusion that you should never use sprintf and its brethren. Indeed, there are facilities you can use to avoid all of the above problems. One, streams, is built right in to the language:
#include <sstream>
#include <string>
// ...
float f = 42.0f;
stringstream ss;
ss << f;
string s = ss.str();
...and another popular choice for those who, like me, still prefer to use sprintf comes from the boost Format libraries:
#include <string>
#include <boost\format.hpp>
// ...
float f = 42.0f;
string s = (boost::format("%1%") %f).str();
Should you adopt the "never use sprintf" mantra? Decide for yourself. There's usually a best tool for the job and depending on what you're doing, sprintf just might be it.
Yes, it is mostly a matter of buffer overflows. However, those are quite serious business nowdays, since buffer overflows are the prime attack vector used by system crackers to circumvent software or system security. If you expose something like this to user input, there's a very good chance you are handing the keys to your program (or even your computer itself) to the crackers.
From OWASP's perspective, let's pretend we are writing a web server, and we use sprintf to parse the input that a browser passes us.
Now let's suppose someone malicious out there passes our web browser a string far larger than will fit in the buffer we chose. His extra data will instead overwrite nearby data. If he makes it large enough, some of his data will get copied over the webserver's instructions rather than its data. Now he can get our webserver to execute his code.
Your 2 numbered conclusions are correct, but incomplete.
There is an additional risk:
char* format = 0;
char buf[128];
sprintf(buf, format, "hello");
Here, format is not NULL-terminated. sprintf() doesn't check that either.
Your interpretation seems to be correct. However, your case #2 isn't really a buffer overflow. It's more of a memory access violation. That's just terminology though, it's still a major problem.
The sprintf function, when used with certain format specifiers, poses two types of security risk: (1) writing memory it shouldn't; (2) reading memory it shouldn't. If snprintf is used with a size parameter that matches the buffer, it won't write anything it shouldn't. Depending upon the parameters, it may still read stuff it shouldn't. Depending upon the operating environment and what else a program is doing, the danger from improper reads may or may not be less severe than that from improper writes.
It is very important to remember that sprintf() adds the ASCII 0 character as string terminator at the end of each string. Therefore, the destination buffer must have at least n+1 bytes (To print the word "HELLO", a 6-byte buffer is required, NOT 5)
In the example below, it may not be obvious, but in the 2-byte destination buffer, the second byte will be overwritten by ASCII 0 character. If only 1 byte was allocated for the buffer, this would cause buffer overrun.
char buf[3] = {'1', '2'};
int n = sprintf(buf, "A");
Also note that the return value of sprintf() does NOT include the null-terminating character. In the example above, 2 bytes were written, but the function returns '1'.
In the example below, the first byte of class member variable 'i' would be partially overwritten by sprintf() (on a 32-bit system).
struct S
{
char buf[4];
int i;
};
int main()
{
struct S s = { };
s.i = 12345;
int num = sprintf(s.buf, "ABCD");
// The value of s.i is NOT 12345 anymore !
return 0;
}
I pretty much have stated a small example how you could get rid of the buffer size declaration for the sprintf (if you intended to, of course!) and no snprintf envolved ....
Note: This is an APPEND/CONCATENATION example, take a look at here