This is concerning strings in C++. I have not touched C/C++ for a very long time; in fact, I only programmed in those languages during my first year of college, about 7 years ago.
In C, to hold strings I had to create character arrays (whether static or dynamic is not of concern). That meant I needed to guess well in advance the size of the string that an array would contain. I applied the same approach in C++. I was aware that there was a std::string class, but I never got around to using it.
My question is: since we never declare the size of an array/string in the std::string class, does a buffer overflow occur when writing to it? I mean, in C, if the array's size was 10 and I typed more than 10 characters on the console, then the extra data would be written into some other object's memory, which is adjacent to the array. Can a similar thing happen in std::string when using the cin object?
Do I have to guess the size of the string beforehand in C++ when using std::string?
Well! Thanks to you all. There is no one right answer on this page (a lot of different explanations provided), so I am not selecting any single one as such. I am satisfied with the first 5. Take Care!
Depending on the member(s) you are using to access the string object, yes. So, for example, if you use reference operator[](size_type pos) where pos > size(), yes, you would.
Assuming no bugs in the standard library implementation, no. A std::string always manages its own memory.
Unless, of course, you subvert the accessor methods that std::string provides, and do something like:
std::string str = "foo";
char *p = (char *)str.c_str();
strcpy(p, "blah");
You have no protection here, and are invoking undefined behaviour.
The std::string generally protects against buffer overflow, but there are still situations in which programming errors can lead to buffer overflows. While C++ generally throws an out_of_range exception when an operation references memory outside the bounds of the string, the subscript operator [] (which does not perform bounds checking) does not.
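A minimal sketch of that difference, assuming a hypothetical helper named char_at_or (it is not part of the standard library). It relies only on std::string::at(), which throws std::out_of_range for a bad index, whereas operator[] with the same index is not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns s.at(pos) if pos is valid, otherwise `fallback`.
char char_at_or(const std::string& s, std::size_t pos, char fallback) {
    try {
        return s.at(pos);              // checked access
    } catch (const std::out_of_range&) {
        assert(pos >= s.size());       // at() only throws for an out-of-range index
        return fallback;               // for pos > s.size(), s[pos] would be undefined behaviour
    }
}
```

Calling char_at_or("foo", 10, '?') should give back the fallback character instead of touching memory outside the string.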
Another problem occurs when converting std::string objects to C-style strings. If you use string::c_str() to do the conversion, you get a properly null-terminated C-style string. However, before C++11, string::data() returned a pointer to an array that was not guaranteed to be null terminated; the only difference between c_str() and data() was that c_str() added a trailing null byte. Since C++11, both return the same null-terminated array.
Finally, many existing C++ programs and libraries have their own string classes. To use these libraries, you may have to use these string types or constantly convert back and forth. Such libraries are of varying quality when it comes to security. It is generally best to use the standard library (when possible) or to understand the semantics of the selected library. Generally speaking, libraries should be evaluated based on how easy or complex they are to use, the type of errors that can be made, how easy these errors are to make, and what the potential consequences may be.
Refer to https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/coding/295-BSI.html
In C, the cause is explained as follows:
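#include <string.h>
void function (const char *str) {
    char buffer[16];
    strcpy (buffer, str); // no bounds check: overflows buffer when str needs more than 16 bytes
}
int main () {
    const char *str = "I am greater than 16 bytes"; // length of str = 27 bytes, including the terminator
    function (str);
}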
This program has undefined behavior, because a string (str) of 27 bytes is copied to a location (buffer) that has been allocated only 16 bytes. The extra bytes run past the buffer and overwrite the space allocated for the frame pointer (FP), return address and so on. This, in turn, corrupts the process stack. The function used to copy the string is strcpy, which performs no bounds checking. Using strncpy with the size of buffer would have prevented this corruption of the stack, although the result must then be null-terminated by hand. This classic example shows that a buffer overflow can overwrite a function's return address, which in turn can alter the program's execution path. Recall that a function's return address is the address of the next instruction in memory, which is executed immediately after the function returns.
Here is a good tutorial that may answer your question satisfactorily.
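As a rough sketch of the strncpy fix mentioned above, assuming a hypothetical helper called bounded_copy (the name and the boolean return value are my own choices): it never writes more than the destination can hold and always leaves a terminator behind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dest (capacity dest_size bytes) and
// always NUL-terminates. Returns false if src had to be truncated.
bool bounded_copy(char* dest, std::size_t dest_size, const char* src) {
    assert(dest != nullptr && src != nullptr && dest_size > 0);
    std::strncpy(dest, src, dest_size - 1);  // writes at most dest_size - 1 bytes
    dest[dest_size - 1] = '\0';              // strncpy does not terminate on truncation
    return std::strlen(src) < dest_size;
}
```

With a 16-byte buffer and the 26-character string from the example, this should keep the first 15 characters and report the truncation instead of corrupting the stack.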
In C++, the std::string class starts out with a minimum size (or you can specify a starting size). If that size is exceeded, std::string allocates more dynamic memory.
Assuming the library providing std::string is correctly written, you cannot cause a buffer overflow by adding characters to a std::string object.
Of course, bugs in the library are not impossible.
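To illustrate the point about growth, here is a small sketch assuming a hypothetical helper read_word. It reads from any std::istream, so std::cin could be passed in the same way as the string stream used for testing.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one whitespace-delimited word from `in`.
// No size is declared anywhere; std::string grows to fit whatever arrives.
std::string read_word(std::istream& in) {
    std::string word;
    in >> word;
    assert(word.size() <= word.capacity());  // the string owns enough storage
    return word;
}
```

Feeding it a 100000-character word works the same as feeding it a 4-character one; the caller never guesses a size.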
"Do buffer overflows occur in C++ code?"
To the extent that C programs are legal C++ code (they almost all are), and C programs have buffer overflows, C++ programs can have buffer overflows.
Being richer than C, I'm sure C++ can have buffer overflows in ways that C cannot :-}
Related
It seems like standard programming practice and the POSIX standard are at odds with each other. I'm working with a program and I noticed that I see a lot of stuff like:
char buf[NAME_MAX + 1]
And I'm also seeing that a lot of operating systems don't define NAME_MAX and say that they technically don't have to according to POSIX, because you're supposed to use pathconf to get the value it's configured to at runtime rather than hard-coding it as a constant anyway.
The problem is that the compiler won't let me use pathconf this way with arrays. Even if I try storing the result of pathconf in a const int, it still throws a fit and says it has to be a constant. So it looks like in order to actually use pathconf, I would have to avoid using an array of chars for the buffer here because that apparently isn't good enough. So I'm caught between a rock and a hard place, because the C++ standard seemingly won't allow me to do what POSIX says I must do, that is determine the size of a character buffer for a filename at runtime rather than compile time.
The only information I've been able to find on this suggests that I would need to replace the array with a vector, but it's not clear how I would do it. When I test using a simple program, I can get this to work:
std::vector<char> buf((pathconf("/", _PC_NAME_MAX) + 1));
And then I can figure out the size by calling buf.size() or something. But I'm not sure if this is the right approach at all. Does anyone have any experience with trying to get a program to stop depending on constants like NAME_MAX or MAXNAMLEN being defined in the system headers and getting the implementation to use pathconf at runtime instead?
Halfway measures do tend to result in conflicts of some sort.
const unsigned NAME_MAX = /* get the value at runtime */;
char buf[NAME_MAX + 1];
The second line declares a C-style array (presumably) intended to hold a C-style string. In C, this is fine. In C++, there is an issue because the value of NAME_MAX is not known at compile time. That's why I called this a halfway measure—there is a mix of C-style code and C++ compiling. (Some compilers will allow this in C++. Apparently yours does not.)
The C++ approach would use C++-style strings, as in:
std::string buf;
That's it. The size does not need to be specified since memory will be allocated as needed, provided you avoid C-style interfaces. Use streaming (>>) when reasonable. If the buffer is being filled by user or file input, this should be all you need.
If you need to use C-style strings (perhaps this buffer is being filled by a system call written for C?), there are a few options for allocating the needed space. The simplest is probably a vector, much like you were thinking.
std::vector<char> buf(NAME_MAX + 1); // Parentheses, not braces: braces would select the initializer-list constructor.
system_call(buf.data()); // Send a char* to the system call.
Alternatively, you could use a C++-style string, which could make manipulating the data more convenient.
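std::string buf(NAME_MAX + 1, '\0'); // Parentheses, not braces: braces would build a two-character string.
system_call(buf.data()); // Send a char* to the system call (non-const data() needs C++17; use &buf[0] before that).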
There is also a smart pointer option, but the vector approach might play nicer with existing code written for a C-style array.
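Putting the vector approach together with pathconf might look roughly like this. The helper name make_name_buffer and the fallback value of 255 are assumptions of mine; pathconf and _PC_NAME_MAX are the POSIX names from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>   // POSIX: pathconf, _PC_NAME_MAX
#include <vector>

// Hypothetical helper: allocates a filename buffer sized for directory `dir`.
// pathconf returns -1 on error or when there is no fixed limit; the fallback
// of 255 is an assumption, not something POSIX guarantees.
std::vector<char> make_name_buffer(const char* dir) {
    assert(dir != nullptr);
    long name_max = pathconf(dir, _PC_NAME_MAX);
    if (name_max < 0) {
        name_max = 255;
    }
    return std::vector<char>(static_cast<std::size_t>(name_max) + 1);  // +1 for the NUL
}
```

The returned vector can be passed to C-style interfaces through buf.data() and buf.size(), so the size travels with the buffer instead of living in a compile-time constant.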
Suppose you have a char buffer that you want to copy a std::string into. Are there consequences of copying extra data into the buffer, from beyond the end of the string, even though the buffer has adequate size?
Example
std::string my_string = "hello";
char my_buffer[128];
memset(my_buffer, 0, 128);
strncpy(my_buffer, my_string.c_str(), 128);
So "hello" gets copied into my_buffer, but so will 123 other bytes of data that comes after my_string. Are there any consequences of this? Is it harmful for the buffer to hold this other data?
but so will 123 other bytes of data that comes after my_string
This assumption is incorrect: strncpy pays attention to null termination of the source string, never reading past null terminator. The remaining data will be set to '\0' characters:
destination is padded with zeros until a total of num characters have been written to it. [reference]
This is in contrast to memcpy, which requires both the source and the destination to be of sufficient size in order to avoid undefined behavior.
OK, let's assume what you wanted is:
strncpy(my_buffer, my_string.c_str(), 128);
The result of c_str() is always a 0-terminated string by definition, so considering:
Copies at most count characters of the character array pointed to by src (including the terminating null character, but not any of the characters that follow the null character) to character array pointed to by dest.
You won't get anything copied after "hello" from the original string, the rest will be 0s:
If, after copying the terminating null character from src, count is not reached, additional null characters are written to dest until the total of count characters have been written.
According to the description of strncpy(), at most the length you provide is copied from a null-terminated string. When the end of the string comes first, as in this case, copying from the source stops there, so the rest of the "123 bytes" is never read.
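If you want to convince yourself of the padding behaviour, a small check along these lines should do. The helper name zero_padding_after_copy is made up for this sketch; it fills the buffer with a non-zero marker first, so any byte that is zero afterwards must have been written by strncpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: runs strncpy and counts how many bytes of `buf`
// after the copied text ended up as '\0'.
std::size_t zero_padding_after_copy(const std::string& s, char* buf, std::size_t n) {
    assert(s.size() < n);
    std::memset(buf, 0x7F, n);          // pre-fill with a non-zero marker
    std::strncpy(buf, s.c_str(), n);    // copies s, then pads with '\0' up to n
    std::size_t zeros = 0;
    for (std::size_t i = s.size(); i < n; ++i) {
        if (buf[i] == '\0') ++zeros;
    }
    return zeros;
}
```

For "hello" and a 128-byte buffer, all 123 bytes after the text are zeros, and nothing from beyond the source string is copied.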
The other answers to this question have addressed what happens with strncpy() (i.e. it will copy your string correctly because it stops at the 0-terminator byte), so perhaps what gets more to the intent of the question would be, what if you had this instead?
memcpy(my_buffer, my_string.c_str(), 128);
In this case, the function (memcpy()) doesn't know about 0-terminated-string semantics, and will always just blindly copy 128 bytes (starting at the address returned by my_string.c_str()) to the address my_buffer. The first 6 (or so) of those bytes will be from my_string's internal buffer, and the remaining bytes will be from whatever happens to be in memory after that.
So the question is, what happens then? Well, this memcpy() call reads from "mystery memory" whose purpose you're not aware of, so you're invoking undefined behavior by doing that, and therefore in principle anything could happen. In practice, the likely result is that your buffer will contain a copy of whatever bytes were read (although you probably won't notice them, since you'll be using string functions that don't look past the 0/terminator byte in your array anyway).
There is a small chance, though, that the "extra" memory bytes that memcpy() read could be part of a memory-page that is marked as off-limits, in which case trying to read from that page would likely cause a segmentation fault.
And finally, there's the real bugaboo of undefined behavior, which is that your C++ compiler's optimizer is allowed to do all kinds of crazy modifications to your code's logic, in the name of making your program more efficient -- and (assuming the optimizer isn't buggy) all of those optimizations will still result in the program running as intended -- as long as the program follows the rules and doesn't invoke undefined behavior. Which is to say, if your program invokes undefined behavior in any way, the optimizations may be applied in ways that are very difficult to predict or understand, resulting in bizarre/unexpected behavior in your program. So the general rule is, avoid undefined behavior like the plague, because even if you think it "should be harmless", there's a very real possibility that it will end up doing things you wouldn't expect it to do, and then you're in for a long, painful debugging session as you try to figure out what's going on.
Assuming you mean definitely copying that data, as your current code would end at a null terminator.
Essentially, no. Whatever data is in there will be used only as string data so unless you then tried to do something weird with that spare data (try to put a function pointer to it and execute it etc.) then it's basically safe.
The issue is with initially copying random data past the end of your original string: the read could run into protected memory that you don't have access to and cause a segfault (an inaccessible-memory error).
Function strncpy will copy everything up to NUL character or specified size. That's it.
In addition to the other answers, please remember that strncpy does not NUL-terminate the dst buffer when the src string length (excluding the NUL) is greater than or equal to the size given. You should always do something like my_buffer[127] = 0 to prevent buffer overruns when handling the string later on. libbsd defines strlcpy, which always NUL-terminates the buffer.
Check out this snippet:
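#include <stdio.h>
#include <string.h> // strlcpy: BSD libc, or <bsd/string.h> from libbsd on Linux
int main(int argc, char ** argv)
{
    char buf[8];
    const char *str = "deadbeef";
    strlcpy(buf, str, sizeof(buf)); // truncates to "deadbee" and NUL-terminates
    printf("%s\n", buf);
    strncpy(buf, str, sizeof(buf)); // copies all 8 characters, leaving no room for a NUL
    printf("%s\n", buf); // reads past the end of buf
    return 0;
}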
To see the problem compile it with flag -fsanitize=address. The second buffer is not NUL terminated!
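Without a sanitizer or strlcpy at hand, the missing terminator can also be checked directly. This is only a sketch; is_terminated and strncpy_leaves_terminator are hypothetical names, and the check uses nothing beyond std::memchr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if the first n bytes of buf contain a '\0'.
bool is_terminated(const char* buf, std::size_t n) {
    assert(buf != nullptr);
    return std::memchr(buf, '\0', n) != nullptr;
}

// Copies with strncpy and reports whether the result is a valid C string.
bool strncpy_leaves_terminator(char* buf, std::size_t n, const char* src) {
    std::strncpy(buf, src, n);
    return is_terminated(buf, n);
}
```

With an 8-byte buffer, "deadbeef" leaves no terminator while "dead" does, which matches what the snippet above shows.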
std::string str = "hello world!";
char dest[16];
memcpy(dest, str.c_str(), 16);
str.c_str() will return a null-terminated char array. However, if I call memcpy with a count greater than 13, what would happen? Is dest going to be a null-terminated char array? Is there anything I need to be cautious about?
Your code has undefined behaviour. When using memcpy, you need to make sure that the number of bytes being copied is not greater than min(size_of_recipient, size_of_source).
In your case, the size of your source is 13 bytes, so copying more than that is not ok.
Is dest going to be a null-terminated char array? Is there anything I need to be cautious about?
Unfortunately, the first parts of dest will be a null-terminated character array since str.c_str() returns a null terminated character array... But the remaining parts of dest will surely contain some additional garbage.
You are accessing additional memory that you know nothing about. Your code could reformat your PC; it's called undefined behavior.
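The min rule can be written down as a small wrapper. This is a sketch under the assumption that you want the terminator copied whenever it fits; copy_clamped is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies s (plus its terminator when it fits) into dest
// without reading past the source or writing past the destination.
// Returns the number of bytes copied.
std::size_t copy_clamped(char* dest, std::size_t dest_size, const std::string& s) {
    assert(dest != nullptr);
    const std::size_t source_size = s.size() + 1;               // include the '\0'
    const std::size_t count = std::min(dest_size, source_size);
    std::memcpy(dest, s.c_str(), count);
    return count;
}
```

For "hello world!" and a 16-byte destination it copies 13 bytes. If the destination is smaller than the source, the result is not terminated, so the caller still has to handle that case.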
If you give it a count that's exactly how many sequential bytes will be copied. This means that all those bytes are read from after your string if you pass in a count larger than your actual source data. If this memory does not belong to your program to use, this might cause a crash due to a Segmentation Fault.
strcpy works differently in that sense since it checks for '\0'. Also if you want to copy bytes from a std::string to a char[], you could simply not rely on dangerous C functions and use std::copy.
It's not safe. You should use strncpy, which was made for this. What's after the end of the "hello world!" std::string is undefined. It could be an undefined part of your heap, in which case you'll be copying what might be called garbage, or it could be venturing into an unallocated memory page, in which case you'll get a segfault and your program dies. (Unless it has some clever magic to handle segfaults, which, judging by your question, it probably does not have.)
char ch[3];
strcpy(ch,"Hello");
cout<<ch<<" "<<sizeof(ch);
Q1. Why is it printing "Hello" when the size of ch is 3 ?
Q2. If ch is converted from ch[2] to ch[6] (by Implicit Type Conversion) then why is sizeof(ch) still 3 ?
strcpy does not do a size check, so it simply writes "Hello\0" at the memory address of ch, overwriting whatever is located after ch in memory.
There are no conversions performed, ch is char[3] as it was before strcpy.
I am assuming when you typed char a[3]; you meant char ch[3];.
A1: When you declare an array of some size, it's up to you to not exceed it. If you do exceed it as in this case, you get undefined results.
A2: You cannot convert it like this so it's a non-issue.
You are doing something really dangerous there. This is one of the reasons to use std::string: you don't have to check whether you go out of bounds for simple copying.
Question 1:
As others have already said, printing out "Hello" here is undefined behavior and anything can happen. But that's not all. Writing out of bounds as your code does is dangerous. You basically have a buffer overflow; if you look at the example on Wikipedia, you'll see that it's nearly the same code as yours.
Question 2:
There is no conversion. When you call strcpy(ch, "Hello"); your array ch decays into a pointer. There is no way for strcpy to know how many chars it can safely copy there, since a pointer is nothing more than a memory address. It doesn't know how much memory is available at the location so simply writes everything even if (like in your case) you go out of bounds.
Code like yours is one of the reasons why pointers are called unsafe.
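One way around the decay problem, sketched with a hypothetical checked_copy template: taking the array by reference keeps its size visible to the function, so it can refuse a copy that would not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: because dest is passed as a reference to an array,
// its size N is known here, unlike with a plain char* parameter.
// Returns false instead of overflowing when src (plus terminator) does not fit.
template <std::size_t N>
bool checked_copy(char (&dest)[N], const char* src) {
    assert(src != nullptr);
    const std::size_t needed = std::strlen(src) + 1;
    if (needed > N) {
        return false;            // would overflow: leave dest untouched
    }
    std::memcpy(dest, src, needed);
    return true;
}
```

With char ch[3], checked_copy(ch, "Hello") is rejected and checked_copy(ch, "Hi") succeeds; sizeof(ch) stays 3 either way.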
I assume this is a common way to use sprintf:
char pText[x];
sprintf(pText, "helloworld %d", Count );
But what exactly happens if the char buffer has less memory allocated than will be printed to it?
I.e., what if x is smaller than the length of the text produced from the second parameter of sprintf?
I am asking because I get some strange behaviour in the code that follows the sprintf statement.
It's not possible to answer in general "exactly" what will happen. Doing this invokes what is called Undefined behavior, which basically means that anything might happen.
It's a good idea to simply avoid such cases, and use safe functions where available:
char pText[12];
snprintf(pText, sizeof pText, "helloworld %d", count);
Note how snprintf() takes an additional argument that is the buffer size, and won't write more than there is room for.
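snprintf also tells you when the text did not fit, which is worth checking. A small sketch, with format_count as a hypothetical wrapper around the call above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats "helloworld <count>" into buf.
// Returns true only if the whole text (plus '\0') fit.
bool format_count(char* buf, std::size_t size, int count) {
    assert(buf != nullptr && size > 0);
    const int written = std::snprintf(buf, size, "helloworld %d", count);
    // snprintf returns the length it wanted to write, excluding the '\0'.
    return written >= 0 && static_cast<std::size_t>(written) < size;
}
```

Note that a 12-byte buffer only has room for "helloworld " and the terminator, so any actual number is cut off; the call is safe, but the return value reports the truncation.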
This is a common error and leads to memory after the char array being overwritten. So, for example, there could be some ints or another array in the memory after the char array and those would get overwritten with the text.
See a nice detailed description of the whole problem (buffer overflows) here. There's also a comment that some platforms provide an snprintf routine that takes an extra parameter defining the maximum length (in your case x). If your compiler doesn't provide it, you can also write such a wrapper yourself to make sure you can't get such errors (or just check that you always have enough space allocated).
Note that the behaviour after such an error is undefined and can lead to very strange errors. Variables are usually aligned at memory locations divisible by 4, so you often won't notice the error when you have written only one or two bytes too many (e.g. forgot to make room for a NUL), but you get strange errors in other cases. These errors are hard to debug because other variables get changed, and the failure often shows up in a completely different part of the code.
This is called a buffer overrun.
sprintf will overwrite the memory that happens to follow pText address-wise. Since pText is on the stack, sprintf can overwrite local variables, function arguments and the return address, leading to all sorts of bugs. Many security vulnerabilities result from this kind of code — e.g. an attacker uses the buffer overrun to write a new return address pointing to his own code.
The behaviour in this situation is undefined. Normally, you will crash, but you might also see no ill effects, strange values appearing in unrelated variables and that kind of thing. Your code might also call into the wrong functions, format your hard-drive and kill other running programs. It is best to resolve this by allocating more memory for your buffer.
I have done this many times; you will get a memory corruption error. As far as I remember, I did something like this:
vector<char> vecMyObj(10);
vecMyObj.resize(10);
sprintf(&vecMyObj[0],"helloworld %d", count);
When the vector's destructor was called, my program reported a memory corruption error. If the formatted text is shorter than 10 characters, it works.
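A way to avoid guessing the vector size at all is to ask snprintf for the required length first and allocate from that. This is a sketch, with format_hello as a hypothetical helper; calling std::snprintf with a null buffer and size 0 to measure the output is standard behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats "helloworld <count>" into a buffer sized from
// snprintf's own length report, so the buffer cannot be too small.
std::string format_hello(int count) {
    const int length = std::snprintf(nullptr, 0, "helloworld %d", count);
    assert(length >= 0);                                          // negative means an encoding error
    std::vector<char> buf(static_cast<std::size_t>(length) + 1);  // +1 for the '\0'
    std::snprintf(buf.data(), buf.size(), "helloworld %d", count);
    return std::string(buf.data(), static_cast<std::size_t>(length));
}
```

Returning a std::string means the caller never sees a raw buffer, so there is nothing left to overrun.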
Can you spell buffer overflow? One possible result is stack corruption, which makes your app vulnerable to stack-based exploitation.