So I was working in C++11 and this occurred to me:
char s[100];
strcpy(s, "Andrei");
int n=strlen(s); // (it would be 6)
I do this:
s[n]=' ';
s[n+9]='\0';
What happens with s[n+1], ... s[n+8]?
If I go
cout <<s;
The array will be displayed as normal.
But if I specify
cout <<s[n+2]
it will print odd characters
The array elements s[n+1] ... s[n+8] are uninitialized. They are said to have indeterminate value. Trying to output an indeterminate value causes undefined behaviour which means anything can happen.
So, you have a buffer of 100 chars. Because C++ has no strict initialization policy, you get a buffer of BIG SOMETHING. The content of that "big something" can differ between app launches, between build modes (debug/release), etc. Indeed, querying that "something" is like tuning a radio to a random frequency and hearing noise in return (it could be different each time you do this).
Running your strcpy would initialize just first N chars, the rest of the buffer is still "something".
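If you want s[n+1] ... s[n+8] to hold known values, one option is to value-initialize the whole array before copying into it. The sketch below assumes that approach; build_padded is a hypothetical helper name, not something from the question, and it returns the bytes s[0] through s[n+9] so they can be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: repeats the steps from the question on a buffer that
// was value-initialized, then returns the bytes s[0] .. s[n+9].
std::string build_padded() {
    char s[100] = {};                      // every element starts as '\0'
    std::strcpy(s, "Andrei");
    const std::size_t n = std::strlen(s);  // 6
    assert(n + 9 < sizeof s);              // both writes below stay in bounds
    s[n] = ' ';                            // overwrites the terminator written by strcpy
    s[n + 9] = '\0';
    return std::string(s, n + 10);         // keeps the embedded '\0' bytes
}
```

With the initializer in place, s[n+1] ... s[n+8] are all '\0', so reading any of them is well defined and cout << s stops right after the space.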
Related
Suppose you have a char buffer that you want to copy a std::string into. Are there consequences of copying extra data into the buffer, beyond the end of the string, even though the buffer has adequate size?
Example
std::string my_string = "hello";
char my_buffer[128];
memset(my_buffer, 0, 128);
strncpy(my_buffer, my_string.c_str(), 128);
So "hello" gets copied into my_buffer, but so will 123 other bytes of data that comes after my_string. Are there any consequences of this? Is it harmful for the buffer to hold this other data?
but so will 123 other bytes of data that comes after my_string
This assumption is incorrect: strncpy pays attention to null termination of the source string, never reading past null terminator. The remaining data will be set to '\0' characters:
destination is padded with zeros until a total of num characters have been written to it. [reference]
This is in contrast to memcpy, which requires both the source and the destination to be of sufficient size in order to avoid undefined behavior.
OK, let's assume what you wanted is:
strncpy(my_buffer, my_string.c_str(), 128);
The result of c_str() is always a 0-terminated string by definition, so considering:
Copies at most count characters of the character array pointed to by src (including the terminating null character, but not any of the characters that follow the null character) to character array pointed to by dest.
You won't get anything copied after "hello" from the original string, the rest will be 0s:
If, after copying the terminating null character from src, count is not reached, additional null characters are written to dest until the total of count characters have been written.
According to the strncpy() description, the copy runs up to the length you provide or, for a null-terminated string, up to the end of the string if that comes first, as it does in this case. The copy loop terminates there, so the "123 other bytes" are never copied.
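The padding behaviour described in these answers can be checked directly. This is a minimal sketch: zero_bytes_after_strncpy is a hypothetical helper that first fills the destination with 'X' (so that any zeros must have come from strncpy) and then counts the '\0' bytes in the 128-byte buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: counts the '\0' bytes in a 128-byte destination
// after strncpy has copied src into it.
std::size_t zero_bytes_after_strncpy(const std::string& src) {
    char buf[128];
    std::memset(buf, 'X', sizeof buf);           // poison the buffer first
    std::strncpy(buf, src.c_str(), sizeof buf);  // copies src, then pads with '\0'
    std::size_t zeros = 0;
    for (char c : buf) {
        if (c == '\0') {
            ++zeros;
        }
    }
    assert(zeros <= sizeof buf);
    return zeros;
}
```

For "hello" the count is 123: five characters are copied and the remaining 123 bytes are zero-filled, none of them read from memory after my_string.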
The other answers to this question have addressed what happens with strncpy() (i.e. it will copy your string correctly because it stops at the 0-terminator byte), so perhaps what gets more to the intent of the question would be, what if you had this instead?
memcpy(my_buffer, my_string.c_str(), 128);
In this case, the function (memcpy()) doesn't know about 0-terminated-string semantics, and will always just blindly copy 128 bytes (starting at the address returned by my_string.c_str()) to the address my_buffer. The first 6 (or so) of those bytes will be from my_string's internal buffer, and the remaining bytes will be from whatever happens to be in memory after that.
So the question is, what happens then? Well, this memcpy() call reads from "mystery memory" whose purpose you're not aware of, so you're invoking undefined behavior by doing that, and therefore in principle anything could happen. In practice, the likely result is that your buffer will contain a copy of whatever bytes were read (although you probably won't notice them, since you'll be using string functions that don't look past the 0/terminator byte in your array anyway).
There is a small chance, though, that the "extra" memory bytes that memcpy() read could be part of a memory-page that is marked as off-limits, in which case trying to read from that page would likely cause a segmentation fault.
And finally, there's the real bugaboo of undefined behavior, which is that your C++ compiler's optimizer is allowed to do all kinds of crazy modifications to your code's logic, in the name of making your program more efficient -- and (assuming the optimizer isn't buggy) all of those optimizations will still result in the program running as intended -- as long as the program follows the rules and doesn't invoke undefined behavior. Which is to say, if your program invokes undefined behavior in any way, the optimizations may be applied in ways that are very difficult to predict or understand, resulting in bizarre/unexpected behavior in your program. So the general rule is, avoid undefined behavior like the plague, because even if you think it "should be harmless", there's a very real possibility that it will end up doing things you wouldn't expect it to do, and then you're in for a long, painful debugging session as you try to figure out what's going on.
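If memcpy() is what you want, the read has to be limited to bytes the string actually owns. The sketch below assumes a hypothetical helper named copy_bounded that copies at most cap - 1 characters and always writes a terminator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies src into dst without reading past the end of
// src or writing past dst[cap - 1]. Returns the number of characters copied.
std::size_t copy_bounded(char* dst, std::size_t cap, const std::string& src) {
    assert(dst != nullptr && cap > 0);
    const std::size_t n = std::min(src.size(), cap - 1);
    std::memcpy(dst, src.data(), n);  // reads exactly n bytes owned by src
    dst[n] = '\0';                    // terminate explicitly
    return n;
}
```

Called as copy_bounded(my_buffer, sizeof my_buffer, my_string), this never touches the "mystery memory" after the string, so the undefined behaviour discussed above does not arise.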
Assuming you mean definitely copying that data, as your current code would end at a null terminator.
Essentially, no. Whatever data is in there will be used only as string data so unless you then tried to do something weird with that spare data (try to put a function pointer to it and execute it etc.) then it's basically safe.
The issue is with initially copying random data past the end of your original string: the read could run into protected memory that you don't have access to and cause a segfault (inaccessible memory exception).
Function strncpy will copy everything up to the NUL character or the specified size, whichever comes first (padding any remainder with NUL). That's it.
In addition to the other answers, please remember that strncpy does not NUL-terminate the dst buffer when the src string length (excluding NUL) is equal to or greater than the size given. You should always do something like my_buffer[127]=0 to prevent buffer overruns when handling the string later on. libbsd provides strlcpy, which always NUL-terminates the buffer.
Check out this snippet:
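#include <stdio.h>
#include <string.h> // if your libc lacks strlcpy, use <bsd/string.h> and link with -lbsd
int main(int argc, char ** argv)
{
char buf[8];
const char *str = "deadbeef"; // 8 characters plus the NUL
strlcpy(buf, str, sizeof(buf)); // truncates to "deadbee" and NUL-terminates
printf("%s\n", buf);
strncpy(buf, str, sizeof(buf)); // copies 8 characters, no room left for the NUL
printf("%s\n", buf); // reads past the end of buf
return 0;
}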
To see the problem compile it with flag -fsanitize=address. The second buffer is not NUL terminated!
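Where strlcpy is not available, the my_buffer[127]=0 advice can be wrapped once and reused. This is only a sketch under that assumption; copy_terminated is a hypothetical name, and it deduces the array size so that the size passed to strncpy cannot drift from the real one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: strncpy into a fixed-size array, leaving the last
// byte for the terminator and writing it explicitly.
template <std::size_t N>
void copy_terminated(char (&dst)[N], const char* src) {
    static_assert(N > 0, "destination must have room for the terminator");
    assert(src != nullptr);
    std::strncpy(dst, src, N - 1);  // at most N-1 characters copied
    dst[N - 1] = '\0';              // terminated even if src was too long
}
```

With char buf[8] and "deadbeef" as in the snippet above, the result is the truncated but properly terminated string "deadbee".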
If we declare an array with 4 elements, for example:
int array[4];
Can we assign values like this, since the loop also takes 4 values?
for(int i=5;i<9;i++){
cin>>array[i];
}
Out-of-bounds access on an array has undefined behaviour, which is another way of saying "unintended consequences":
int a[4];
int b[4];
for(int i=5;i<9;i++){
a[i] = i;
}
In a debugger watch what it's doing and in particular watch what happens to b.
This may or may not crash, but it's still broken code. C++ will not always alert you to such situations; it's your responsibility as a developer to know what is and isn't allowed when accessing a given structure.
Accessing an array out of bounds doesn't always cause a crash, but it is always problematic. Try with i = 999999 or i = -9 and see what happens.
The problem with undefined behaviour is that it may appear to be working but these unintended consequences eventually catch up with you. This makes debugging your code very difficult as this out-of-bounds write may stomp a variable that you need somewhere else minutes or hours after the initial mistake, and then your program crashes. Those sorts of bugs are the most infuriating to fix since the time between the cause and effect is extremely long.
It's the same as how throwing lit matches in the garbage may not cause a fire every time but when it does cause a fire you may not notice until it's too late. In C++ you must be extremely vigilant about not introducing undefined behaviour into your code.
You are mixing up two constructs - the logic used to iterate in a loop and the indexing of an array.
You may use
for(int i=5;i<9;i++){
...
}
to run the loop four times. However, you may not use those values of i to access the array. The array index has to be offset appropriately so it is valid.
for(int i=5;i<9;i++){
int index = i - 5;
std::cin >> array[index];
}
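The same separation of loop counter and array index can be written so that the index is checked before use. This is a small sketch, not the answerer's code: fill_from_loop is a hypothetical function that stores the loop values 5..8 in slots 0..3 of a std::array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: the loop counter runs 5..8, the index runs 0..3.
std::array<int, 4> fill_from_loop() {
    std::array<int, 4> a{};
    for (int i = 5; i < 9; ++i) {
        const std::size_t index = static_cast<std::size_t>(i - 5);
        assert(index < a.size());  // a debug build catches a bad offset here
        a[index] = i;
    }
    return a;
}
```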
No, you get an array with 4 slots. Those slots are
array[0]
array[1]
array[2]
array[3]
So your code is incorrect. It might seem to work, but it's still wrong because of what's called undefined behavior; next week it might fail.
Note: you are better off using std::vector in C++.
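One reason std::vector helps is that its at() member checks the index and throws std::out_of_range instead of silently writing outside the storage. A minimal sketch, with store_checked as a hypothetical wrapper name:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes value at index if the index is valid and
// reports whether the write happened.
bool store_checked(std::vector<int>& v, std::size_t index, int value) {
    try {
        v.at(index) = value;  // bounds-checked access
        return true;
    } catch (const std::out_of_range&) {
        assert(index >= v.size());  // at() only throws for invalid indices
        return false;
    }
}
```

For a vector of size 4, index 3 succeeds and index 5 is rejected, whereas v[5] would be the same undefined behaviour as with the raw array.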
can we allocate value like this because it is also taking 4 values:
for(int i=5;i<9;i++){
cin>>array[i];
}
No, we can't. Since the size of the array is 4, the only indices that can be accessed are 0,1,2 and 3. Accessing any other index will have undefined behaviour.
try to run it in any online compiler it is working.
The behaviour is undefined. Possible behaviours include, none of which are guaranteed:
- working
- not working
- random output
- non-random output
- the expected output
- unexpected output
- no output
- any output
- crashing at random
- crashing always
- not crashing
- corruption of data
- different behaviour
  - when executed on another system
  - when compiled with another compiler
  - on Tuesday
  - only when you are not looking
- same behaviour in all of the above cases
- anything else within the power of the computer (hopefully limited by the OS)
There is something that has been bugging me for a while and I need an answer for it:
const char *p = "hello world";
p="wazzup";
p="Hey";
Here I declare a pointer that points to a string (in other words, I made a string by using a pointer).
I have had some strange results with this that I normally wouldn't have gotten if I had used a char array string:
cout <<p<< endl; // "Hey" gets printed
cout <<p+8<< endl; // I kept adding numbers till "wazzup" got printed
cout <<p+29<< endl; // No matter how much I increment, I can't print "hello world"
So my question is:
When I change the value that a char pointer is pointing to. Does it
overwrite the original Data like it would do with char array;
or it creates a new string right before it in the memory and points to it;
or does it add the new string at the begining of the old one(including null);
or does it create a new string in a new place in the memory and I was able to print "wazzup" only by chance
It does none of the options above. Changing the value of a pointer merely changes the address in memory it points to. In each of the assignments to p, it is set to point to the first character of a (different) string literal, which is stored in memory.
The behaviour of using a pointer that points beyond the end of a string literal, such as
cout <<p+8<< endl
is undefined. This is why using pointers is fraught with danger.
The behaviour you are seeing is implementation-dependent: the compiler happens to store the string literals adjacent in memory, so running off the end of one runs into another. Your program might equally have crashed when compiled with a different compiler.
By adding to the pointer, you are increasing the address value, so printing will show whatever is stored at that memory location and nothing else.
If you could print
"hello world"
"wazzup"
that will be a fluke :)
All three of these are string literal constants. They appear directly in your executable's binary, and each time you assign p, you point to these locations in memory. They are completely separate memory; reassigning p to another one does not modify any string data.
You are only making the pointer point to something else each time you assign to it. So there is no scope for data being overwritten anywhere. What happens under the hood is implementation dependent, so what you see is just by pure chance. When you do this:
cout <<p+8<< endl;
you are going beyond the bounds of the string literal, invoking undefined behaviour.
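To see that reassigning p leaves the earlier literals untouched, keep a second pointer to the first literal instead of trying to reach it by pointer arithmetic. A minimal sketch; the function name literals_survive_reassignment is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reassigns p as in the question while a second pointer
// still refers to the first literal, then checks both strings.
bool literals_survive_reassignment() {
    const char* first = "hello world";
    const char* p = first;
    p = "wazzup";
    p = "Hey";
    assert(p != first);  // p now refers to a different literal
    return std::strcmp(p, "Hey") == 0 &&
           std::strcmp(first, "hello world") == 0;
}
```

Only the address stored in p changes; the characters of "hello world" stay where the compiler put them, reachable through first and never through p+N.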
When I change the value that a char pointer is pointing to. Does it
-overwrite the original Data like it would do with char array.
No, the data part of your address space contains all three strings "Hello World", "wazzup", and "Hey". When you change the pointer, you are just changing the value in p to the starting address of one of the above strings. Changing the address a pointer is pointing to and changing the value a pointer is pointing to are two different things.
-or it creates a new string right before it in the memory and points to it.
The compiler creates the strings (character bytes) at compile time and NOT at run time.
-or does it create a new string in a new place in the memory and I was able to print "wazzup" only by chance
I think above answer covers this question.
-or does it add the new string at the begining of the old one.(including null)
How the literals are laid out relative to one another depends on the compiler.
The three strings are not located at the same memory position. Their storage may be contiguous or in different locations. If the compiler happens to place the three close together, you can find one from another by adding or subtracting some offset. That depends entirely on the compiler. C performs no bounds checking on pointer arithmetic, so you get no error when you add to or subtract from p, but the behaviour is still undefined.
Since the 3 are not the same string, they are located in different parts of memory. I think they could be separated by 64-byte alignment. Try p+64 :)
Only identical strings share the same location in memory, and only if the compiler supports it.
There is a chance that "wazzup" is at p+64 and "hey" at p+128 (if you use VC++ 2010 Express on a Pentium M CPU under Windows XP SP3):
cout <<p+64<< endl; // pass the pointer itself; *(p+64) would print a single character
cout <<p+128<< endl;
//SECTION I:
void main()
{
char str[5] = "12345"; //---a)
char str[5] = "1234"; //---b)
cout<<"String is: "<<str<<endl;
}
Output: a) Error: Array bounds Overflow.
b) 1234
//SECTION II:
void main()
{
char str[5];
cout<<"Enter String: ";
cin>>str;
cout<<"String is: "<<str<<endl;
}
I tried with many different input strings, and to my surprise, I got strange result:
Case I: Input String: 1234, Output: 1234 (No issue, as this is expected behavior)
Case II: Input String: 12345, Output: 12345 (NO error reported by compiler But I was expecting an Error: Array bounds Overflow.)
Case III: Input String: 123456, Output: 123456 (NO error reported by compiler But I was expecting an Error: Array bounds Overflow.)
.................................................
.................................................
Case VI: Input String: 123456789, Output: 123456789 (Error: unhandled exception. Access Violation.)
My question is: when I assigned more characters than the array's capacity in SECTION I, the compiler reported ERROR: Array bounds Overflow.
But when I try the same thing in SECTION II, I am not getting any errors. Why is that? Please note: I executed this on Visual Studio.
char str[5] = "12345";
This is a compile-time error. You assign a string of length 6 (mind the appended null terminator) to an array of size 5.
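The size the compiler compares against includes the terminator, which sizeof makes visible. The sketch below is only illustrative: literal_fits is a hypothetical constexpr helper, and the checks mirror cases a) and b) from the question.

```cpp
#include <cstddef>

// Hypothetical helper: N is the literal's size including its '\0'.
template <std::size_t N>
constexpr bool literal_fits(const char (&)[N], std::size_t capacity) {
    return N <= capacity;
}

static_assert(sizeof("12345") == 6, "five characters plus the terminator");
static_assert(!literal_fits("12345", 5), "case a) does not fit in char[5]");
static_assert(literal_fits("1234", 5), "case b) fits in char[5]");
```

All of this is evaluated during compilation, which is why the compiler can reject case a) but has no way to apply the same check to text typed at runtime.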
char str[5];
cin>>str;
This may yield a runtime error. Depending on how long the string is you enter, the buffer you provide (size 5) may be too small (again mind the null-termination).
The compiler of course can't check your user input during runtime. If you're lucky, you're notified of an access violation like this by a segmentation fault. Truly, anything can happen.
Throwing exceptions on access violations is not mandatory. To address this, you can implement array boundary checking yourself, or alternatively (probably better) there are container classes that adapt their size as necessary (std::string):
std::string str;
cin >> str;
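If a fixed char array has to stay, the stream can be told how much room there is with std::setw, which limits the extraction to one character less than the given width. A sketch under that assumption; read_bounded is a hypothetical helper, and a std::istringstream stands in for std::cin so the input can be supplied as a string.

```cpp
#include <cassert>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: extracts one word into a 5-byte array without
// overflowing it, however long the input is.
std::string read_bounded(const std::string& typed) {
    std::istringstream in(typed);  // stands in for std::cin
    char str[5] = {};
    in >> std::setw(static_cast<int>(sizeof str)) >> str;  // at most 4 chars + '\0'
    assert(std::strlen(str) < sizeof str);
    return str;
}
```

With the input 123456789 from Case VI, only 1234 is stored and the rest stays in the stream, so nothing is written outside str.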
What you are seeing is undefined behavior. You are writing past the bounds of the array; anything might happen in that case (including seeing the output you expect).
I tried with many different input strings, and to my surprise, I got
strange result:
This phenomenon is called undefined behavior (UB).
As soon as you enter more characters than the char array can hold, you invite UB.
Sometimes it may work, sometimes it may not, and sometimes it may crash. In short, there is no definite pattern.
[side note: If a compiler allows void main() to get compiled then it's not standard compliant.]
It's because in the second case, the compiler can't know. The bug exists only at runtime. C / C++ offer no runtime bounds checking by default, so the error is not detected but still "breaks" your program. The breakage doesn't have to show up immediately, though: str refers to a place in memory with only a fixed number of bytes reserved, and you just write more bytes anyway.
Your program's behavior will be undefined. Usually, and also in your particular case, it may continue working while you have overwritten some other component's memory. Once you write far enough to touch forbidden memory (or free the same memory twice), your program crashes.
char str[5] = "12345" - in this case you didn't leave room for the null terminator. C++ rejects this at compile time, as you saw; C accepts it and simply leaves the array unterminated. If such an array is printed as a string, the output goes on and on until a null is encountered, possibly stepping on forbidden memory and crashing.
In the cin case, the cin operation stuffed a 0 at the end of the string, so this stops the program from going too far in memory.
However, in my experience, when you break the rules with a memory overrun, things go crazy, and looking for a reason why it works in one case but not in another doesn't lead anywhere. More often than not, the same application (with a memory overrun) can work on one PC and crash on another, due to different memory states on them.
Because the compiler only does static checks. In your Section II, the size is not known in advance; it depends on the length of your input.
May I suggest you use the STL string?
I assume this is a common way to use sprintf:
char pText[x];
sprintf(pText, "helloworld %d", Count );
but what exactly happens if the char buffer has less memory allocated than will be printed to it?
i.e. what if x is smaller than the length of the formatted output of sprintf?
I am asking because I get some strange behaviour in the code that follows the sprintf statement.
It's not possible to answer in general "exactly" what will happen. Doing this invokes what is called Undefined behavior, which basically means that anything might happen.
It's a good idea to simply avoid such cases, and use safe functions where available:
char pText[12];
snprintf(pText, sizeof pText, "helloworld %d", count);
Note how snprintf() takes an additional argument that is the buffer size, and won't write more than there is room for.
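snprintf() also reports how many characters the full output would have needed, which lets the caller detect truncation. A minimal sketch, assuming a hypothetical helper named format_count:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats into dst and reports whether the whole text
// (plus its terminator) fit into cap bytes.
bool format_count(char* dst, std::size_t cap, int count) {
    assert(dst != nullptr && cap > 0);
    const int needed = std::snprintf(dst, cap, "helloworld %d", count);
    // needed excludes the '\0'; needed >= cap means the output was cut short.
    return needed >= 0 && static_cast<std::size_t>(needed) < cap;
}
```

Note that "helloworld " is already 11 characters, so with a 12-byte buffer like pText even a one-digit count is truncated: the buffer safely ends up holding "helloworld " and the function returns false. A larger buffer is needed to hold the number as well.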
This is a common error and leads to memory after the char array being overwritten. So, for example, there could be some ints or another array in the memory after the char array and those would get overwritten with the text.
For a detailed description of the whole problem, read up on buffer overflows. Some platforms also provide a snprintf routine, which takes an additional parameter that defines the maximum length (in your case x). If your compiler doesn't provide it, you can also write it yourself to make sure you can't get such errors (or just check that you always have enough space allocated).
Note that the behaviour after such an error is undefined and can lead to very strange errors. Variables are usually aligned at memory locations divisible by 4, so you often won't notice the error when you have written only one or two bytes too many (i.e. forgot to make room for a NUL), but you get strange errors in other cases. These errors are hard to debug because other variables get changed and errors will often occur in a completely different part of the code.
This is called a buffer overrun.
sprintf will overwrite the memory that happens to follow pText address-wise. Since pText is on the stack, sprintf can overwrite local variables, function arguments and the return address, leading to all sorts of bugs. Many security vulnerabilities result from this kind of code — e.g. an attacker uses the buffer overrun to write a new return address pointing to his own code.
The behaviour in this situation is undefined. Normally, you will crash, but you might also see no ill effects, strange values appearing in unrelated variables and that kind of thing. Your code might also call into the wrong functions, format your hard-drive and kill other running programs. It is best to resolve this by allocating more memory for your buffer.
I have done this many times; you will get a memory corruption error. As far as I remember, I did something like this:
vector<char> vecMyObj(10);
vecMyObj.resize(10);
sprintf(&vecMyObj[0],"helloworld %d", count);
When the vector's destructor is called, my program gets a memory corruption error. If the formatted text (including its terminator) fits in the 10 bytes, it works successfully.
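With a vector, the corruption can be avoided by asking snprintf for the required length first (a null destination with size 0 only measures) and sizing the vector from that. This is a sketch of that two-pass approach, not the code from the answer; format_exact is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: measures the formatted length, allocates exactly
// that much plus the terminator, then formats for real.
std::string format_exact(int count) {
    const int len = std::snprintf(nullptr, 0, "helloworld %d", count);
    assert(len >= 0);  // a negative value would signal an encoding error
    std::vector<char> buf(static_cast<std::size_t>(len) + 1);
    std::snprintf(buf.data(), buf.size(), "helloworld %d", count);
    return std::string(buf.data(), static_cast<std::size_t>(len));
}
```

Because the buffer is sized from the measured length, nothing is written past the vector's storage and the destructor has nothing corrupted to trip over.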
Can you spell Buffer Overflow? One possible result is stack corruption, which makes your app vulnerable to stack-based exploitation.