I'm pretty inexperienced in C++, and I wrote the following code to see how characters and strings work.
#include "stdio.h"
#include <iostream>
#include <string>
using namespace std;
int main()
{
char asdf[] = "hello";
char test[5] = {'h','e','l','l','o'};
cout << test;
}
I was expecting it to output "hello", but instead I got "hellohello", which is really puzzling to me. I did some experimenting:
If I change the asdf to another string of a different length, it outputs "hello" normally.
If I change the amount of characters in test it outputs "hello" normally.
I thought this only happened when the two were the same length, but when I change them both to "hell" it seems to output "hell" normally.
To make things more confusing, when I asked a friend to run this code on their computer, it outputted "hello" and then a random character.
I'm running a fresh install of Code::Blocks on Ubuntu. Does anyone have any idea what is going on here?
This is undefined behaviour.
Raw char* or char[] strings in C and C++ must be null-terminated. That is, the string needs to end with a '\0' character. Your test[5] does not do that, so the function printing the output continues past the last o, because it is still looking for the null terminator.
Due to how the arrays are stored on the stack (the stack usually grows towards lower addresses), the next bytes it encounters are those of asdf[], to which you assigned "hello". This is what the memory layout looks like; the arrow indicates the direction in which memory addresses (think pointers) increase:
        ---->
 +-------------------
 |hellohello\0 ...
 +-------------------
       \_ asdf
  \_ test
Now in C++ and C, string literals like "hello" are null-terminated implicitly, so the compiler writes a hidden '\0' after the end of the string. The output function continues to print the contents of asdf char-by-char until it reaches that hidden '\0', and then it stops.
If you were to remove asdf, you might see a bit of garbage after the first hello and then a segmentation fault. But this is undefined behaviour, because you are reading out of the bounds of the test array. This also explains why it behaves differently on different systems: for example, some compilers may decide to lay out the variables in a different order on the stack, so that on your friend's system, test is actually lower on the stack (remember, lower on the stack means at a higher address):
        ---->
 +-------------------
 |hello\0hello ...
 +-------------------
         \_ test
  \_ asdf
Now when you print the contents of test, it will print hello char-by-char, then continue reading the memory until a \0 is found. The contents of ... are highly specific to the architecture and runtime used, possibly even the phase of the moon and time of day (not entirely serious), so that on your friend's machine it prints a "random" character and then stops.
You can fix this by adding a '\0' or 0 to your test array (you will need to change the size to 6). However, using const char test[] = "hello"; is the sanest way to solve this.
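As a rough sketch of the two fixes just mentioned (the helper names printedWithExplicitTerminator and printedFromLiteral are made up for illustration), each array can be streamed into a std::ostringstream so the result can be inspected instead of eyeballed:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: array with an explicit terminator and size 6.
std::string printedWithExplicitTerminator() {
    char test[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
    std::ostringstream out;
    out << test;                    // stops at the '\0' stored in test[5]
    return out.str();
}

// Hypothetical helper: array sized by the compiler from a string literal.
std::string printedFromLiteral() {
    const char test[] = "hello";    // 6 elements: 5 letters plus the hidden '\0'
    std::ostringstream out;
    out << test;
    return out.str();
}
```

Both helpers stay inside their arrays, so neither depends on what the compiler happens to place next to test on the stack.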
You have to terminate your test array with an ASCII 0 character. What happens now is that it is adjacent to your asdf string in memory, so since test isn't terminated, the << just continues until it meets the ASCII 0 at the end of asdf.
In case you wonder: when filling asdf, this ASCII 0 is added automatically.
The reason for this is that C style strings need the null character to mark the end of the string.
As you have not put this into the array test, it will just keep printing characters until it finds one. In your case the array asdf happens to follow test in memory, but this cannot be guaranteed.
Instead change the code to this:
char test[] = {'h','e','l','l','o', 0};
cout prints all characters starting from the given address (test here, or &test[0] in equivalent notation) up to the point where it finds a null terminator. As you haven't put a null terminator into the test array, it will continue to print until it accidentally finds one in memory. From that point on, what happens is undefined behavior.
Last character should be '\0' to indicate end of string.
char test[6] = {'h','e','l','l','o','\0'};
Unless there is an overload of operator<< for a reference to an array of 5 chars, the array will "decay" to a pointer to char and be treated as a C style string by the operator. C style strings are by convention terminated with a 0 char, which your array is lacking. Therefore the operator continues outputting the bytes in memory, interpreting them as printable chars. It just so happens that on the stack, the two arrays were adjacent, so that the operator ran into asdf's memory area, outputting those chars and finally encountering the implicit 0 char which is at the end of "hello". If you omit the other declaration it's likely that your program will crash, namely if the next 0 byte comes later than the memory boundary of your program.
It is undefined behavior to access memory outside an object (here: test) through a pointer to that object.
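If the array really has to stay without a terminator, one way to avoid the out-of-bounds read is to pass the length explicitly instead of letting the array decay into a C string. A minimal sketch, with printedWithExplicitLength as a made-up helper name:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: prints a buffer that has no terminator by giving its size.
std::string printedWithExplicitLength() {
    char test[5] = {'h', 'e', 'l', 'l', 'o'};   // deliberately not null-terminated
    std::ostringstream out;
    out.write(test, sizeof test);               // writes exactly 5 bytes, never reads past test
    return out.str();
}
```

std::ostream::write takes a pointer and a count, so it does not look for a '\0' at all; the same call works on std::cout.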
Character sequences need a null terminator (\0).
char asdf[] = "hello"; // OK: String literals have '\0' appended at the end
char test[5] = {'h','e','l','l','o'}; // Oops, not null terminated. UB
Corrected:
char test[6] = {'h','e','l','l','o','\0'}; // OK
//        ^                         ^^^^
Related
I was playing around with c strings in c++ and found some behavior I don't understand when I don't terminate a char array.
char strA[2] = {'a','\0'};
char strB[1] = {'b'};
cout << strA << strB;
I would expect this to print ab, but instead it prints aba. If I instead declare strB before strA, it works as expected. Could someone explain what's going on here?
This is undefined behaviour, and you are simply lucky that swapping the declarations of these two arrays works for you. Let's see what is happening in your code:
char strA[2] = {'a','\0'};
Creates an array that can be treated like a string - it is null terminated.
char strB[1] = {'b'};
Creates an array that cannot be treated like a string, because it lacks the null terminating character '\0'.
std::cout << strA << strB;
The first part, << strA, works fine. It prints a because strA is treated as a const char*, which, when passed to std::ostream& operator<<, is used to print every character until the null terminator is encountered.
What happens then? Then << strB is executed (what actually happens here is a little more complicated than simply splitting this line into two separate std::cout << calls, but that does not matter here). It is also treated as a const char*, which is expected to end with the aforementioned '\0'; however, it does not...
What does that lead to? You are lucky that there happens to be only 1 character before a (again, random) '\0' in memory, which stops the (possibly near-infinite) printing process.
Why does it work as expected if I declare strB before strA instead?
That is because you were lucky enough that the compiler decided to place your strA just after strB, so when printing strB it prints everything strB contains and then prints strA, which ends with a null terminator. This is not guaranteed. Avoid using char[] to represent and print strings. Use std::string instead, which takes care of the null terminator for you.
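For comparison, here is a small sketch of the same two strings as std::string objects (printedWithStdString is a hypothetical helper name). Each object tracks its own length, so nothing depends on where the variables sit in memory:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: the question's two strings, stored as std::string.
std::string printedWithStdString() {
    std::string strA = "a";
    std::string strB = "b";         // no manual terminator needed
    std::ostringstream out;
    out << strA << strB;            // each string is printed using its own stored length
    return out.str();
}
```

Swapping the two declarations should make no difference here, unlike with the raw arrays.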
When printing char arrays, the C (and C++) convention is to print all bytes until a '\0'.
Because of how the local variables are organized, strB's memory is behind strA's, so when printing strB the printing just 'overflows' and keeps printing strA until the terminating '\0'.
I guess when the declaration order is reversed, the printing of strB is terminated by a 0 that is just there because nothing else was written there, but you shouldn't rely on that: whatever happens to be in that memory is a garbage value.
Don't use unterminated C-strings, at all. Also avoid C-strings in general, you can use C++ std::string which are much more secure and fun.
When I run this code on my computer, I get a bunch (exactly seven) of weird chars printed between the ab and the a, which are probably whatever was between strA's and strB's memory spaces.
When I reverse the declarations, I get ab$%^& where $%^& are a bunch of weird chars - the ones between the end of strB's memory to the next random \0.
Create a console application with the following code (renaming f to your entry point):
#include <iostream>
void f(){
char a[5][5];
std::cin>>a[0]>>a[1]>>a[2]>>a[3]>>a[4];
for (int y = 0; y<5; y++)std::cout<<a[y]<<'\n';
}
and input 5 lines of 5 characters such as :
abcde
abcde
abcde
abcde
abcde
I expected the output to be identical to the input or throw an error, but instead I got:
abcdeabcdeabcdeabcdeabcde
abcdeabcdeabcdeabcde
abcdeabcdeabcde
abcdeabcde
abcde
When investigated using the debugger, each a[y] value is equal to abcde and not the displayed output.
What on earth is going on here? Why is this happening, and is there a way to stop it?
Is it related to the
Stack around the variable 'a' was corrupted
error that gets thrown after the std::cout statements?
I'm well aware of other ways to get the desired output using nested loops, but I'm wondering if there's a way to iterate only the outer dimension so it uses fewer characters - this is for a code golf challenge. It makes quite a difference:
for(int y=0;y<5;y++)std::cout<<a[y]<<'\n';
vs
for(int y=0;y<5;y++){for(int x=0;x<5;x++)std::cout<<a[y][x];std::cout<<'\n';}
The problem is caused by the fact that you are trying to store "abcde" in a char array with 5 elements. You need at least one more element in the array to hold the terminating null character.
As a consequence, your program has undefined behavior. We can try to make sense of the output but it's futile.
Use
char a[5][6]; // Anything greater than 5 will work for your input
If you don't want your code to be tied to a hard coded size, you can use std::string.
std::string a[5];
A C-string is a sequence of characters that ends with a null terminator. That means "abcde" is actually 6 characters long: the 5 you see plus the null terminator.
Since you only allocated enough space for the input without the null terminator, putting the string into the array writes off the end of the array, which is undefined behavior. What you need is
char a[5][6];
As that will have enough space for the 5 characters plus the null terminator.
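A sketch of the corrected read, assuming the same five-words-of-five-characters input as in the question. The helper name readRows and the use of std::setw are additions for illustration: setting the stream width to 6 makes operator>> store at most 5 characters plus the terminator per row, so a longer word cannot overflow the row.

```cpp
#include <cstddef>
#include <iomanip>
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper: reads five whitespace-separated words of up to five characters.
std::vector<std::string> readRows(std::istream& in) {
    char a[5][6] = {};                    // 5 rows, each 5 characters plus '\0'
    for (std::size_t y = 0; y < 5; ++y) {
        in >> std::setw(6) >> a[y];       // width 6: at most 5 characters, then '\0'
    }
    std::vector<std::string> rows;
    for (std::size_t y = 0; y < 5; ++y) {
        rows.push_back(a[y]);             // each row is now a proper C string
    }
    return rows;
}
```

Calling readRows(std::cin) and printing each element with a newline keeps the single outer loop from the question while staying inside the array.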
When I run the example code, the wordLength is 7 (hence the output 7). But my char array gets some really weird characters in the end of it.
wordLength = word.length();
cout << wordLength;
char * wordchar = new char[wordLength]; //new char[7]; ??
for (int i = 0; i < word.length(); i++) //0-6 = 7
{
wordchar[i] = 'a';
}
cout << wordchar;
The output: 7 aaaaaaa²²²²¦¦¦¦¦ÂD╩2¦♀
Desired output is: aaaaaaa... What is the garbage behind it?? And how did it end up there?
You should add \0 at the end of wordchar.
char * wordchar = new char[wordLength +1];
//add chars as you have done
wordchar[wordLength] = '\0';
The reason is that C-strings are null terminated.
C strings are terminated with a '\0' character that marks their end (in contrast, C++ std::string just stores the length separately).
In copying the characters to wordchar you didn't terminate the string, thus, when operator<< outputs wordchar, it goes on until it finds the first \0 character that happens to be after the memory location pointed to by wordchar, and in the process it prints all the garbage values that happen to be in memory in between.
To fix the problem, you should:
make the allocated string 1 char longer;
add the \0 character at the end.
Still, in C++ you'll normally just want to use std::string.
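Put together, the two steps could look like the sketch below. The function names filledBuffer and filledString are invented for this example; the loop mirrors the one in the question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the question's loop, with room for the terminator.
std::string filledBuffer(std::size_t wordLength) {
    char* wordchar = new char[wordLength + 1];   // one extra element for '\0'
    for (std::size_t i = 0; i < wordLength; ++i) {
        wordchar[i] = 'a';
    }
    wordchar[wordLength] = '\0';                 // terminate the C string
    std::string result(wordchar);                // copies up to, not including, the '\0'
    delete[] wordchar;                           // release the manually allocated buffer
    return result;
}

// The std::string equivalent needs no manual allocation or terminator.
std::string filledString(std::size_t wordLength) {
    return std::string(wordLength, 'a');
}
```

The second function shows why std::string is usually the easier choice: the length is stored alongside the characters, so there is nothing to forget.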
Use:
char * wordchar = new char[wordLength+1]; // 1 extra for null character
before the for loop and
wordchar[wordLength] = '\0';
after the for loop. C strings are null-terminated.
Without this, it keeps on printing until it finds the first null character, printing all the garbage values along the way.
You omitted the trailing zero; that's the cause.
In C and C++, the whole ecosystem determines string length by assuming a trailing zero ('\0', or simply 0 numerically). This is different from, for example, Pascal strings, where the memory representation starts with a number that tells how many of the following characters make up the string.
So if you have string content that you want to store, you have to allocate one additional byte for the trailing zero. If you manipulate memory content, you always have to keep the trailing zero in mind and preserve it. Otherwise functions such as strstr or strcat can run off the end and read (or, for writing functions like strcat, modify) the memory that follows. Without a trailing zero, strlen will also give a false result, because it counts until it encounters the first zero.
You are not the only one making this mistake; it often plays an important role in security vulnerabilities and their exploits. An exploit takes advantage of the side effect that a function runs off the end and manipulates things other than what was originally intended. This is a very important and dangerous part of C.
In C++ (as you tagged your question), you are better off using std::string and the standard library's facilities instead of C-style manipulation.
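To make the "counts until the first zero" point concrete, here is a small sketch with a literal that contains an embedded zero. The helper names are invented for this example; std::strlen and sizeof are the only real facilities used.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: the length as C string functions see it.
std::size_t lengthSeenByStrlen() {
    const char text[] = "ab\0cd";   // 6 bytes: 'a', 'b', '\0', 'c', 'd', '\0'
    return std::strlen(text);       // stops counting at the first zero byte
}

// Hypothetical helper: the number of bytes the array really occupies.
std::size_t bytesActuallyStored() {
    const char text[] = "ab\0cd";
    return sizeof text;             // includes both zero bytes
}
```

The first helper reports 2 and the second 6, because strlen knows nothing about the array and only follows the trailing-zero convention.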
I got this code from a textbook:
#include <iostream>
using namespace std;
int main(){
char str1[]="hello,world!", str2[20], *p1, *p2;
p1=str1; p2=str2;
/*
for(;*p1!='\0';p1++,p2++){
cout<<"p1="<<*p1<<endl;
*p2=*p1;cout<<"p2="<<*p2<<endl;
}
*p2='\0';
p1=str1; p2=str2;
*/
cout<<"p1="<<p1<<endl;
cout<< "p2="<<p2<<endl;
return 0;
}
When I run this code, it outputs p1=hello,world!p2=
which I can understand.
But if I uncomment the for loop, I get confused by the output: why does it show p1= after the for loop instead of p1=hello,world!, and why does p2 still show p2= even after the assignment in the for loop?
But after I also uncomment the line p1=str1; p2=str2;, the output is p1=hello,world!, p2=hello,world!. Why does it work like that?
And what is the reason for writing the line *p2='\0';? Whether or not this line is commented out, the previous outputs don't change.
Can anyone tell me how the char pointers work here?
The loop modifies p1 so that it points to the null terminator at the end of the string. That's the definition of an empty string. p2 likewise points to a null terminator at the end of a string.
If you reset p1 and p2 to their original values you can see the strings as they are.
The code is for copying str1 to str2.
In C++, '\0' is used to end a C-style string. When you print a char pointer (say ptr), the output stream prints the string starting from *ptr (the character pointed to by the pointer) and stops when it finds '\0'.
In the beginning, p1 points to the first char of str1 and p2 points to the first char of str2. If you print them without doing anything else, each string is printed from its start. So the output will be p1=hello,world!p2= (str2 is uninitialized at this point, so an empty p2 is luck, not a guarantee).
The for loop makes p1 and p2 advance through str1 and str2. At the end, p1 points to the \0 at the end of str1 and p2 points to the '\0' at the end of str2. So if you print p1 or p2 directly after the for loop ends, the stream immediately finds '\0' and stops printing. So, you get the output p1=p2=.
Uncommenting p1=str1; p2=str2; will make both pointers point to the first characters again, so printing them now will cause the whole string to be printed. So you get the output p1=hello,world!p2=hello,world! (because str1 got copied to str2 in the for loop).
The *p2 = '\0' is just for ending str2 with '\0'. If your code works without that line, it means that the bytes of str2 happened to be '\0' already. However, a local array is not guaranteed to be zero-initialized, so you should always terminate strings with '\0' in your programs.
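The effect of advancing the pointer can be isolated in a few lines. In this sketch, walkToTerminator is a made-up helper that does to its argument what the for loop does to p1:

```cpp
#include <string>

// Hypothetical helper: advances a pointer until it reaches the terminator.
const char* walkToTerminator(const char* p) {
    while (*p != '\0') {
        ++p;
    }
    return p;   // now points at the '\0', i.e. at an empty string
}
```

Building a std::string from walkToTerminator(str1) gives an empty string, while building one from str1 itself gives the full text; the array has not changed, only where the pointer starts.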
Here is the output I see from VS2010 running that code with the commented parts uncommented:
p1=h
p2=h
p1=e
p2=e
p1=l
p2=l
p1=l
p2=l
p1=o
p2=o
p1=,
p2=,
p1=w
p2=w
p1=o
p2=o
p1=r
p2=r
p1=l
p2=l
p1=d
p2=d
p1=!
p2=!
p1=hello,world!
p2=hello,world!
That's pretty much what I would have expected! Basically this code is copying the contents of str1 into the (uninitialised) char array str2 via direct pointer manipulation, by copying each character from str1 into str2 one at a time.
To answer your last question, the reason for
*p2='\0';
is so that the second string that is being "created" by the for loop will be correctly null terminated. Without that line, it will just be a char array that cannot be treated like a 'C' string.
Overall this is a pretty contrived, non-robust example though, as it won't work once the first string exceeds 19 characters (plus the terminator), because str2[] is declared to be only 20 chars in size.
A C-style string is accessed through a char*. p1 and p2 point into str1 and str2 respectively, and as they are incremented they walk through the characters of those strings. *p2='\0'; writes the null terminator at the end of the copy in str2. Removing it appears to have no effect only when the bytes of str2 already happen to be zero, which is not guaranteed.
First, I'd like to say that I'm new to C / C++, I'm originally a PHP developer so I am bred to abuse variables any way I like 'em.
C is a strict country, compilers don't like me here very much, I am used to breaking the rules to get things done.
Anyway, this is my simple piece of code:
char IP[15] = "192.168.2.1";
char separator[2] = "||";
puts( separator );
Output:
||192.168.2.1
But if I change the definition of separator to:
char separator[3] = "||";
I get the desired output:
||
So why did I need to give the man extra space, so he doesn't sleep with the man before him?
That's because you get a string that is not null-terminated when the separator length is forced to 2.
Always remember to allocate an extra character for the null terminator. For a string of length N you need N+1 characters.
Once you violate this requirement any code that expects null-terminated strings (puts() function included) will run into undefined behavior.
Your best bet is to not force any specific length:
char separator[] = "||";
will allocate an array of exactly the right size.
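A quick way to check what "exactly the right size" means is to ask the compiler. In this sketch the array is declared at namespace scope purely so the static_assert can sit next to it; visibleLength is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstring>

char separator[] = "||";   // size deduced by the compiler from the literal

// Compile-time check: two '|' characters plus the hidden terminator.
static_assert(sizeof separator == 3, "expected 2 characters plus the terminator");

// Hypothetical helper: the length that puts() and friends would see.
std::size_t visibleLength() {
    return std::strlen(separator);
}
```

The array occupies 3 bytes while strlen reports 2, which is the N+1 rule from above applied by the compiler.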
Strings in C are NUL-terminated. This means that a string of two characters requires three bytes (two for the characters and the third for the zero byte that denotes the end of the string).
In your example it is possible to omit the size of the array and the compiler will allocate the correct amount of storage:
char IP[] = "192.168.2.1";
char separator[] = "||";
Lastly, if you are coding in C++ rather than C, you're better off using std::string.
If you're using C++ anyway, I'd recommend using the std::string class instead of C strings - much easier and less error-prone IMHO, especially for people with a scripting language background.
There is a hidden nul character '\0' at the end of each string. You have to leave space for that.
If you do
char seperator[] = "||";
you will get a string of size 3, not size 2.
Because in C strings are nul terminated (their end is marked with a 0 byte). If you declare separator to be an array of two characters, and give them both non-zero values, then there is no terminator! Therefore when you puts the array pretty much anything could be tacked on the end (whatever happens to sit in memory past the end of the array - in this case, it appears that it's the IP array).
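Edit: the following paragraph is incorrect. With a length of 3, the initializer "||" fills all three elements, including the terminating 0, so that byte is guaranteed to be 0.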
When you make the array length 3, the extra byte happens to have 0 in it, which terminates the string. However, you probably can't rely on that behavior - if the value is uninitialized it could really contain anything.
In C strings are ended with a special '\0' character, so your separator literal "||" is actually one character longer. puts function just prints every character until it encounters '\0' - in your case one after the IP string.
In C, strings include a (invisible) null byte at the end. You need to account for that null byte.
char ip[15] = "1.2.3.4";
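In the code above, ip has space for 15 characters: 14 "regular characters" and the null byte. That is enough for "1.2.3.4", but too short for the longest dotted IPv4 address, 255.255.255.255, which has 15 characters; for that it should be char ip[16].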
ip[0] == '1';
ip[1] == '.';
/* ... */
ip[6] == '4';
ip[7] == '\0';
Since no one pointed it out so far: If you declare your variable like this, the strings will be automagically null-terminated, and you don't have to mess around with the array sizes:
const char* IP = "192.168.2.1";
const char* seperator = "||";
Note however, that I assume you don't intend to change these strings.
But as already mentioned, the safe way in C++ would be using the std::string class.
A C "string" always ends in a null character, but you do not leave room for it if you write
char separator[2] = "||". puts expects this \0 at the end; in the first case it writes until it finds a \0, and you can see where that is found: at the end of the IP address. Interestingly enough, you can even see how the local variables are laid out on the stack.
The line char separator[2] = "||"; is what leads to the undefined behaviour, since the string literal (including the null at the end) has length 3 and does not fit.
Also, which compiler did you compile the above code with? I compiled it with g++, and it flagged the above line as an error.
Strings in C/C++ are null-terminated, i.e. they have a hidden zero at the end.
So your separator string would be:
{'|', '|', '\0'} = "||"