const char* s = "123";
std::cout << s[s[3]] << std::endl; // prints 1
std::cout << s[3] << std::endl; // prints nothing?
I tried running the following snippet and the first print statement outputs 1 while the second outputs (seemingly) nothing. What is going on when the pointer is dereferenced using the length of the char pointer array here?
It is unclear why you are using the value of the character at index 3 (s[3]) to index the string again. But in any case, the key point here is that you're using a char to index an array. This means that the char is used as a number, the conversion happening most likely using the ASCII character encoding.
Printing s[3] shows nothing because s points to an array of 4 characters, and the last one is the null terminator, i.e. the number 0. The null terminator marks the end of the string. It is not a printable character and has no glyph associated with it, so nothing visible appears.
So s[s[3]] is nothing but s[0], which is the character '1'.
Related
One of the signatures of std::basic_string::find method is:
size_type find( const CharT* s, size_type pos, size_type count ) const;
The parameters are the following:
pos - position at which to start the search
count - length of substring to search for
s - pointer to a character string to search for
The description of the behavior of the method for this overload is:
Finds the first substring equal to the range [s, s+count). This range may contain null characters.
I would like to know in what case it can be useful to have a range that contain null characters. For instance:
s.find("A", 0, 2);
Here, s corresponds to a string with a length of 1. Because count is 2, the range [s, s+count) contains a null character. What is the point?
There is a false premise that you didn't spell out, but combining the title and the question it is:
The null character indicates the end of a std::string.
This is wrong. std::strings can contain null characters at any position. One has to be cautious with functions that expect a null-terminated c-string, but find is so nice that it explicitly reminds you that it also works in the general case.
C-Strings are null terminated, hence this:
std::string x("My\0str0ing\0with\0null\0characters");
std::cout << x.size() << '\n';
Prints: 2, i.e. only the characters before the first \0 are used to construct the std::string.
However, this
std::string s("Hello world");
s[5] = '\0';
std::cout << s << '\n';
Prints Helloworld (the \0 is still written to the output, but it is not printable). Also, char arrays can contain \0 at any position. Usually this is interpreted as the terminating character of the string. However, as std::strings can contain null characters at any position, it is only consistent to also provide an overload that takes a pointer to a character array that can contain null characters in the middle. An example of using that overload (s is the string from above):
std::string f;
f.push_back('\0');
f.push_back('w');
std::cout << s.find(f.c_str()) << '\n';
std::cout << s.find("") << '\n';
std::cout << s.find(f.c_str(),0,2) << '\n';
Output:
0
0
5
The overload without the count parameter assumes a null-terminated C string, hence s.find(f.c_str()) is the same as s.find(""). Only the overload that takes the count parameter finds the substring \0w at index 5.
First of all, sorry about my bad English.
I want to ask about something I find surprising. I'm not sure everyone will find it surprising, but I do :)
Here is some example code:
char Text[9] = "Sandrine";
for(char *Ptr = Text; *Ptr != '\0'; ++Ptr)
cout << Ptr << endl;
This code prints
Sandrine
andrine
ndrine
drine
rine
ine
ne
e
I know this is a complicated area of C++. Why does printing Ptr output the whole rest of the array? If Text is a dynamic array instead, Ptr prints only the first element of the array. Why does this happen? Please explain how C++ arrays and pointers into them work together.
Thanks for helping.
There is nothing particularly special about arrays here. Instead, the special behavior belongs to char const*: in C, pointers to a sequence of characters with a terminating null character are used to represent strings. C++ inherited this notion of strings in the form of string literals. To support output of these strings, the output operator for char const* interprets a pointer to a char as a pointer to the start of a string and prints the sequence up to the first null character.
When you write
char Text[9] = "Sandrine";
Text is the array holding your string. Used in an expression, it converts to the address of its first element, which holds 'S', followed by the rest of the characters. A string in C is delimited by a \0, i.e. "S a n d r i n e \0".
When you write
for(char *Ptr = Text; *Ptr != '\0'; ++Ptr)
cout << Ptr << endl;
When the for loop runs the first time, it prints the whole string because Ptr points to the start of the string (char* Ptr = Text). When you increment Ptr, it points to the next character, Text + 1, i.e. 'a', and so on. Once Ptr reaches the \0, the for loop quits.
I have a fixed length character array I want to assign to a string. The problem comes if the character array is full, the assign fails. I thought of using the assign where you can supply n however that ignores \0s. For example:
std::string str;
char test1[4] = {'T', 'e', 's', 't'};
str.assign(test1); // BAD "Test2" (or some random extra characters)
str.assign(test1, 4); // GOOD "Test"
size_t len = strlen(test1); // BAD 5
char test2[4] = {'T', 'e', '\0', 't'};
str.assign(test2); // GOOD "Te"
str.assign(test2, 4); // BAD "Tet"
size_t len = strlen(test2); // GOOD 2
How can I assign a fixed length character array to a string correctly for both cases?
Use the "pair of iterators" form of assign.
str.assign(test1, std::find(test1, test1 + 4, '\0'));
Character buffers in C++ are either-or: either they are null terminated or they are not (and fixed-length). Mixing them in the way you do is thus not recommended. If you absolutely need this, there seems to be no alternative to manual copying until either the maximum length or a null terminator is reached.
for (char const* i = test1; i != test1 + length and *i != '\0'; ++i)
str += *i;
You want both NULL termination and fixed length? This is highly unusual and not recommended. You'll have to write your own function and push_back each individual character.
For the first case, when you do str.assign(test1) and str.assign(test2), you have to have \0 in your array; otherwise it is not a null-terminated "char*" string and you can't assign it to std::string like this.
I saw your serialization comment: use std::vector<char>, std::array<char,4>, or just a 4-char array or container.
Your second 'bad' example - the one which prints out "Tet" - actually does work, but you have to be careful about how you check it:
str.assign(test2, 4); // BAD "Tet"
cout << "\"" << str << "\"" << endl;
does copy exactly four characters. If you run it through an octal dump (od), say on Linux, using my.exe | od -c, you'd get:
0000000 " T e \0 t " \n
0000007
Now I have learned that cin.getline works like this.
cin.getline(dest string, number of characters to put into string);
So assume this program:
char s1[8]="Hellopo";
cin.getline(s1,5);
cout<<s1<<endl;
My input was: hhhhhhhhhhhhh
The program's output was: hhhh
I have two questions about this program.
1) I asked the program to read 5 characters of the user's input into s1, but when I ran it, it printed only 4 characters.
2) I also expected the program to go on and print the rest of s1 after the characters it read from the user, but it stopped after hhhh.
Please explain these two points. Thank you.
std::cin.getline will store four characters plus a null terminator in this case (five characters in total). And std::cout will stop printing at the first null terminator it finds.
From istream::getline():
count-1 characters have been extracted (in which case setstate(failbit) is executed).
This means that if you specify 5, only 4 characters will be read. And:
...it then stores a null character CharT() into the next successive location of the array
so a null character will be inserted after the fourth character. So the array s1 will have contents:
'h' == s1[0]
'h' == s1[1]
'h' == s1[2]
'h' == s1[3]
0 == s1[4]
The operator<< will stop printing a char* when the first null character is found.
The fifth character is the 0-terminator. getline(buffer,n) stores up to n bytes including the 0-terminator in the buffer. And then cout << s1; stops at the 0-terminator.
The fifth character is the null terminator, which marks the end of the string.
I am a student learning C++, and I am trying to understand how null-terminated character arrays work. Suppose I define a char array like so:
const char* str1 = "hello world";
As expected, strlen(str1) is equal to 11, and it is null-terminated.
Where does C++ put the null terminator, if all 11 elements of the above char array are filled with the characters "hello world"? Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'? CPlusPlus.com seems to suggest that one of the 11 would need to be '\0', unless it is indeed allocating 12.
Suppose I do the following:
// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );
// Copy the first one to the second one
strncpy( str2, str1, strlen(str1) );
// Output the second one
cout << "Str2: " << str2 << endl;
This outputs Str2: hello worldatcomY╗°g♠↕, which I assume is C++ reading the memory at the location pointed to by the pointer char* str2 until it encounters what it interprets to be a null character.
However, if I then do this:
// Null-terminate the second one
str2[strlen(str1)] = '\0';
// Output the second one again
cout << "Terminated Str2: " << str2 << endl;
It outputs Terminated Str2: hello world as expected.
But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?
Running this code does not seem to cause any compiler warnings or run-time errors. Is this safe to do in practice? Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?
In the case of a string literal the compiler is actually reserving an extra char element for the \0 element.
// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );
This is a common mistake new C programmers make. When allocating the storage for a char* you need to allocate the number of characters + 1 more to store the \0. Not allocating the extra storage here means this line is also illegal
// Null-terminate the second one
str2[strlen(str1)] = '\0';
Here you're actually writing past the end of the memory you allocated. When allocating X elements the last legal byte you can access is the memory address offset by X - 1. Writing to the X element causes undefined behavior. It will often work but is a ticking time bomb.
The proper way to write this is as follows
size_t size = strlen(str1) + sizeof(char);
char* str2 = (char*) malloc(size);
strncpy( str2, str1, size);
// Output the second one
cout << "Str2: " << str2 << endl;
In this example an explicit str2[size - 1] = '\0' isn't needed. When the source is shorter than the count, strncpy pads the rest of the destination with null characters. Here str1 holds only size - 1 characters before its terminator, so the final element of str2 is filled with \0.
Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'?
Yes.
But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?
Yes.
Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?
Yes, because the second form is not long enough to copy the string into.
Running this code does not seem to cause any compiler warnings or run-time errors.
Detecting this in all but the simplest cases is a very difficult problem. So the compiler authors simply don't bother.
This sort of complexity is exactly why you should be using std::string rather than raw C-style strings if you are writing C++. It's as simple as this:
std::string str1 = "hello world";
std::string str2 = str1;
The literal "hello world" is a char array that looks like:
{ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' }
So, yes, the literal is 12 chars in size.
Also, malloc( strlen(str1) ) is allocating memory for 1 less byte than is needed, since strlen returns the length of the string, not including the NUL terminator. Writing to str2[strlen(str1)] is writing 1 byte past the amount of memory that you've allocated.
Your compiler won't tell you that, but if you run your program through valgrind or a similar program available on your system it'll tell you if you're accessing memory you shouldn't be.
I think you are confused by the return value of strlen. It returns the length of the string, and it should not be confused with the size of the array that holds the string. Consider this example :
const char* str = "Hello\0 world";
I added a null character in the middle of the string, which is perfectly valid. Here the array will have a length of 13 (12 characters + the final null character), but strlen(str) will return 5, because there are 5 characters before the first null character. strlen just counts the characters until a null character is found.
So if I use your code :
const char* str1 = "Hello\0 world";
char* str2 = (char*) malloc(strlen(str1)); // strlen(str1) will return 5
strncpy(str2, str1, strlen(str1));
cout << "Str2: " << str2 << endl;
The str2 array will have a length of 5, and won't be terminated by a null character (because strlen doesn't count it). Is this what you expected?
For a standard C string, the array that stores the string is always at least one character longer than the length of the string in characters. So your "hello world" string has a string length of 11 but requires a backing array with 12 entries.
The reason for this is simply the way those strings are read. The functions handling those strings basically read the characters of the string one by one until they find the termination character '\0' and stop at this point. If this character is missing, those functions just keep reading memory until they either hit a protected memory area, which causes the host operating system to kill your application, or find a termination character.
Also, initializing a character array of length 11 and then writing the string "hello world" into it causes serious problems, because the string needs at least 12 characters. The byte that follows the array in memory gets overwritten, resulting in unpredictable side effects.
Also, since you are working with C++, you might want to look into std::string. This class provides better handling of strings.
I think what you need to know is that a char array holding a string is indexed from 0 to length - 1, and the terminator ('\0') sits at the position equal to the string length.
In your case:
str1[0] == 'h';
str1[10] == 'd';
str1[11] == '\0';
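This is why str2[strlen(str1)] = '\0'; is the right index for the terminator, provided str2 was allocated with strlen(str1) + 1 bytes; with only 11 bytes allocated, index 11 is outside the buffer.
The output after the strncpy is garbled because it copies only 11 elements (0..10), so you need to add the terminator manually (str2[11] = '\0').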