Pointers & Arrays Pointing Array issue - c++

Firstly, sorry about my bad English.
I want to ask about something that I find surprising. I'm not sure it is surprising for everyone, but it is for me :)
Let me give some example code:
char Text[9] = "Sandrine";
for(char *Ptr = Text; *Ptr != '\0'; ++Ptr)
cout << Ptr << endl;
This code prints
Sandrine
andrine
ndrine
drine
rine
ine
ne
e
I know it's a complicated issue in C++. Why, when I print Ptr, does it print the whole rest of the array? However, if Text is a dynamic array, printing Ptr prints only the first element of the dynamic array (Text). Why does this happen? Please explain how C++ arrays and pointers into arrays work together.
Thanks for helping.

There is nothing particularly special about arrays here. Instead, the special behavior belongs to char const*: in C, pointers to a sequence of characters with a terminating null character are used to represent strings. C++ inherited this notion of strings in the form of string literals. To support output of these strings, the output operator for char const* interprets a pointer to a char as a pointer to the start of a string and prints the sequence up to the first null character.

When you write
char Text[9] = "Sandrine";
Text names an array in memory; when used in an expression, it converts to the address of its first element, which holds 'S', followed by the rest of the characters. A string in C is terminated by a \0, i.e. "S a n d r i n e \0".
When you write
for(char *Ptr = Text; *Ptr != '\0'; ++Ptr)
cout << Ptr << endl;
The first time the for loop runs, it prints the whole string, because Ptr points to the start of the string (char *Ptr = Text). When you increment Ptr, it points to the next character, Text + 1, i.e. 'a', so the output starts from there, and so on. Once *Ptr is \0, the for loop quits.

Related

Why does s[s[3]] = '1' when char *s = "123"?

char* s = "123";
std::cout << s[s[3]] << std::endl; // prints 1
std::cout << s[3] << std::endl; // prints nothing?
I tried running the following snippet and the first print statement outputs 1 while the second outputs (seemingly) nothing. What is going on when the pointer is dereferenced using the length of the char pointer array here?
It is unclear why you are using the value of the character at index 3 (s[3]) to index the string again. But in any case, the key point here is that you're using a char to index an array. This means that the char is used as a number, with the conversion most likely following the ASCII character encoding.
The reason you get nothing printed when you print s[3] is that s points to a character array of length 4, and the last character is the null terminator. Null means the number 0. The null terminator identifies the end of the string. It is not a printable character, because it is not meant to be printed. It doesn't have a glyph associated with it, so nothing appears.
Of course, you can see now that s[s[3]] is nothing but s[0], which is the character '1'.

Palindrome mystery: Why an array of size 3 ends up being printed with 5 elements?

#include <iostream>
#include <cstring>
using namespace std;
int main(){
    char a[] = "abc";
    char b[2];
    for(int i = 0,k = 2;i < 3;i++,k--){
        b[k] = a[i];
        cout << i << " " << k << endl;
    }
    if(strcmp(a,b) == 0){
        cout << "palindrome";
    }else{
        cout << "no palindrome" << endl;
    }
    cout << "a: " << a << endl;
    cout << "b: " << b << endl;
    return 0;
}
output:
0 2
1 1
2 0
no palindrome
a: abc
b: cbabc
I don't understand why b array ends up with 5 elements, when the array holds only 3. Additionally, the loop loops only 3 times and this is the output I get.... A mystery.
You have an out-of-bounds array access and also need to be conscious of null-terminating your strings!
Specifically, char b[2]; gives you an array with exactly 2 chars, so only b[0] and b[1] are valid. You also need to account for the null character that should terminate all C-style strings. So to hold "cba" for example you need 4 elements. You can also see this if you print sizeof(a) (should be 4: 'a', 'b', 'c', '\0').
Basically, your program elicits undefined behavior (UB). The simple fix is to make b bigger (the same size as a, which is 4 in this case). The more complete answer is to manage your array lengths more carefully and look at the safer "n" versions of the C manipulation functions, such as strncmp.
Edit: to be complete, you have 2 sources of UB. The first is in the line b[k] = a[i] when k == 2, because again you have only allocated b[0] and b[1]. The second is when you call strcmp, since b has not been properly null-terminated and strcmp will happily read past the array bounds, which it doesn't know.
b is not terminated by a null character (\0), so any string operation on it (like strcmp, or even just printing it with cout) runs on until it happens to hit such a character somewhere in memory. In other words, you are witnessing undefined behavior.
Strictly speaking you have undefined behaviour and any observed behaviour (wrong or seemingly partially correct) is explained by that.
For details and solutions see the other answers.
End of answer.
Now let's speculate on why, in your environment, you might end up with specifically the output you observe.
Assume the memory for your arrays
char a[] = "abc";
char b[2];
is laid out as follows, which is a commonly seen arrangement of variables:
b[0] non-initialised
b[1] non-initialised
a[0] = 'a'
a[1] = 'b'
a[2] = 'c'
a[3] = '\0'
Note the four (not three) elements of a and the terminator 0.
Your loop, right in the first iteration, attempts to write to the non-existing b[2].
This is already what causes undefined behaviour. Clean discussion ends here.
Let's continue speculating.
Your loop unintentionally writes one place beyond the existing b[1] and ends up clobbering a[0]. By chance it writes the value which happens to be already there, so no change there.
Your loop continues to write, now to existing entries of b.
The speculated result is
b[0] = 'c'
b[1] = 'b'
a[0] = 'a' = 'a'
a[1] = 'b'
a[2] = 'c'
a[3] = '\0'
and the loop ends.
Then you try to output a and b.
This is done by outputting all characters found consecutively from the start of the arrays, until a terminator 0 is found.
For a this (luckily in case of the "a") is "abc\0", all from a.
For b this is "cb" from b, followed (on the search for a 0) by "abc\0" from a.
Note that the seemingly correct "a" already is incorrectly from a, not from b.
OK, when debugging this you can check the address of the last element of b, b[1].
In gdb:
(gdb) p &b[1]
$8 = 0x7fffffffdfe3 "\377abc"
See? If b were null-terminated, b[1] would be '\0' and the string shown would be empty, but it isn't: you told the compiler to reserve 2 chars for b. When asked for the address of the last element of b, b[1], the debugger not only shows the address, it also shows the value interpreted as a char* string. As b is not null-terminated (my compiler didn't initialize it), the string continues beyond the bounds of b! Suspiciously enough, the string ends with 'a' 'b' 'c' '\0'. Let's check the address of a[0]:
(gdb) p &a[0]
$9 = 0x7fffffffdfe4 "abc"
See? The memory right after b[1] is a[0]: the two arrays are contiguous. Now, you are making two mistakes here:
You are not properly initializing b.
b reserves 2 slots of memory. If you want to check palindromes of a fixed size of 3 characters you should reserve 4 slots like you did for the null terminated string "abc".
Try changing b declaration from:
char b[2];
To:
char b[] = "xyz";
Your loop will then overwrite b with the reverse of a while the terminator stays in place, so the code does what you intend.

Don't understand why while(*ptr++) enter while loop for 0 value (string terminator)

I have a problem with the following apparently trivial sample code:
(on Visual Studio 2015)
Please ignore the part about pointing to a literal constant and any warnings or errors on newer compilers; that is not what I don't understand.
My problem is why it prints '0' and how the while loop works. I tried using both the debugger and printf. My understanding of the problem is this:
moves ptr to point at 'e'
checks content of ptr which is 'e' it is not 0 so it enters while loop
back to condition line, moves ptr to 'l'
checks *ptr, it is not 0, enters...
blah blah for the letters l, o
Then it increases ptr after 'o' and gets '\0', at which point by my logic it should NOT enter the loop, but it does, and it no longer enters after one more step, when it is pointing past the terminator at junk?!?
I looked over 2 other topics, this topic about operator precedence and this one about the while(*ptr) case going over the terminator, but I don't understand from the second WHY it enters the loop and why it increases the pointer value afterwards. From what I understand, the order is: first increase the pointer, then get the value with *.
#include <cstdio>
#include <stdlib.h>
int main(void) {
    char* str = "hello";
    char* ptr = str;
    char least = 127;
    while (*ptr++) {
        printf("%c-", *ptr);
        least = ((*ptr) < (least)) ? (*ptr) : (least);
    }
    printf("%d\n", least);
}
Inside the loop you're not using the same character that you tested in the while condition.
Since ptr++ is a post-increment, it returns the current value of ptr and then increments it. So when ptr points to the o character, and you do
while (*ptr++)
it tests 'o', which is not zero, so it will enter the loop. But after the test it increments ptr to point to the next character, so now it points to '\0'. Then it prints this character and sets least to it.
You should increment the pointer after processing it. You can do this by moving ptr++ to the end of the loop body. Or you can use a for loop instead of while:
for (ptr = str; *ptr; ptr++)
The last printf call outputs the terminating zero of the string literal as an integer.
To make it clear consider a string literal with one actual character as for example "A".
Before the first iteration of the while loop
while (*ptr++)
{
printf("%c-", *ptr);
least = ((*ptr) < (least)) ? (*ptr) : (least);
}
the pointer ptr points to the character 'A' of the string literal. This expression
*ptr++
can be imagined like
char c = *ptr;
++ptr;
So control passes to the body of the loop because 'A' is not equal to zero. Meanwhile, the pointer ptr was incremented after the condition was evaluated, so now it points to the terminating zero '\0' of the string literal.
As 0 is less than 127, the variable least gets the value 0.
least = ((*ptr) < (least)) ? (*ptr) : (least);
When the loop condition is evaluated the next time, ptr still points to the terminating zero, so control bypasses the body of the loop, and the zero is output by the statement after the loop.
From the C Standard (6.5.2.4 Postfix increment and decrement operators)
2 The result of the postfix ++ operator is the value of the
operand. As a side effect, the value of the operand object is
incremented (that is, the value 1 of the appropriate type is added to
it).

c++ dynamic allocation initial values

I'm trying to concatenate two strings into a new one (finalString) like this:
finalString = string1 + '&' + string2
Firstly, I allocate the memory for finalString, then i use strcat().
finalString = new char[strlen(string1 ) + strlen(string2) + 2];
cout << finalString << endl;
finalString = strcat(finalString , string1 );
finalString = strcat(finalString , "&");
finalString = strcat(finalString , string2);
cout << finalString << endl;
I'll suppose that string1 is "Mixt" and string2 is "Supermarket".
The output looks like this:
═════════════════řřřř //(which has 21 characters)
═════════════════řřřřMixt&Supermarket
I know that if I use round brackets in "new char" the string will be initialized to 0 and I'll get the desired result, but my question is: why does the first output have 21 characters, given that I allocated only 17? And even so, why does the final string length exceed the initial allocation size (21 > 17)?
Thanks in advance!
Two words for you: "buffer overrun".
The reason you have 21 characters initially is that the first '\0' (also called null) character happens to be the 22nd byte counting from the memory address that finalString points to. This may or may not be consistent, depending on what is in your memory.
As for why the result is longer than what you wanted: again, you wrote outside the initial buffer into random memory. You did not crash because you did not write over something important.
strcat takes the destination address given, finds the first '\0' there, and from that place on copies the data from the second pointer you provide, up to and including the first '\0' it finds there.
What you are doing is VERY DANGEROUS: if you do not hit a '\0' before you hit something vital, you will cause a crash or, at minimum, bad behavior.
Understand that in C/C++, a char[] passed to these functions is just a pointer to the memory location of its first element. THERE ARE NO SAFEGUARDS! You alone must be careful with that.
If you set the first character, finalString[0] = 0, then the logic will work better.
As a different answer, why not use std::string:
std::string a, b, c;
a = "part1";
b = "part2";
c = a + " & " + b;
std::cout << c << '\n';
part1 & part2
Live example: http://ideone.com/pjqz9T
It will make your life easier! You should generally prefer standard library types in C++.
If you really do need a char * then at the end you can do c.c_str().
Your string is not initialized, which leads to undefined behavior. strcat starts appending at the first null character it finds in the destination.
So, as others already mentioned, either you can do
finalString[0] = 0;
or in place of your first strcat use strcpy. This will copy the first string and put a null character at the end.
why 21 characters?
This is due to undefined behavior. It will keep on printing until it finds a null character, or else it will crash as soon as it tries to access any illegal memory.

C++ char array null terminator location

I am a student learning C++, and I am trying to understand how null-terminated character arrays work. Suppose I define a char array like so:
char* str1 = "hello world";
As expected, strlen(str1) is equal to 11, and it is null-terminated.
Where does C++ put the null terminator, if all 11 elements of the above char array are filled with the characters "hello world"? Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'? CPlusPlus.com seems to suggest that one of the 11 would need to be '\0', unless it is indeed allocating 12.
Suppose I do the following:
// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );
// Copy the first one to the second one
strncpy( str2, str1, strlen(str1) );
// Output the second one
cout << "Str2: " << str2 << endl;
This outputs Str2: hello worldatcomY╗°g♠↕, which I assume is C++ reading the memory at the location pointed to by the pointer char* str2 until it encounters what it interprets to be a null character.
However, if I then do this:
// Null-terminate the second one
str2[strlen(str1)] = '\0';
// Output the second one again
cout << "Terminated Str2: " << str2 << endl;
It outputs Terminated Str2: hello world as expected.
But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?
Running this code does not seem to cause any compiler warnings or run-time errors. Is this safe to do in practice? Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?
In the case of a string literal the compiler is actually reserving an extra char element for the \0 element.
// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );
This is a common mistake new C programmers make. When allocating the storage for a char* you need to allocate the number of characters + 1 more to store the \0. Not allocating the extra storage here means this line is also illegal
// Null-terminate the second one
str2[strlen(str1)] = '\0';
Here you're actually writing past the end of the memory you allocated. When allocating X elements the last legal byte you can access is the memory address offset by X - 1. Writing to the X element causes undefined behavior. It will often work but is a ticking time bomb.
The proper way to write this is as follows
size_t size = strlen(str1) + sizeof(char);
char* str2 = (char*) malloc(size);
strncpy( str2, str1, size);
// Output the second one
cout << "Str2: " << str2 << endl;
In this example, str2[size - 1] = '\0' isn't actually needed. The strncpy function fills any remaining space in the destination with null characters. Here str1 has only size - 1 characters, so the final element of the array is filled with \0.
Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'?
Yes.
But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?
Yes.
Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?
Yes, because the second form is not long enough to copy the string into.
Running this code does not seem to cause any compiler warnings or run-time errors.
Detecting this in all but the simplest cases is a very difficult problem. So the compiler authors simply don't bother.
This sort of complexity is exactly why you should be using std::string rather than raw C-style strings if you are writing C++. It's as simple as this:
std::string str1 = "hello world";
std::string str2 = str1;
The literal "hello world" is a char array that looks like:
{ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' }
So, yes, the literal is 12 chars in size.
Also, malloc( strlen(str1) ) allocates 1 byte less than is needed, since strlen returns the length of the string, not including the NUL terminator. Writing to str2[strlen(str1)] writes 1 byte past the memory that you've allocated.
Your compiler won't tell you that, but if you run your program through valgrind or a similar program available on your system it'll tell you if you're accessing memory you shouldn't be.
I think you are confused by the return value of strlen. It returns the length of the string, and it should not be confused with the size of the array that holds the string. Consider this example :
char* str = "Hello\0 world";
I added a null character in the middle of the string, which is perfectly valid. Here the array will have a length of 13 (12 characters + the final null character), but strlen(str) will return 5, because there are 5 characters before the first null character. strlen just counts the characters until a null character is found.
So if I use your code :
char* str1 = "Hello\0 world";
char* str2 = (char*) malloc(strlen(str1)); // strlen(str1) will return 5
strncpy(str2, str1, strlen(str1));
cout << "Str2: " << str2 << endl;
The str2 array will have a length of 5, and won't be terminated by a null character (because strlen doesn't count it). Is this what you expected?
For a standard C string, the array that stores the string is always at least one character longer than the length of the string in characters. So your "hello world" string has a string length of 11 but requires a backing array with 12 entries.
The reason for this is simply the way those strings are read. The functions handling those strings basically read the characters of the string one by one until they find the termination character '\0' and stop at this point. If this character is missing, those functions just keep reading memory until they either hit a protected memory area, which causes the host operating system to kill your application, or find a termination character.
Also, initializing a character array of length 11 and writing the string "hello world" into it will cause serious problems, because the array is expected to hold at least 12 characters. The byte that follows the array in memory gets overwritten, resulting in unpredictable side effects.
Also, since you are working with C++, you might want to look into std::string. This class provides better handling of strings.
I think what you need to know is that char array indices start at 0 and go up to string length - 1, and the position at index string length holds the terminator ('\0').
In your case:
str1[0] == 'h';
str1[10] == 'd';
str1[11] == '\0';
This is why str2[strlen(str1)] = '\0'; uses the correct index, provided str2 was allocated with room for strlen(str1) + 1 chars.
The problem with the output after the strncpy is that it copies only 11 elements (0..10), so you need to write the terminator manually (str2[11] = '\0').