Understanding char reference - c++

I've written this simple script to understand what a reference is, and I'm getting stuck on the char array.
int numbers[5] = {3, 6, 9, 12, 15};
for (int i = 0; i < 5; i++)
{
cout << numbers[i] << endl;
cout << &numbers[i] << endl;
}
cout << "--------------" << endl;
char letters[5] = {'a', 'b', 'c', 'd', 'e'};
for (int i = 0; i < 5; i++)
{
cout << letters[i] << endl;
cout << &letters[i] << endl;
}
and this is the output:
3
0xbffff958
6
0xbffff95c
9
0xbffff960
12
0xbffff964
15
0xbffff968
--------------
a
abcde
b
bcde
c
cde
d
de
e
With the int array, when I use &numbers[i], I receive a strange number that is a memory location. This is ok; it's exactly what I've understood.
But with char, I don't understand why I have this output.

The reason is that cout "knows" what to do with a char * value - it prints the character string as a NUL-terminated C string.
The same is not true of an int * value, so cout prints the pointer value instead.
You can force pointer value output by casting:
cout << static_cast<void *>(&letters[i]) << endl;

You are looking at a peculiarity of C++ streams: operator<< picks an overload based on the static type of its argument. The expression &numbers[i] has type int*, while &letters[i] has type char*, which is, incidentally, also the type used for C character strings. Since we want cout << "FOO" to print the whole string, this behavior is necessary. In your case it actually results in undefined behavior, because letters is not null-terminated and the output runs past the end of the array. To print the address instead, use a static_cast to void*.

When you pass an argument of type char* to operator<<, it is treated as a null-terminated string. (The overload used here is in fact a non-member function in namespace std, not a member of std::ostream.)

Related

Pointer printing first char of string

#include <iostream>
using namespace std;
char* input(char **S);
int main()
{
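char **S = new char *(new char[100]); // *S must point to writable storage before input is read into it
char *K = input(S);
//cout << K << endl;
cout << *K << endl;
}
char* input(char **S)
{
cout << "Enter string: ";
cin.getline(*S, 100); // reads at most 99 characters plus the terminating '\0'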
cout << S << endl; // Prints address of S
cout << *S << endl; //Prints content of address stored in S
return *S;
}
I fail to understand why, when I print *K, I get only the first character of the input string, but if I print the commented line (just K alone) I get the whole string. Any help explaining what I am not seeing or understanding is appreciated.
Let's understand how arrays work:
// Let's say I have one character array
char arr[] = {'a', 'b', 'c', 'd'};
Here, the array name arr behaves like a pointer to the first element of the array. To avoid confusion, though, note that it is NOT a pointer to the first element; it just has an implicit conversion (decay) to a pointer to the element type. More details can be found here: https://stackoverflow.com/a/1641963/10821123
Since array elements are stored contiguously, the addresses of the remaining elements follow from the first one.
// so both statements below print the same address
cout << &arr << endl;
cout << (void*) &arr[0] << endl;
// the line above takes the address of the first element
Now coming to your question, I'll convert my example to a string one:
const char *K = "abc";
cout << *K << endl; // a
cout << K << endl; // abc
Note that writing the assignment without const, char *K = "abc";, is ill-formed since C++11. GCC still accepts it but warns: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
The pointer holds only the address of the first element of the array, so dereferencing it yields that first element, a single char: *K is equivalent to K[0], and the char overload of operator << prints one character.
K itself, however, is a character pointer, and operator << has an overload for const char* (which a char* converts to) that prints the complete null-terminated string starting at that address. That is why, in your case too, printing K prints the whole string.

Why does printing the 'address of index n' of c style strings lead to output of substring

I'm rather new to C++ and while working with a pointer to a char array (C style string) I was confused by its behavior with the ostream object.
const char* items {"sox"};
cout << items << endl;
cout << items[0] << endl;
cout << *items << endl;
cout << &items << endl;
cout << &items[1] << endl;
Running this leads to:
sox
s
s
0x7fff2e832870
ox
Contrary to pointers of other data types, printing the variable doesn't output the address but the string as a whole. As I understand it, this is because the << operator is overloaded for char arrays to treat them as strings.
What I don't understand is why cout << &items[1] prints the string from index 1 onward (ox) instead of the address of the char at index 1. Is this also due to the << operator being overloaded, or what is the reason for this behavior?
The type of &items[1] is const char *. Therefore the const char * overload of operator << is used, which prints the string from index 1 onwards.
OTOH, the type of &items is const char **, for which no specific overload exists, so the address of items is printed (via the const void * overload).
Back in the olden days, when C ran the world, there was no std::string, and programmers had to make do with arrays of char to manage text. When C++ brought enlightenment (and std::string), old habits persevered, and arrays of char are still used to manage text. Because of this heritage, you'll find many places where arrays of char act differently from arrays of any other type.
So,
const int integers[] = { 1, 2, 3, 4 };
std::cout << integers << '\n';
prints the address of the first element in the array.
But,
const char text[] = { 'a', 'b', 'c', '\0' };
std::cout << text << '\n';
prints the text in the array text, up to the final 0: abc
Similarly, if you try to print addresses inside the array, you get different behavior:
std::cout << &integers[1] << '\n';
prints the address of the second element in the array, but
std::cout << &text[1] << '\n';
prints the text starting at the second character of the array: bc
And, as you suspected, that's because operator<< has an overload that takes const char* and copies text beginning at the location pointed to by the pointer, and continuing up to the first 0 that it sees. That's how C strings work, and that behavior carries over into C++.
items[1] is the second character of the array, so its address, &items[1], is a pointer to that second character (index 1). By the same operator << rule you mentioned, the string is printed from the second character to the end.

Why does using only the name of a character array print the whole array? [duplicate]

As a beginner of learning C++, I am trying to understand the difference between an array of type char and an array of type int. Here is my code:
void IntArray () {
int array[5] = {5,6,7,8,9};
cout << "Print int array: " << array << endl;
cout << "Print int array[0]: " << array[0] << endl;
cout << "Print int array[0]+1: " << array[0]+1 << endl;
}
void CharArray () {
char array[5] = {'a', 'b', 'c', 'd', '\0'};
cout << "Print char array: " << array << endl;
cout << "Print char array[0]: " << array[0] << endl;
cout << "Print char array[0]+1: " << array[0]+1 << endl;
}
And here is the output:
Print int array: 0xbfd66a88
Print int array[0]: 5
Print int array[0]+1: 6
Print char array: abcd
Print char array[0]: a
Print char array[0]+1: 98
My questions are:
Why does the following output the string 'abcd'? I was expecting it to output the address of the first element in the array:
cout << "Print char array: " << array << endl;
Why does the following output '98'? I was expecting it to output the letter 'b':
cout << "Print char array[0]+1: " << array[0]+1 << endl;
1.
Because char arrays are treated differently to other arrays when you stream them to cout - the << operator is overloaded for const char*. This is for compatibility with C, so that null-terminated char arrays are treated as strings.
See this question.
2.
This is due to integral promotion. When you call the binary + with a char (with value 'a') and an int (with value 1), the compiler promotes your char to int, or to unsigned int on unusual platforms where int cannot represent every value of char (for example, when char and int have the same size). On mainstream platforms the result is always int. So, assuming ASCII, the + operator is called with the values 97 and 1, and it returns 98. To print that as a char, you need to cast it first:
cout << "Print char array[0]+1: " << static_cast<char>(array[0]+1) << endl;
See this question.
Okay let's go over each separately.
Print int array: 0xbfd66a88
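Here you print an int[], which decays to int*. There is no operator << overload for int*, so the pointer is converted to const void* and the const void* overload is used. When you print a pointer that way, you see a memory address, typically in hexadecimal format.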
Print int array[0]: 5
Here you print the first element of the array which is an int. As expected.
Print int array[0]+1: 6
Here you add 1 to the first element of the array and print the result, which is still an int. 5+1 becomes 6. No mystery here.
Print char array: abcd
This is a bit trickier. You print a char[] and there is a special overload for operator << that takes a const char* and that one gets called. What this overload does is print each character beginning from the address where the pointer points until it finds a terminating zero.
Print char array[0]: a
Here you print a char so the overload that takes char gets called. It prints the corresponding ASCII character, which is 'a'.
Print char array[0]+1: 98
Here the result of operator+ is an int because the literal 1 is an int and the char value gets promoted to the wider type (int). The result is 98 because the ASCII code of the letter 'a' is 97. When you print this int you just see the number.

Program picking '\0' even when it is not mentioned - Clarification

I was asked to predict what this program will do:
int main()
{
char d[] = {'h','e','l','l','o'};
const char *c = d;
std::cout << *c << std::endl;
while ( *c ) {
c = c + 1;
std::cout << *c << std::endl;
if ( *c == '\0' )
std::cout << "Yes" << std::endl;
}
return 0;
}
From my understanding, the code should never print Yes, as there is no \0 in the character array d[]. Is the program picking up a garbage value? In short, this while loop should run forever. Is that right?
The proper answer to this question is that the program exhibits undefined behavior, because it goes past the end of the array.
Changing the program to use a string literal for initialization would make it always print "Yes":
char d[] = "hello";
In short, this while loop should run forever.
Once undefined behavior happens, all bets are off. However, commonly the program manages to find a zero byte in memory outside of d[], at which point it prints "Yes", and exits the loop.
Your code is an example where the array d is not a string (more accurately, not a nul-terminated string), so using it as a string is incorrect. All functions that work with char* strings and treat \0 as the end marker go outside the memory allocated for d, and sometimes a \0 happens to be found out there (no one knows beforehand where). Again, this is incorrect usage that can lead to errors from violating the array's bounds.
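Finally, the conditions of the if statement and the while loop are linked: (*c == '\0') is true exactly on the last iteration of while (*c) {...}. Since it is very unlikely that no zero byte follows d in memory (which is what an infinite while (*c) {...} would require), "Yes" will almost certainly be printed eventually, although with undefined behavior nothing is guaranteed.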
UPDATE:
Let's consider additionally the following example:
#include <cstring> // for strlen
#include <iostream>
using namespace std;
int main()
{
char d1[] = { 'h', 'e', 'l', 'l', 'o' }; // no nul-terminator here
char d2[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
char d3[] = "hello";
cout << "Memory allocated for d1 - " << sizeof(d1) << endl;
cout << "Length of string in d1 - " << strlen(d1) << endl;
cout << "Memory allocated for d2 - " << sizeof(d2) << endl;
cout << "Length of string in d2 - " << strlen(d2) << endl;
cout << "Memory allocated for d3 - " << sizeof(d3) << endl;
cout << "Length of string in d3 - " << strlen(d3) << endl;
return 0;
}
The output will be similar to the following (the second line varies, because strlen(d1) reads past the end of d1 until it happens to find a \0):
Memory allocated for d1 - 5
Length of string in d1 - 19
Memory allocated for d2 - 6
Length of string in d2 - 5
Memory allocated for d3 - 6
Length of string in d3 - 5
Here you can see three ways of initializing a char array. d3 is initialized with a string literal, so the \0 is added automatically because the value is written in "". Array d1 has no nul-terminator, and as a result strlen returns a value greater than sizeof: the \0 was found outside the array d1.

String after appending Char changing its size

I wanted to test how a string's size changes when a char is appended, and below is the outcome.
I know that strings end with the null character, but why is the outcome like this?
#include <iostream>
#include <string>
using namespace std;
int main(){
string a = "" + 'a'; //3
string b = "" + '1'; //2
string c = "a" + 'a'; //2
string d = "1" + '1'; //3
string e = "\0" + 'a'; //20
string f = "\0" + '1'; //1
string g = "a" + '\0'; //1
string h = "1" + '\0'; //1
string i = "" + '\0'; //0
string j = "" + '\0'; //0
cout << a.size() << endl;
cout << b.size() << endl;
cout << c.size() << endl;
cout << d.size() << endl;
cout << e.size() << endl;
cout << f.size() << endl;
cout << g.size() << endl;
cout << h.size() << endl;
cout << i.size() << endl;
cout << j.size() << endl;
return 0;
}
Your code is not doing what you think.
String literals decay to const char *, and char is an integer type. If you try to add them, there is no std::string involved: the compiler promotes the char to int and performs pointer arithmetic on the string literal. For example, ""+'a' points to the 97th character in memory after the beginning of the string literal "" (if 'a' is represented by 97 on your platform).
This passes garbage to the string constructor, which stores in the new string whatever it finds at those memory locations until it finds a \0 terminator. Hence the "strange" results you get (which aren't reproducible, since the exact memory layout of the string table depends on the compiler).
Of course all this is undefined behavior as far as the standard is concerned (you are accessing char arrays outside their bounds, apart from the cases where you add \0).
To make your code do what you mean, at least one of the operands must be of type string:
string c = string("a") + 'a';
or
string c = "a" + string("a");
so the compiler will see the relevant overloads of operator+ that involve std::string.
Most of your initializers have undefined behaviour. Consider, for example:
string a = "" + 'a';
You are adding a char to a char pointer. This advances the pointer by the ASCII value of the char, and uses the resulting (undefined) C string to initialize a.
To fix, change the above to:
string a = string("") + 'a';