C++ int and char array address [duplicate] - c++

I just started learning C++ and am confused by how arrays print. When I print an int array, I get the address of its 0th element, and (array + 1) is the address 4 bytes later, since int is 4 bytes. But a char array does not behave the same way. Is a char array implemented differently in C++?
code:
#include <iostream>
using namespace std;
int main()
{
char char_array [5] {'a', 'e', 'i', 'o', 'u'};
int int_array [5] {1,2,3,4,5};
cout << sizeof(char) << endl;
cout << sizeof(int) << endl;
cout << char_array << endl;
cout << char_array+1 << endl;
cout << int_array << endl;
cout << int_array + 1 << endl;
}
output:
1
4
aeiou≡r
eiou≡r
0x61fe00
0x61fe04
Process returned 0 (0x0) execution time : 0.428 s
Press any key to continue.

To begin with, the program has undefined behavior, because the declared character array
char char_array [5] {'a', 'e', 'i', 'o', 'u'};
does not contain a string, whereas the overloaded operator << used in these statements
cout << char_array << endl;
cout << char_array+1 << endl;
for a pointer to char requires the pointer to point to a null-terminated string.
You could at least declare the array like
char char_array [6] {'a', 'e', 'i', 'o', 'u', '\0' };
Using an integer array as an operand of operator << makes overload resolution select the operator for the type const void *, and that operator outputs the address of the first element of the integer array.
In these statements
cout << char_array+1 << endl;
cout << int_array + 1 << endl;
pointer arithmetic is used. In the expressions char_array+1 and int_array + 1 the array designator is implicitly converted to a pointer to its first element, and the value of that pointer is then increased by sizeof( char ) or sizeof( int ) bytes respectively.
sizeof( char ) is always equal to 1. sizeof( int ) depends on the system and is usually equal to 4, at least on 32-bit systems. And this output
0x61fe00
0x61fe04
demonstrates this.
If you want to output addresses for elements of the character array then you should write for example
cout << static_cast<void *>( char_array ) << endl;
cout << static_cast<void *>( char_array+1 ) << endl;

char[5] decays to char *, which converts to char const *, and there is an overload of operator<< that writes out the string the char const * points to.
int[5] decays to int *, but there is no similar overload for int pointers, so the pointer converts to void const * and its address is simply printed.

std::cout is an object of type std::ostream (that is, std::basic_ostream<char>), for which operator<< has an overload specifically for formatting const char*. That matters because raw arrays decay to pointers when passed to a function.
That overload of operator<< expects the incoming data to be a C string, that is, a sequence of characters that is nul terminated.
char char_array [5] {'a', 'e', 'i', 'o', 'u'};
creates an array that is not nul terminated, which is why you get seemingly random characters (via a buffer over-read, which is undefined behaviour). To fix that, you just need to include a nul terminator. When char_array is declared as:
char char_array [6] {'a', 'e', 'i', 'o', 'u','\0'};
then
std::cout << char_array << std::endl;
will output aeiou and
std::cout << char_array + 1 << std::endl;
will output eiou

Here int_array is an array of int that decays to a pointer to its first element. When you print int_array, you are printing the address of the start of the array (0x61fe00 in your case). When you print int_array + 1 you are doing what is called pointer arithmetic: int_array + 1 can be roughly read as "give me the address of the next int", and since an int is 4 bytes here, the next address is the first address plus four.
char_array likewise decays to a pointer to char. Passing a pointer to char to cout prints the underlying string. Here your string is missing the terminating '\0', so printing it is UB.
You should definitely read up on the difference between a pointer and a value, because that is the piece you are missing here.

A char array passed to cout is treated as a C string and is output as one.

Related

Why don't char arrays with separate chars end with a null-terminator unlike string literals?

I was playing around with char arrays in c++ and wrote this program:
#include <iostream>
using namespace std;
int main()
{
char text[] = { 'h', 'e', 'l', 'l', 'o' }; //arrays initialised like this
//will have a size of the number
//of elements that you see
char text2[] = "hello"; //arrays initialised like this will have a size of
//the number of elements that you see + 1 (0 on the
//end to show where the end is
cout << endl;
cout << "The size of the first array is: " << sizeof(text) << endl;
cout << endl;
for (int i = 0; i < sizeof(text); i++)
{
cout << i << ":" << text[i] << endl;
}
cout << endl;
cout << "The size of the first array is: " << sizeof(text2) << endl;
cout << endl;
for (int i = 0; i < sizeof(text2); i++)
{
cout << i << ":" << text2[i] << endl;
}
cout << endl;
cin.get();
return 0;
}
This program gives me the output:
The size of the first array is: 5
0:h
1:e
2:l
3:l
4:o
The size of the first array is: 6
0:h
1:e
2:l
3:l
4:o
5:
My question is: Is there a particular reason that initializing a char array with separate chars will not have a null terminator (0) on the end unlike initializing a char array with a string literal?
A curly braces initializer just provides the specified values for an array (or if the array is larger, the rest of the items are defaulted). It's not a string even if the items are char values. char is just the smallest integer type.
A string literal denotes a zero-terminated sequence of values.
That's all.
Informally, it's the second quotation character in a string literal of the form "foo" that adds the NUL-terminator.
In C++, "foo" is a const char[4] type, which decays to a const char* in certain situations.
It's just how the language works, that's all. And it's very useful since it dovetails nicely with all the standard library functions that model a string as a pointer to the first element in a NUL-terminated array of chars.
Splicing in an extra element with something like char text[] = { 'h', 'e', 'l', 'l', 'o' }; would be really annoying and it could introduce inconsistency into the language. Would you do the same thing for signed char, and unsigned char, for example? And what about int8_t?
A string literal such as "hello" has the type of a constant character array and is initialized the following way
const char string_literal_hello[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
As you can see, the type of the string literal is const char[6]. It contains six characters.
Thus this declaration
char text2[] = "hello";
that can be also written like
char text2[] = { "hello" };
in fact is substituted for the following declaration
char text2[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
That is, when a string literal is used as an initializer of a character array, all of its characters, including the terminating zero, are used to initialize the array.
You can terminate it yourself in multiple ways:
char text1[6] = { 'h', 'e', 'l', 'l', 'o' };
char text2[sizeof "hello"] = { 'h', 'e', 'l', 'l', 'o' };
char text3[] = "hello"; // <--- my personal favourite
Is there a particular reason that initializing a char array with separate chars will not have a null terminator (0)
The reason is because that syntax...
Type name[] = { comma separated list };
...is used for initializing arrays of any type. Not just char.
The "quoted string" syntax is shorthand for a very specific type of array that assumes a null terminator is desired.
When you designate a double quote delimited set of adjacent characters (a string literal), it is assumed that what you want is a string. And a string in C means an array of characters that is null-terminated, because that's what the functions that operate on strings (printf, strcpy, etc...) expect. So the compiler automatically adds that null terminator for you.
When you provide a brace delimited, comma separated list of single quote delimited characters, it is assumed that you don't want a string, but you want an array of the exact characters you specified. So no null terminator is added.
C++ inherits this behavior.

Understanding strings as pointers in C++

I have troubles understanding strings as pointers. Apparently a string is understood as a pointer which points to the first address of the string. So using the "&"-operator I should receive the address of the first character of the string. Here's a small example:
#include "stdafx.h"
#include <cstring> // for strlen
#include <iostream>
using namespace std;
int main(){
char text[101];
int length;
cout << "Enter a word: ";
cin >> text;
length = strlen(text);
for (int i = 0; i <= length; i++) {
cout << " " << &text[i];
}
return 0;
}
When entering a word such as "Hello", the output is: "Hello ello llo lo o". Instead I expected to receive the address of each character of "Hello". When I use the cast long(&text[i]) it works out fine. But I don't understand why. Without the cast, apparently the "&"-operator gives the starting address of the string to be printed. Using a cast it gives the address of every character separately.
Maybe somebody can explain this to me. I'd be really grateful!
&text[i] is equivalent to text + i and that shifts the pointer along the char[] array by i places using pointer arithmetic. The effect is to start the cout on the (i)th character, with the overload of << to a const char* called. That outputs all characters from the starting point up to the NUL-terminator.
text[i] however is a char type, and the overload of << to a char is called. That outputs a single character.
In C++, if you want a string, then use std::string instead. You can still write cin >> text; if text is a std::string type! Your code is also then not vulnerable to overrunning your char buffer.
If you have a character array storing a string as for example
char text[] = "Hello";
then the array is initialized like
char text[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
In this statement
std::cout << text;
the operator << overloaded for the type const char * is used, and the array text is implicitly converted to a pointer to its first element.
You could write instead
std::cout << &text[0];
because in both statements the operand ends up as a char * pointing to the first element: text is converted to one implicitly, and &text[0] already is one.
The operator overloaded for the type const char * outputs characters starting from the address at the pointer until a zero character is encountered.
So if instead of the statement
std::cout << &text[0];
you'll write
std::cout << &text[1];
then the only thing that changes is the starting address of the string, nothing more. That is, you are in fact outputting the string represented as
{ 'e', 'l', 'l', 'o', '\0' }
If you write
std::cout << &text[2];
that is, if the pointer on the right-hand side of the expression is moved one more position to the right, then you'll be dealing with the string
{ 'l', 'l', 'o', '\0' }
and so on.
That is the operator << overloaded like
template<class traits>
basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&,
const char*);
just outputs entities pointed to by the second parameter as strings.
If you want to output the value of the pointer itself instead of the string pointed to by the pointer you should use another overloaded operator << declared like
basic_ostream<charT, traits>& operator<<(const void* p);
To invoke it you should write for example
std::cout << ( void * )text;
or
std::cout << ( void * )&text[i];
where i is some index.
Instead of the C casting you can use C++ casting like
std::cout << static_cast<void *>( &text[i] );
To print the address of an array element, you could do:
cout << " " << (void*)&text[i];
This:
cout << " " << &text[i];
is equivalent to this:
cout << " " << text + i;
which means that you ask to print the string, starting from index i.

I am stuck with string and array

I have come across a c++ code:
char greeting[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
cout << "Greeting message: ";
cout << greeting << endl;
output: Hello
Since greeting is an array of size 6, displaying greeting should display only "H" because of greeting[0] in cout, since it is displaying first address of array. I don't know where I am wrong.
Except when it is the operand of the sizeof or unary & operators, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
This means that in the statement
cout << greeting << endl;
the expression greeting is converted from an expression of type char [6] to an expression of type char *, and the value of the expression is the address of the first element.
The stream operator << is defined such that if it receives an argument of type char *, it will write out the sequence of characters starting at that address until it sees the 0 terminator; here's a simplistic example of how it might work:
std::ostream& operator<<( std::ostream& s, const char *p )
{
while (*p)
s.put( *p++ );
return s;
}
The real operator definition will be a bit more complex, but that's the basic idea.
If you want to print out just the first character of greeting, you must explicitly index or dereference it:
cout << greeting[0] << endl;
or
cout << *greeting << endl;
greeting itself decays to a pointer. If that sounds complicated, you can simply think of it as a pointer to understand what cout does.
The operator << of cout prints characters starting from greeting[0] until it reaches \0.
If you want to see the address value write
char greeting[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
cout << "Greeting message: ";
cout << (void*)greeting << endl;
// ^^^^^^^
operator<< for std::ostream provides overloads for const char* and char, among others; a char array decays to a pointer and selects the const char* overload.
Yes, greeting here decays to a pointer to the first element of the array, so if you do something like this:
cout << *greeting;
the output will be H.
But when you pass greeting itself to cout, the operator << overload for const char* treats it not as a bare pointer but as a string, and prints every character up to the terminating \0.
So it's not a problem; that is simply how cout handles it.

Why are the results so different?

I don't understand why two similar pieces of code behave so differently. The first outputs normally, but the second outputs some unrecognizable characters. Can anyone explain this to me?
Thanks
#include <iostream>
using namespace std;
int main(){
char a[5] = { 'A', 'B', 'C', 'D' };
cout << a + 1 << endl;
char b[5] = {'a','b','c','d','e'};
cout << b+1 << endl;
return 0;
}
Both expressions a+1 and b+1 decay into a char* which is then treated by << as a NUL-terminated string, but only a is NUL-terminated. Accessing b as a NUL-terminated string causes undefined behavior, which in your case seems to be printing garbage after the first few characters. (Note that I originally said both were not NUL-terminated, but then I noticed that you had only 4 characters in the initializer for a but specified a size of 5. That means the 5th element would be zero-initialized, effectively NUL-terminating a.)
If you want to print them correctly without causing undefined behavior, make sure they are NUL-terminated:
int main(){
char a[5] = { 'A', 'B', 'C', 'D' }; // Works as-is, but not good form
cout << a + 1 << endl;
char b[6] = {'a','b','c','d','e', '\0'}; // Needed NUL-terminated, but still not the best way
cout << b+1 << endl;
return 0;
}
Or as eigenchris noted in a comment, you could rely on the compiler to NUL-terminate it for you by using a string constant instead:
char a[] = "ABCD";
char b[] = "abcde"; // Probably the best way to do this.
Since you send char * arguments to cout, like a or b, C-style strings are expected. This means the zero termination character is also expected at the end of each string. So the following will work:
char a[5] = { 'A', 'B', 'C', 'D', '\0' };
cout << a + 1 << endl;
char b[6] = {'a', 'b', 'c', 'd', 'e', '\0'};
cout << b + 1 << endl;
What happens with b is that by defining all 5 characters you leave no room for the zero character; it should have been the sixth character.
When you cout << a+1, the output will be "BCD", because the character after 'D' is a nul (\0).
This is because you specified the array-size of 5, but only gave it 4 values. The remaining, unspecified value will be set to 0.
When you cout << b+1, you'll get "bcdeXXXX", which will continue until a nul is found.
(the XXX will be unpredictable characters based on whatever is in memory.)
This is because you specified the value of all 5 characters, and the memory beyond that is undefined. It might be nuls, it might be random values left over from an earlier program. There is no way to know for sure. But cout will continue printing until it encounters a nul \0, or causes a segmentation/access violation by reading an inaccessible memory address.
That is why you get random garbage on the second output.

Does the null character get automatically inserted?

Is the C++ compiler supposed to automatically insert the null character after the end of the char array? The following prints "Word". So does the C++ compiler automatically insert a null character at the last position in the array?
int main()
{
char x[] = {'W', 'o', 'r', 'd'};
cout << x << endl;
return 0;
}
No, it doesn't. Your array x has only 4 elements. You are passing std::cout a pointer to char, which is not the beginning of a null-terminated string. This is undefined behaviour.
As an example, on my platform, similar code printed Word?, with a spurious question mark.
In C/C++ string literals are automatically appended with terminating zero. So if you will write for example
char x[] = "Word";
then the size of the array will be equal to 5 and the last element of the array will contain '\0'.
When the declaration you showed is used
char x[] = {'W', 'o', 'r', 'd'};
then the compiler allocates exactly the same number of elements as the number of the initializers. That is in this case the array will have only 4 elements.
However if you would write the following way
char x[5] = {'W', 'o', 'r', 'd'};
then the fifth element will contain the terminating zero because it has no corresponding initializer and will be zero-initialized.
Also take into account that the following declaration is valid in C but invalid in C++
char x[4] = "Word";
That is undefined behavior; it printed "Word", but it might just as well have crashed.
No it doesn't. You yourself have to do that if you want to use it as a string or you want to use library functions for strings for that.
No, in this case you are just creating a non-null-terminated array. Any read that relies on null termination won't stop at the end of the array, because there is no null there to find.
char x[] = "Word";
In this case 5 bytes are allocated for x, with a null at the end.
No, it doesn't. If you want it inserted automatically, use a string constant:
const char *x = "Word";
//OR
std::string const s = "Word";
const char *x = s.c_str();
//OR
char x[] = { "Word" }; // braces around the string literal are optional
The simplest way to verify this is to look at the memory address for x using a debugger and monitor it and the next few bytes following it. Stop the debugger after the initialization and examine the memory contents.
No, if you want to use the char array as a string use this form of the initializer
char x[] = { "Word" };
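Although it is not guaranteed, my implementation always seems to leave a \0 right after the array even though I did not request one. Note that reading noNull[2] and beyond is out of bounds and therefore undefined behavior, so this result cannot be relied on.
In my test noNull[2] was always \0: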
char noNull[] = {'H', 'i'};
for (int i = 2; i < 5; ++i) {
if (noNull[i] == '\0') {
std::cout << "index " << i << " is null" << std::endl;
} else {
std::cout << "index " << i << " is not null" << std::endl;
}
}
output
index 2 is null
index 3 is not null
index 4 is not null