Normally in C++, character arrays are initialized in the following way:
char example[5]="cat";
What if you initialize it with "" (a pair of double quotes with nothing between them)?
What will be the elements in the character array after initialization?
The declaration
char temp[3] = "";
is same as
char temp[3] = {0};
// the ASCII value of `\0` is 0
Remember that the remaining elements of a partially initialized array are initialized with 0.
Point: char temp[3] = ""; is quicker to type, so it is preferable.
Compare it with the declaration char temp[3] = {'\0'};, which needs more characters. char temp[3] = ""; is simpler, and there is no int/char type mismatch either.
It's a 3-character array initialized to three null characters.
EDIT (after comments below):
From K&R:
If there are fewer initializers for an array than the number specified, the missing elements will be zero for external, static, and automatic variables.
...
Character arrays are a special case of initialization; a string may be used instead of the braces and commas notation:
char pattern[] = "ould";
is a shorthand for the longer but equivalent
char pattern[] = { 'o', 'u', 'l', 'd', '\0' };
From a draft copy of the C++ standard, section 8.5.2, Character arrays:
"1. A char array (whether plain char, signed char, or unsigned char), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow character literal, char16_t string literal, char32_t string literal, or wide string literal, respectively, or by an appropriately-typed string literal enclosed in braces. Successive characters of the value of the string literal initialize the elements of the array. [Example:
char msg[] = "Syntax error on line %s\n";
shows a character array whose members are initialized with a string-literal. Note that because ’\n’ is a single character and because a trailing ’\0’ is appended, sizeof(msg) is 25. — end example ]
...
"3. If there are fewer initializers than there are array elements, each element not explicitly initialized shall be zero-initialized (8.5)."
An empty string. The first char will be the null terminator. After an experiment, it appears the remaining characters are also set to 0 (the same value as the null terminator).
You are setting the first element to be a null termination character. The other elements are zero-initialized as part of the partial initialization; see the related question "C and C++: Partial initialization of automatic structure".
Related
char s[10] = "Test";
How are the remaining chars (after "Test" and terminating null) initialized? (Is it defined?)
Background
I'm doing this to write a custom fixed-width (and ignored) header into an STL file. But I wouldn't like to have random/uninitialized bytes in the remaining space.
The general rule for any array (or struct) where not all members are initialized explicitly, is that the remaining ones are initialized "as if they had static storage duration". Which means that they are set to zero.
So it will actually work just fine to write something weird like this: char s[10] = {'T','e','s','t'};, since the remaining bytes are set to zero and the first of them will be treated as the null terminator.
How are the remaining chars (after "Test" and terminating null) initialized? (Is it defined?)
Yes, it's well defined. In a char array initialized with a string literal, where the specified size is larger than the length of the string literal, all the remaining elements are zero-initialized.
From C++ standard (tip-of-trunk) Character arrays § 9.4.3 [dcl.init.string]
3. If there are fewer initializers than there are array elements, each element not explicitly initialized shall be zero-initialized ([dcl.init]).
Some examples from cppreference:
char a[] = "abc";
// equivalent to char a[4] = {'a', 'b', 'c', '\0'};
// unsigned char b[3] = "abc"; // Error: initializer string too long
unsigned char b[5]{"abc"};
// equivalent to unsigned char b[5] = {'a', 'b', 'c', '\0', '\0'};
wchar_t c[] = {L"кошка"}; // optional braces
// equivalent to wchar_t c[6] = {L'к', L'о', L'ш', L'к', L'а', L'\0'};
Is '\0' set automatically if I provide an extra element for it but leave it out of the initialization string?
Like:
char a[6] = {"Hello"}; // <- Is NUL set here automatically?
I did an experiment with C and C++:
C:
#include <stdio.h>
int main()
{
    char NEWYEAR[16] = {"Happy New Year!"};
    printf("%s\n",NEWYEAR);
    return 0;
}
Output:
Happy New Year!
C++:
#include <iostream>
int main()
{
    char NEWYEAR[16] = {"Happy New Year!"};
    std::cout << NEWYEAR << std::endl;
    return 0;
}
Output:
Happy New Year!
The compilers did not throw an error or warning, and the result is as desired. So it might seem to work correctly. But is that really true?
Is everything correct by doing so?
Is this maybe bad programming style?
Does this cause any issues?
It is more complex than that
char a[6] = "Hello";
will initialize the array of characters to Hello\0, because Hello has an implicit terminating zero.
char a[6] = "Hello\0";
would be valid in C, but invalid in C++ because the literal is 7 characters long, having both an implicit terminator and explicit embedded null character. C allows the literal to drop the implicit terminator. C11 6.7.9p14:
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
(emphasis mine). It means that the implicit terminating null is added only if there is room in the array; it does not need to be there.
char a[5] = "Hello";
would be valid C, resulting in a char array that does not contain a zero-terminated string. It is invalid in C++.
And
char a[4] = "Hello";
in C would typically leave the array holding Hell. Although it is a constraint violation in C (C11 6.7.9p2),
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
attempting to initialize more elements than the array has usually just generates a warning in many compilers, which programmers then often ignore. Paragraph 14 makes no exception for anything besides the implicit terminator.
And lastly
char a[7] = "Hello";
in both C and C++ would result in a character array of 7 elements containing the characters Hello\0\0, because in an array that has an initializer, the elements not explicitly initialized by the initializer are zero-initialized, as if initialized by the literal 0. In this case the first 6 elements are initialized explicitly and the 7th implicitly.
Given the possibility of silently truncating the terminator in C, it is better to just omit the array size and write
char a[] = "Hello";
This will declare a as an array of 6 elements, just like char a[6] = "Hello";, but you cannot mistype the array size.
If there's space for the null-terminator then it will be added.
In C (but not C++), if the size of the array equals the length of the string excluding the null-terminator, then the null-terminator will not be added. So e.g.
char a[5] = "Hello";
is valid, but there won't be a null-terminator in the array.
It's not valid to provide a smaller size than the string length.
If I define a string:
char array[5] = {"hello"};
Is the NUL character (\0) byte added "hidden" to array[5], so that the array occupies 6 bytes in memory rather than 5?
Or does the NUL character byte lie "separated" from array[5] in memory, right after the last element of the char array but not actually part of it?
If I write this:
i = strlen(array);
printf("The Amount of bytes preserved for array: %d",i);
What would be the result for the amount of bytes preserved for array[5]?
Does the "NUL" character ("\0") byte lie separated after the last element of char-array in the memory or is it assigned to that char-array?
No. Neither answer is correct. See below for details.
Answer for C:
If you write your code like that, with an explicit size that is too small for the terminator, array will have exactly 5 elements and there will be no NUL character.
strlen(array) has undefined behavior because array is not a string (it has no terminator). char array[5] = {"hello"}; is equivalent to char array[5] = {'h', 'e', 'l', 'l', 'o'};.
On the other hand, if you write
char array[] = "hello";
it is equivalent to
char array[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
and strlen will report 5.
The relevant part of the C standard is:
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
(Emphasis mine.)
Answer for C++:
Your code is invalid. [dcl.init.string] states:
There shall not be more initializers than there are array elements. [ Example:
char cv[4] = "asdf"; // error
is ill-formed since there is no space for the implied trailing '\0'. — end example ]
In C++, the string literal "hello" occupies six bytes (five characters plus the terminating '\0'), but char array[5] has room for only five. Therefore, the array declaration is ill-formed.
Alternatively, this works: char array[6] = {"hello"}.
Is the string "Hello\n" equal to
{'H','e','l','l','o','\','n','\0'} or
{'H','e','l','l','o','\n','\0'}?
Does adding escape sequences in string definitions like:
char arr[] = "Hello\n";
Create strings like:
char arr[] = {'H','e','l','l','o','\','n','\0'};
or strings like:
char arr[] = {'H','e','l','l','o','\n','\0'};
Also, is the null character '\0' added every time a declaration like char* foo = "HelloWorld!" is used?
Your second interpretation is correct, since the escape sequence you are talking about (newline) is only one character long. The null character is added to the end every time you make such a declaration.
The reason your first interpretation is incorrect, is because \ is the escape character, meaning it would escape the quote right after it. You can even see this in Stack Overflow's syntax highlighting!
char arr[] = {'H','e','l','l','o','\','n','\0'};
// See how the n is not highlighted --^
As is evident, the n is outside the quotes and is interpreted as a keyword or an identifier.
"Hello\n" means {'H','e','l','l','o','\n','\0'}. The \n is the newline character.
char* foo = "HelloWorld!"
assigns the decayed pointer to the literal's char array to the char* foo. And yes, that string literal is a null-terminated char array.
Note that char* foo = "..." and char foo[] = "..." are two different things. The second one initializes the char array foo with the contents of the string literal. The first one simply makes foo point to the immutable string literal.
From standard 6.7.9
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
If you compile your code with char arr[] = {'H','e','l','l','o','\','n','\0'}; you will likely see the message
error: stray '\' in program
From standard 5.2.1 again:
In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters. A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.
I'm a bit baffled that this is allowed:
char num[6] = "a";
What is happening here? Am I assigning a pointer to the array or copying the literal values into the array (and therefore I'm able to modify them later)?
Why can I assign a string literal less than the array itself? What is happening here?
This is well defined. When initializing a character array with a string literal:
If the size of the array is specified and it is larger than the number
of characters in the string literal, the remaining characters are
zero-initialized.
So,
char num[6] = "a";
// equivalent to char num[6] = {'a', '\0', '\0', '\0', '\0', '\0'};
Am I assigning a pointer to the array or copying the literal values into the array (and therefore I'm able to modify them later)?
The value will be copied, i.e. the elements of the array will be initialized by the chars of the string literal (including '\0').
String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
Successive characters of the string literal (which includes the implicit terminating null character) initialize the elements of the array.
char num[6] = "a";
is equivalent to
char num[6] = {'a', '\0', '\0', '\0', '\0', '\0'};
Why can I assign a string literal less than the array itself?
This is allowed by the language. It is often useful to be able to add more characters to the array later, which wouldn't be possible if the existing characters filled the entire array.
Am I assigning a pointer to the array
No. You cannot assign a pointer to an array, so that is not happening.
or copying the literal values into the array
That is exactly what is happening.
and therefore I'm able to modify them later
You are able to modify the array, indeed.
Just use char num[6] = {"a"};. It works.
This kind of declaration is special syntactic sugar. It's equivalent to
char num[6] = {'a', 0};
The array is always modifiable. After such a declaration it contains the character 'a', then a zero (the NUL terminator), and the remainder of the array is also zeroed (zero initialization).
That is one form of declaration, which is equivalent to
char num[6] = {'a','\0'};
You declared a C string that can hold at most 5 normal chars; the last element must be \0 to end the C string.
In the declaration you can use
char num[6] = "a";
To assign a different value afterwards:
With strcpy(dest, src):
strcpy(num,"test");
Or char by char:
num[0]='t';
num[1]='e';
num[2]='s';
num[3]='t';
num[4]='\0';