Difference in the initialisation of character array - c++

I do the following for character array initialisation:
char a[] = "teststring";
char b[]={'a','a','b','b','a'};
For the first one, if I need the string length, I must use strlen(a), but for the other one I have to use sizeof(b)/sizeof(b[0]).
Why this difference?
EDIT: (I found this)
char name[13]="StudyTonight"; //valid character array initialization (12 characters plus '\0')
char name[10]={'L','e','s','s','o','n','s','\0'}; //valid initialization
Remember that when you initialize a character array by listing all its characters separately, you must supply the '\0' character explicitly.
I understand that for b we have to add '\0' to make it a properly terminated string.
ANOTHER :
Therefore, the array of char elements called myword can be initialized with a null-terminated sequence of characters by either one of these two statements:
char myword[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
char myword[] = "Hello";

A string literal like "teststring" contains the characters between the double quotes, plus a terminating char with value zero. So
char a[] = "ab";
has the same effect as:
char a[] = {'a', 'b', '\0'};
strlen() searches for that character with value '\0', so strlen(a) in this case will return 2.
Conversely, sizeof gives the actual size of the memory used by the array. Since sizeof(char) is 1 by definition in the standard, sizeof(a) gives the value 3: it counts the 'a', the 'b', and the '\0'.

a is a C-style string, i.e., a null-terminated char array. The initialization is equivalent to:
char a[] = {'t','e','s','t','s','t','r','i','n','g','\0'};
b, however, is not null-terminated, so it is not a C-style string, and you can't use functions like std::strlen() on it, because they are only valid for C-style strings.

String literals are expanded into char arrays, but they also include the terminating zero char. So think of
char a[] = "teststring";
as if you had typed this:
char a[] = {'t','e','s','t','s','t','r','i','n','g','\0'};
A rule of thumb
Whenever you will use strlen() on a char array, use a string literal for its initialization. The strlen function can be thought of as a simple scan for the terminating zero char ('\0'), counting the iterations needed.
A word about sizeof
Even if it is sometimes used with parentheses, sizeof is an operator, an integral part of the C++ language (inherited from C). In cases like char c[] = "hello";, sizeof(c) yields 6, which is exactly 1 more than strlen(c), and you might be thinking: "let's skip that inefficient scanning for the terminator". But sizeof stops being so convenient as soon as it is applied to a pointer, and arrays can (and will) decay to pointers whenever required, for example when passed to a function. Look at the following example:
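#include <iostream>
// naive approach, don't do that
int myarraysize(char s[])  // s is really a char*: the array decays to a pointer
{
    return sizeof(s);  // size of the pointer, not of the caller's array
}
int main ()
{
    char c[] = "hello";
    // prints "6 vs 8" on a typical 64-bit platform
    std::cout << sizeof(c) << " vs " << myarraysize(c) << std::endl;
    return 0;
}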

You can always write
char b[]={'a','a','b','b','a','\0'};
to overcome "the difference".
Also note
sizeof(b)/sizeof(b[0])
essentially boils down to
sizeof(b)
since sizeof(char) is always 1. Your formula is what you need for arrays of any other element type.

Related

What is the difference between char l[] {'try'} and char l[] {'t', 'r', 'y'} in C++?

When I tried to print them with cout:
char l[] {'t', 'r', 'y'};
std::cout << l << std::endl;
I got try printed in the terminal.
However, when I tried this:
char l[] {'try'};
I only got y.
I came to C++ from Python, so lots of things don't make sense to me. Can you explain the difference between these two expressions?
None of your examples are valid.
The first one has undefined behavior.
char l[] {'t', 'r', 'y'};
std::cout << l << std::endl;
Here you define l to be a char[3]. That's fine. Printing it the way you do, however, is not, since the operator<< overload you use requires a null-terminated string. Your char[3] is not null-terminated, so the operator<< overload will look for the null terminator out of bounds, which causes undefined behavior. A working variant would have been
char l[] {'t', 'r', 'y', '\0'};
which would have made it a char[4] with a \0 (null terminator) at the end.
The second example tries to create the array from a multicharacter literal. Single quotes are for individual characters, which is why that interpretation doesn't work. Many compilers will simply refuse to compile it. To create the array from a string literal, use double quotes:
char l[] {"try"};
or the idiomatic way:
char l[] = "try";
Both versions will create a null terminated array, just like char l[] {'t', 'r', 'y', '\0'};
char l[] {'t', 'r', 'y'}; defines l to be an array of three characters with no terminating null character, so using it in std::cout << l is bad because the insertion operator << needs the terminating null character to tell it where the end is.
char l[] {'try'}; defines l to be an array of one character because 'try' is a single number formed from three characters in an implementation-defined way. Since it is only one number, the size of the array l is set to be one element. To initialize l with that number, it is converted to char, which loses information about the three characters.
char l[] {"try"}; would define l to be an array of four characters, because string literals (marked by " instead of ') automatically include a terminating null character. Also, even though a string literal is “one thing,” there is a special rule that says when a string literal is used to initialize an array, its contents are used to initialize the array. So the array size is four elements because there are four characters in the string.
In this declaration
char l[] {'t', 'r', 'y'};
a character array is declared that contains exactly three characters. Because the array does not contain a string (a sequence of characters terminated with the zero, or so-called null, character '\0'), the next statement
std::cout << l << std::endl;
invokes undefined behavior, because in this case the operator << expects a pointer to a string (with rare exceptions, an array designator is converted to a pointer to its first element).
Instead, you could write, for example:
std::cout.write( l, sizeof( l ) ) << std::endl;
Alternatively, you could initialize the array with a string literal, like
char l[] {"try"};
and write
std::cout << l << std::endl;
In this declaration
char l[] {'try'};
a character array is declared and initialized with the multicharacter literal 'try', which has type int and an implementation-defined value. Multicharacter literals are conditionally supported.
The compiler should issue a diagnostic for this declaration, because it involves a narrowing conversion from int to char.
Again, you may not write
std::cout << l << std::endl;
because the array does not contain a string. It contains only one element with an implementation-defined value.
Try this code snippet:
char l[]{ 'try' };
std::cout << sizeof( l ) << '\n';
To output the array, you could write, as already shown above:
std::cout.write( l, sizeof( l ) ) << std::endl;
When you insert an array into an output stream, the array argument implicitly converts to a pointer to char. When you pass a pointer to char into an output stream, it must point to a null terminated string of characters. If the pointer is not to a null terminated character string, then the behaviour of the program is undefined. Undefined behaviour should be avoided.
char l[] {'t', 'r', 'y'};
std::cout << l << std::endl;
l is an array of 3 characters that does not contain a null terminator character. In this example you insert a pointer to a non-null-terminated string into an output stream, and the behaviour of the program is undefined. The program is broken. Don't do this.
char l[] {'try'};
l is an array of 1 character that does not contain a null terminator character. If you insert this array into an output stream, then the behaviour of the program is undefined. The program is broken. Don't do this.
Note: 'try' is a multicharacter literal. Its type is int and its value is implementation-defined. Multicharacter literals aren't useful very often. What you probably intended was to use a string literal:
char l[] = "try";
The string literal "try" is an array of 4 characters; ending in the null terminator character.

When a fixed-length char array is initialized with a short string, how is the remaining space initialized?

char s[10] = "Test";
How are the remaining chars (after "Test" and terminating null) initialized? (Is it defined?)
Background
I'm doing this to write a custom fixed-width (and ignored) header into an STL file. But I wouldn't like to have random/uninitialized bytes in the remaining space.
The general rule for any array (or struct) where not all members are initialized explicitly is that the remaining ones are initialized "as if they had static storage duration", which means that they are set to zero.
So it will actually work just fine to write something weird like char s[10] = {'T','e','s','t'};, since the remaining bytes are set to zero and the first of them will be treated as the null terminator.
How are the remaining chars (after "Test" and terminating null) initialized? (Is it defined?)
Yes, it's well defined: in a char array initialized with a string literal and with a specified size larger than the length of the string literal, all the remaining elements are zero-initialized.
From the C++ standard (tip-of-trunk), Character arrays § 9.4.3 [dcl.init.string]:
3. If there are fewer initializers than there are array elements, each element not explicitly initialized shall be zero-initialized ([dcl.init]).
Some examples from cppreference:
char a[] = "abc";
// equivalent to char a[4] = {'a', 'b', 'c', '\0'};
// unsigned char b[3] = "abc"; // Error: initializer string too long
unsigned char b[5]{"abc"};
// equivalent to unsigned char b[5] = {'a', 'b', 'c', '\0', '\0'};
wchar_t c[] = {L"кошка"}; // optional braces
// equivalent to wchar_t c[6] = {L'к', L'о', L'ш', L'к', L'а', L'\0'};

Getting away with initializing a char array without putting \0 at the end of the string

I am new to the C++ language. Recently, I was taught that
we should put '\0' at the end of a char array when initializing it, for example:
char x[6] = "hello"; //OK
However, if you do:
char x[5] = "hello";
then this raises the error:
initializer-string for array of chars is too long
Everything goes as I expect, except that the expression below does not raise a compile error:
char x[5] = {'h','e','l','l','o'};
This really confuses me, so I would like to ask two questions:
1. Why doesn't the expression char x[5] = "hello"; raise an error?
2. To my knowledge, strlen() stops only when it finds '\0' to determine the length of a char array. In this case, what would strlen(x) return?
Thanks!
The string literal "hello" has six characters, because there's an implied nul terminator. So
char x[] = "hello";
defines an array of six char. That's almost always what you want, because the C-style string functions (strlen, strcpy, strcat, etc.) operate on C-style strings, which are, by definition, nul terminated.
But that doesn't mean that every array of char will be nul terminated.
char x[] = { 'h', 'e', 'l', 'l', 'o' };
This defines an array of five char. Applying C-style string functions to this array will result in undefined behavior, because the array does not have a nul terminator.
You can do character-by-character initialization and create a valid C-style string by explicitly including the nul terminator:
char x[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
This defines an array of six char that holds a C-style string (i.e., a nul terminated sequence of characters).
The key here is to separate in your mind the general notion of an array of char from the more specific notion of an array of char that holds a C-style string. The latter is almost always what you want to do, but that doesn't mean that there is never a use for the former. It's just that the former is uncommon.
As an aside, in C you're allowed to elide the nul terminator:
char x[5] = "hello";
This is legal C, and it creates an array of 5 char with no nul terminator. In C++ that's not legal.
Why doesn't the expression char x[5] = "hello"; raise an error?
The premise is not true: in C++, an error is expected in this case.
To my knowledge, the function strlen() would stop only if it finds '\0' to determine the length of the char array, in this case, what would strlen(x) return?
If you manage to run the code somehow, the program has undefined behavior, so you cannot rely on any particular result. strlen() only stops counting when it finds a null terminator; since x has none, it reads past the end of the array into memory that does not belong to it, and that is where the UB is invoked.

Questions Dealing with Curly Braces in Array Initializer Lists [duplicate]

By accident I found that the line char s[] = {"Hello World"}; compiles and seems to be treated the same as char s[] = "Hello World";. Isn't the first ({"Hello World"}) an array containing one element that is an array of char, so that the declaration for s should read char *s[]? In fact, if I change it to char *s[] = {"Hello World"}; the compiler accepts it as well, as expected.
Searching for an answer, the only place I found that mentioned this is this one, but it does not cite the standard.
So my question is: why does the line char s[] = {"Hello World"}; compile, although the left side is of type array of char and the right side is of type array of array of char?
Following is a working program:
#include <stdio.h>
int main() {
    char s[] = {"Hello World"};
    printf("%s", s); // Same output if line above is char s[] = "Hello World";
    return 0;
}
Thanks for any clarifications.
P.S. My compiler is gcc-4.3.4.
It's allowed because the standard says so: C99 section 6.7.8, §14:
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
What this means is that both
char s[] = { "Hello World" };
and
char s[] = "Hello World";
are nothing more than syntactic sugar for
char s[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', 0 };
On a related note (same section, §11), C also allows braces around scalar initializers like
int foo = { 42 };
which, incidentally, fits nicely with the syntax for compound literals
(int){ 42 }
The braces are optional, and the expression is equivalent to just an array of char.
You can also write this:
int a = {100}; //ok
Demo : http://ideone.com/z0psd
In fact, C++11 generalizes this very syntax, to initialize non-arrays as well as arrays, uniformly. So in C++11, you can have these:
int a{}; //a is initialized to zero, and it is NOT an array
int b[]{1,2,3,4}; //b is an array of size 4 containing elements 1,2,3,4
int c[10]{}; //all 10 elements are initialized to zero
int *d{}; //pointer initialized to nullptr
std::vector<int> v{1,2,3,4,5}; //vector is initialized uniformly as well.
Braces are also allowed around the initializer of a single (non-array) variable of type int, char, etc., so
char s = {0};
works as well.
I might be wrong, but I think this is not an array of arrays of chars, but a brace-enclosed block containing an array of chars. int a = {1}; may work as well.
[...] In fact if I change it to
char *s[] = {"Hello World"}; the compiler accepts it as well, as
expected
The compiler accepts it because you are actually making an array of pointers to char, of unspecified size, in which you store only one element: the "Hello World" string. Something like this:
char* s[] = {"Hello world", "foo", "baa" ...};
You can't omit the braces in this case.
This is allowed by the C++ standard as well. Citation:
[dcl.init.string] §1
An array of narrow character type ([basic.fundamental]), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow string literal, char16_t string literal, char32_t string literal, or wide string literal, respectively, or by an appropriately-typed string literal enclosed in braces ([lex.string]). [snip]

sizeof char array is off by one

I want to use the sizeof function to get the size of a char array. The size that I get is one too much. Example:
#include <stdio.h>
char text[] = "hey";
const int n = sizeof(text);
int main(int argc, char *argv[])
{
    printf("%i\n", n);
    return 0;
}
Outputs 4, instead of the expected 3. I reproduced this behaviour on various online C++ compilers, so I think it is intended (oddly enough, I can't find anything about it on the internet). Most sources that I can find online say that it should be 3 * sizeof(char) (which is 3 on most normal systems).
If I understand everything correctly, there is an extra byte that is used for the array representation in some way. Why does this happen?
String literals are implicitly NUL terminated, so "hey" is actually four characters in size; the three letters you see, plus a \0 (aka NUL).
When you initialize an array without specifying a size, it is sized to match the initializer, and the initializer is that four-byte quantity including the NUL. char text[] = "hey"; is equivalent to char text[] = {'h', 'e', 'y', '\0'};. If it didn't work like this, attempting to work with the contents of the array as a C-style string would run past the buffer into neighboring memory until it found a NUL terminator by coincidence.