What does this do?
const wchar_t *s = L"test";
If wchar_t is two bytes on my machine, why do we need to tell the compiler that each element of the string should be treated as long, i.e., four bytes in size?
The L means that string is a string of wchar_t characters, rather than the normal string of char characters. I'm not sure where you got the bit about four bytes from.
From the spec section 6.4.5 String literals, paragraph 2:
A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz". A wide string literal is the same, except prefixed by the letter L.
And an excerpt from paragraph 5:
For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by the mbstowcs function with an implementation-defined current locale.
If in doubt, consult the standard (§6.4.5, String Literals):
A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz". A wide string literal is the same, except prefixed by the letter L.
Note that it does not indicate that each character is a long, despite being prefixed with the same letter as the long literal suffix.
L does not mean long integer when prefixing a string. It means each character in the string is a wide character.
Without this prefix, you are assigning a string of char to a wchar_t pointer, which would be a mismatch.
It indicates a string of wide characters, of type wchar_t.
If you don't know what that L does, then why are you making an assertive statement about each array element being long ("four bytes in size")? Where did that idea with the long come from?
That L has as much relation to long as it has to "leprechaun" - no relation at all. The L prefix means that the following string literal consists of wide characters, i.e. each character has wchar_t type.
P.S. Finally, it is always a good idea to use const-qualified pointers when pointing to string literals: const wchar_t *s = L"test";.
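To make the point concrete, here is a minimal sketch in C. The function names are made up for illustration; the only assumption is a hosted implementation that provides <wchar.h>. It shows that the element type of L"test" is wchar_t, whatever size that happens to be on your machine, and that the prefix has nothing to do with long.

```c
#include <stddef.h>
#include <wchar.h>

/* Size in bytes of one element of a wide string literal. */
size_t wide_element_size(void)
{
    return sizeof(L"test"[0]);
}

/* Number of wide characters before the terminating L'\0'. */
size_t wide_length(void)
{
    const wchar_t *s = L"test";
    return wcslen(s);
}

/* Size of the whole literal in bytes, terminator included. */
size_t wide_literal_bytes(void)
{
    return sizeof(L"test");
}
```

On a machine where wchar_t is two bytes, wide_element_size() returns 2 and wide_literal_bytes() returns 10; where it is four bytes, they return 4 and 20. In both cases wide_length() is 4.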
Related
I'm trying to assign the Chinese character 牛 as a char value in C++. On Xcode, I get the error:
"Character too large for enclosing character literal type."
When I use an online IDE like JDoodle or Browxy, I get the error:
"multi-character character constant."
It doesn't matter whether I use char, char16_t, char32_t or wchar_t, it won't work. I thought any Chinese character could at least fit into wchar_t, but this appears not to be the case. What can I do differently?
char letter = '牛';
char16_t character = '牛';
char32_t hanzi = '牛';
wchar_t word = '牛';
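None of your character literals has a prefix, so they are all ordinary narrow char literals, whatever the type of the variable they initialize. To get a wider type, you need to put the proper prefix on the literal itself:
char letter = '牛';        // still invalid: 牛 does not fit in a single char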
char16_t character = u'牛';
char32_t hanzi = U'牛';
wchar_t word = L'牛';
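The same prefixes exist in C11, so here is a small sketch in C rather than C++. It assumes a compiler and library that provide <uchar.h> and use UTF-16 and UTF-32 for char16_t and char32_t (the __STDC_UTF_16__ and __STDC_UTF_32__ macros indicate this). The function names are hypothetical, and the character is written as the universal character name \u725B (the code point of 牛) so the example does not depend on the encoding of the source file.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>   /* char16_t, char32_t (C11) */
#include <wchar.h>

/* The prefix on the literal selects its type. */
char16_t ox_utf16(void) { return u'\u725B'; }
char32_t ox_utf32(void) { return U'\u725B'; }
wchar_t  ox_wide(void)  { return L'\u725B'; }

/* A narrow string can still hold the character, as several UTF-8 bytes. */
size_t ox_utf8_length(void)
{
    return strlen("\xE7\x89\x9B");   /* UTF-8 encoding of U+725B */
}
```

The last function is why a single char cannot work: in UTF-8 the character takes three bytes, so it fits in a char array but not in one char.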
I have tried this following statement in C and C++.
char A[5] = {"Hello"};
While C accepts this, C++ is throwing an error saying the string is too long. If there is a null character to be added, why is it accepted in C but not in C++?
Please note that char A[5]={"Hello"}; is a bug in either language. There must be room to allocate the null terminator.
It compiles in C because the C standard (6.7.9/14) has an odd special rule, arguably a language bug:
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
This allows a character array to be initialized with a string literal that has exactly as many characters as the array has elements, silently discarding the null terminator.
C++ fixed this dangerous language bug.
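A short sketch of what the C rule does in practice. It must be compiled as C, since C++ rejects the first declaration; the array and function names are just for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Valid C, rejected by C++: five elements, so the '\0' is dropped. */
static const char exact[5] = "Hello";

/* Letting the compiler choose the size keeps the terminator. */
static const char sized[] = "Hello";

size_t exact_size(void) { return sizeof exact; }
size_t sized_size(void) { return sizeof sized; }

/* Nonzero if a '\0' appears anywhere inside the array's storage. */
int exact_has_terminator(void) { return memchr(exact, '\0', sizeof exact) != NULL; }
int sized_has_terminator(void) { return memchr(sized, '\0', sizeof sized) != NULL; }
```

Because exact contains no terminator, passing it to strlen, printf("%s") or any other function that expects a C string reads past the end of the array.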
Is the string "Hello\n" equal to
{'H','e','l','l','o','\','n','\0'} or
{'H','e','l','l','o','\n','\0'}?
Does adding escape sequences in string definitions like:
char arr[] = "Hello\n";
Create strings like:
char arr[] = {'H','e','l','l','o','\','n','\0'};
or strings like:
char arr[] = {'H','e','l','l','o','\n','\0'};
Also, is the null character '\0' added every time a declaration like char* foo = "HelloWorld!" is used?
Your second interpretation is correct, since the escape sequence you are talking about (newline) is only one character long. The null character is added to the end every time you make such a declaration.
The reason your first interpretation is incorrect, is because \ is the escape character, meaning it would escape the quote right after it. You can even see this in Stack Overflow's syntax highlighting!
char arr[] = {'H','e','l','l','o','\','n',\0'};
// See how the n is not highlighted --^
As is evident, the n ends up outside the quotes and is parsed as an identifier.
"Hello\n" means {'H','e','l','l','o','\n','\0'}. Here \n is the single newline character.
char* foo = "HelloWorld!"
assigns to foo a pointer to the first element of the string literal's char array (the array decays to a pointer). And yes, that string literal is a null-terminated char array.
Note that char* foo = "..." and char foo[] = "..." are two different things. The second initializes the array foo with a copy of the string literal's contents. The first simply makes foo point at the string literal itself, which must not be modified.
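The difference between the two declarations can be sketched like this (the function names are hypothetical, chosen only for the example):

```c
#include <stddef.h>
#include <string.h>

/* The array form copies the literal, so the copy may be modified. */
int array_copy_is_writable(void)
{
    char foo[] = "HelloWorld!";   /* 12 bytes: 11 characters + '\0' */
    foo[0] = 'J';
    return strcmp(foo, "JelloWorld!") == 0;
}

/* The pointer form only refers to the literal's own storage. Reading
   is fine; writing through it would be undefined behavior. */
size_t literal_length_via_pointer(void)
{
    const char *foo = "HelloWorld!";
    return strlen(foo);
}

/* sizeof sees the whole array, terminator included. */
size_t array_size(void)
{
    char foo[] = "HelloWorld!";
    return sizeof foo;
}
```

Declaring the pointer as const char * lets the compiler catch accidental writes to the literal.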
From standard 6.7.9
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
If you compile your code with char arr[] = {'H','e','l','l','o','\','n','\0'}; you will likely see the message
error: stray '\' in program
From standard 5.2.1 again:
In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters. A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.
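If you want to check this yourself, sizeof settles it. A minimal sketch, with names invented for the example:

```c
#include <stddef.h>

static const char arr[] = "Hello\n";

/* Six characters plus the terminating '\0'. */
size_t arr_size(void) { return sizeof arr; }

/* The escape sequence occupies a single element. */
int newline_is_one_char(void) { return arr[5] == '\n'; }
int ends_with_nul(void) { return arr[6] == '\0'; }

/* Doubling the backslash gives the two characters '\\' and 'n' instead. */
size_t escaped_backslash_size(void) { return sizeof "Hello\\n"; }
```

arr_size() is 7, while the version with the doubled backslash is 8 bytes long.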
Is a c-style string containing only one char considered a string or would you call that construct a char?
Zero or more characters followed by a NUL-terminator is a C-style string. You can use the double quotation character notation to define a literal.
In C, a character constant such as '3' has type int, even though its value fits in a char (in C++ it has type char).
Something like '34' is a multicharacter literal.
A one element buffer is still technically a buffer. Forming a pointer to the start of it is not at all affected by how many items are in it.
So no, it's not a char. Furthermore, even the type system would differentiate char[1] from char.
It's also worth noting that a one-character string may surprise you: "a" occupies two chars in the buffer, not one, because of the terminator. The only one-element buffer that holds a valid C string is the empty string.
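A quick sketch of those sizes, using made-up function names:

```c
#include <stddef.h>

size_t one_char_string_size(void) { return sizeof "a"; }    /* 'a' plus '\0' */
size_t empty_string_size(void)    { return sizeof ""; }     /* just '\0' */
size_t single_char_size(void)     { return sizeof(char); }  /* 1 by definition */

/* char[1] and char are distinct types; only the array decays to a pointer. */
int first_element_matches(void)
{
    char buf[1] = { 'a' };   /* one-element buffer, not a C string: no '\0' */
    const char *p = buf;     /* a pointer to its start is still well-formed */
    return *p == 'a';
}
```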
Is a c-style string containing only one char considered a string or would you call that construct a char?
A C-style string is a string, which is quite different from the char data type. C has no dedicated built-in string type like C++'s std::string, so one has to use null-terminated character arrays, e.g. char str[SIZE] = "something", to represent strings. A single character, on the other hand, is stored in a char, which is altogether different from char[]. The two are not the same!
Example:
char str[] = "a"; // sizeof(str) is 2 because of the terminating null character
char c = 'a'; // simply a single character
Normally in C++, character arrays are initialized in the following way,
char example[5]="cat";
What if you initialize it with "" (two double quotes with nothing between them)?
What will be the elements in the character array after initialization?
The declaration
char temp[3] = "";
is the same as
char temp[3] = {0};
// `\0` ascii value is 0
Remember that the remaining elements of a partially initialized array are initialized to 0.
Of the equivalent forms, char temp[3] = ""; is the shortest to type, so it is preferable. Compare it with char temp[3] = {'\0'};, which needs more characters. The "" form also involves no int/char mismatch, unlike {0}.
It's a 3-character array initialized to three null characters.
EDIT (after comments below):
From K&R:
If there are fewer initializers for an array than the number specified, the missing elements will be zero for external, static, and automatic variables.
...
Character arrays are a special case of initialization; a string may be used instead of the braces and commas notation:
char pattern[] = "ould";
is a shorthand for the longer but equivalent
char pattern[] = { 'o', 'u', 'l', 'd', '\0' };
From a draft copy of the C++ standard, section 8.5.2, Character arrays:
"1. A char array (whether plain char, signed char, or unsigned char), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow character literal, char16_t string literal, char32_t string literal, or wide string literal, respectively, or by an appropriately-typed string literal enclosed in braces. Successive characters of the value of the string literal initialize the elements of the array. [Example:
char msg[] = "Syntax error on line %s\n";
shows a character array whose members are initialized with a string-literal. Note that because ’\n’ is a single character and because a trailing ’\0’ is appended, sizeof(msg) is 25. — end example ]
...
"3. If there are fewer initializers than there are array elements, each element not explicitly initialized shall be zero-initialized (8.5)."
A blank string. The first char will be the null terminator. An experiment shows that the remaining characters are set to 0 as well (the same value as the null terminator), and the rule quoted above guarantees it.
You are setting the first element to a null terminator. The other elements are zero-initialized, as in any partial initialization; see "C and C++ : Partial initialization of automatic structure".
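The question is about C++, but the same zero-filling rule applies in C, so here is a minimal sketch in C that checks it. The function names are invented for the example.

```c
#include <stddef.h>

/* Counts the zero bytes in an array initialized with "". */
size_t zero_count_after_empty_init(void)
{
    char temp[3] = "";
    size_t zeros = 0;
    for (size_t i = 0; i < sizeof temp; ++i) {
        if (temp[i] == 0) {
            ++zeros;
        }
    }
    return zeros;
}

/* Same idea with a short string: everything after "cat" is zero. */
int tail_is_zero(void)
{
    char example[5] = "cat";
    return example[3] == '\0' && example[4] == '\0';
}
```

All three elements of temp are zero, and in example both the terminator at index 3 and the leftover element at index 4 are zero.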