Why can't character constants/literals be empty? - c++

In C and C++ the rules are the same. In C,
[§6.4.4.4]/2 An integer character constant is a sequence of one or
more multibyte characters enclosed in single-quotes, as in 'x'.
In C++,
[§2.14.3]/1 A character literal is one or more characters enclosed
in single quotes, as in 'x', optionally preceded by one of the
letters u, U, or L, as in u'y', U'z', or L'x',
respectively.
The key phrase is "one or more". In contrast, a string literal can be empty, "", presumably because it consists of the null terminating character. In C, this leads to awkward initialization of a char. Either you leave it uninitialized, or use a useless value like 0 or '\0'.
char garbage;
char useless = 0;
char useless2 = '\0';
In C++, you have to use a string literal instead of a character literal if you want it to be empty.
(somecondition ? ' ' : '') // error
(somecondition ? " " : "") // necessary
What is the reason it is this way? I'm assuming C++'s reason is inherited from C.

The reason is that a character literal is defined as a character. There may be extensions that allow it to be more than one character, but it needs to be at least one character or it just doesn't make any sense. It would be the same as trying to do:
int i = ;
If you don't specify a value, what do you put there?

This is because an empty string still contains the null character '\0' at the end, so there is still a value to bind to the variable name, whereas an empty character literal has no value.
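A minimal sketch of that difference. The helper names (`empty_string_size`, `empty_string_first_char`) are made up for illustration; the point is only that the empty string literal is an array of one char, so it always has a value to offer.

```cpp
#include <cassert>   // for the checks in the usage note
#include <cstddef>

// The literal "" has type const char[1]: it stores only the terminating '\0'.
static_assert(sizeof("") == 1, "an empty string literal still stores one char");

// Hypothetical helper: size in bytes of the empty string literal.
constexpr std::size_t empty_string_size() { return sizeof(""); }

// Hypothetical helper: the single char stored in "".
constexpr char empty_string_first_char() { return ""[0]; }

// char c = '';   // does not compile: a character literal needs at least one character
```

With these definitions, `assert(empty_string_size() == 1)` and `assert(empty_string_first_char() == '\0')` both hold.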

A string is a sequence of characters terminated by a null character ('\0').
So an empty string always has a null character at its end.
A character literal with nothing between the quotes, however, has no value at all;
it needs at least one character.

Related

Ignore function anomaly when used to reading files in c++

I am trying to use the ignore function to skip a few lines, but its parameters are not what I expected. Shouldn't they be a streamsize (the number of characters to skip) and a delimiter (the character at which to stop ignoring)? The problem I am having is that the second parameter is required to be an integer. I want to pass "\n", but it isn't accepted because it is a char.
std::basic_istream<char,std::char_traits<char>> &std::basic_istream<char,std::char_traits<char>>::ignore(std::streamsize,int)': cannot convert argument 2 from 'const char [2]' to 'int'
"\n" (with double quotes) is a string literal, not a char literal. In this case, it's an array of two chars; equivalent to {'\n', '\0'}.
'\n' (with single quotes) is a char literal. It represents a single newline character.
std::istream::ignore accepts only a single character as its delimiter, so you have to use the latter.
Note: std::istream::ignore's second parameter is an int rather than a char so that it can accommodate the extra "end of file" pseudo-character. The eof value has to be different than any valid character value, so the type used for the delimiter must be wider than char.
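As a rough sketch of skipping lines this way: the helper `line_after_skipping` is a made-up name, and a `std::istringstream` stands in for the file stream so the example is self-contained. The same `ignore` call works on a `std::ifstream`.

```cpp
#include <cassert>   // for the check in the usage note
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: skip `lines_to_skip` lines of `text`, then return the next line.
std::string line_after_skipping(const std::string& text, int lines_to_skip) {
    std::istringstream in(text);
    for (int i = 0; i < lines_to_skip; ++i) {
        // '\n' is a char literal; it converts implicitly to the int delimiter.
        // The first argument means "no limit on the number of characters skipped".
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    std::string line;
    std::getline(in, line);
    return line;
}
```

For example, `assert(line_after_skipping("a\nb\nc\n", 2) == "c");` holds.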

Is "'" identical to "\'" as per the C/C++ standard?

int main()
{
char* str1 = "Tom's cat";
char* str2 = "Tom\'s cat";
}
The code can be compiled with VS 2015.
I just wonder:
Are both of the two ways compliant to the C and/or the C++ standard?
From the C++11 ISO Standard
§ 2.14.5 String Literals [lex.string]
...
15 Escape sequences and universal-character-names in non-raw string literals have the same meaning as in character literals (2.14.3), except that the single quote ' is representable either by itself or by the escape sequence \'
Yes, within a string literal, both are the same.
The escaped version is required for a character literal:
char x = '\'';
Two parts of the standard are relevant. First, the translation phases. From C.11 §5.1.1.2 (C++.11 [lex.phases] has similar language):
Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation defined member other than the null (wide) character.
Next is in the grammar definition for a character constant and for string literals, which allow for escape sequences. And simple-escape-sequence is an escape sequence in the grammar. C.11 §6.4.4.4 defines it (C++.11 [lex.ccon] has the same definition):
simple-escape-sequence: one of
\' \" \? \\
\a \b \f \n \r \t \v
Finally, for string literals, the standard specifies the interpretation of characters in the literal is the same as if each were a character constant, and then makes an exception of '. From C.11 §6.4.5 (C++.11 [lex.string] has similar language):
The same considerations apply to each element of the sequence in a string literal as if it
were in an integer character constant (for a character or UTF−8 string literal) or a wide
character constant (for a wide string literal), except that the single-quote ' is
representable either by itself or by the escape sequence \', but the double-quote " shall be represented by the escape sequence \".
\' is a valid character escape sequence in both C and C++. Hence, the lines
char* str1 = "Tom's cat";
char* str2 = "Tom\'s cat";
produce equivalent string literals, both in C and C++.
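This can be checked directly. The sketch below uses made-up names (`quote_spellings_match`, `single_quote`) and compares the two spellings character by character.

```cpp
#include <cassert>   // for the checks in the usage note
#include <cstring>

// Both literals have type const char[10]: nine characters plus '\0'.
static_assert(sizeof("Tom's cat") == sizeof("Tom\'s cat"),
              "escaping the single quote does not change the length");

// Hypothetical helper: true when the two spellings hold identical characters.
bool quote_spellings_match() {
    return std::strcmp("Tom's cat", "Tom\'s cat") == 0;
}

// In a character literal, by contrast, the escape is required.
constexpr char single_quote = '\'';
```

Here `assert(quote_spellings_match())` holds, and `single_quote` equals the first character of the string literal `"'"`.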
Yes, they're identical.
From the C++ standard, §2.13.3/7 Character literals [lex.ccon]
Table 6 — Escape sequences
new-line         NL(LF)  \n
horizontal tab   HT      \t
vertical tab     VT      \v
backspace        BS      \b
carriage return  CR      \r
form feed        FF      \f
alert            BEL     \a
backslash        \       \\
question mark    ?       \?
single quote     '       \'
double quote     "       \"
octal number     ooo     \ooo
hex number       hhh     \xhhh
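A few of these escapes in use, as a small sketch. The helper name `escaped_length` is made up; the octal and hex forms both name a character by its numeric value, so the first check does not depend on the character set.

```cpp
#include <cassert>   // for the checks in the usage note
#include <cstddef>
#include <cstring>

// Octal 101 and hex 41 are both the value 65, so the two literals are equal.
static_assert('\x41' == '\101', "octal and hex escapes name the same value");

// Hypothetical helper: each escape sequence counts as one character.
// The literal holds: a, tab, b, backslash, c, double quote, d.
std::size_t escaped_length() {
    return std::strlen("a\tb\\c\"d");
}
```

With this, `assert(escaped_length() == 7)` holds. On an ASCII-based system `'\x41'` is also equal to `'A'`, but that part is an assumption about the execution character set.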

initialize char array with quotes and curly braces

I'm a little confused. What is the logical difference between these pieces of code?
#include <iostream>
using namespace std;
int main(){
char a[5]="ABCD"; // this
cout << a;
return 0;
}
Second is
char a[5]={"ABCD"}; // this
Third is
char a[5]={'A','B','C','D'}; // this
char a[5]={"ABCD"};
char a[5]={'A','B','C','D','\0'};
In both cases, the array a has 5 elements of type char: the 4 characters of the word "ABCD" plus a final null character ('\0') that marks the end of the sequence. In the first case, with double quotes ("), the null character is appended automatically; in the second it is written out explicitly as one of the comma-separated elements. A series of characters enclosed in double quotes is called a string constant (string literal), and the compiler automatically adds a null character '\0' at its end to mark the end of the string.
The first two initialize a char[5] array from a char[5] string literal; only the syntax differs. (The 5 is the four letters plus a null terminator.)
The last one does the same without spelling out the null terminator. Because the array is a char[5], the remaining element is zero-filled, which effectively adds a null terminator, so it behaves the same. The difference is that the last form does not cause a compiler error if the array is declared as char[4]; it just leaves you with an unterminated array of characters.
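A quick way to convince yourself, sketched with made-up helper names (`initializers_match`, `four_chars_unterminated`): compare the bytes of the three arrays, then look at the char[4] case.

```cpp
#include <cassert>   // for the checks in the usage note
#include <cstring>

// Hypothetical helper: true when all three initializers produce the same 5 bytes.
bool initializers_match() {
    char a[5] = "ABCD";
    char b[5] = {"ABCD"};
    char c[5] = {'A', 'B', 'C', 'D'};   // the fifth element is zero-initialized
    return std::memcmp(a, b, sizeof a) == 0 &&
           std::memcmp(a, c, sizeof a) == 0;
}

// Hypothetical helper: a char[4] holds the four letters but no terminator,
// so it must not be passed to functions that expect a C string.
bool four_chars_unterminated() {
    char d[4] = {'A', 'B', 'C', 'D'};
    return sizeof d == 4 && d[3] == 'D';
}
```

Both `assert(initializers_match())` and `assert(four_chars_unterminated())` hold.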

Replace character in std::string with hex value

I am trying to replace a number/hex value in a std::string using std::string::replace, but when I call fileBuf.replace(0x10, 1, "0x44"); it just expands the string with the ASCII text "0x44" instead of replacing the 1 character at position 0x10 with the value 0x44. Is there a proper way to do this? Thanks
You need to use the \x escape sequence to write a character by its hexadecimal value. Moreover, since you're replacing just one character, you could use a character literal rather than a string literal; the std::string::replace overload that takes a character also takes a repeat count:
fileBuf.replace(0x10, 1, 1, '\x44');
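A self-contained sketch of overwriting one character by value. The helper `put_0x44_at` is a made-up name; it uses the `replace(pos, len, count, ch)` overload, and for a single character plain indexing does the same job.

```cpp
#include <cassert>   // for the checks in the usage note
#include <cstddef>
#include <string>

// Hypothetical helper: return a copy of `buf` with the byte at `pos` set to 0x44.
// Throws std::out_of_range if pos is past the end of the string.
std::string put_0x44_at(std::string buf, std::size_t pos) {
    buf.replace(pos, 1, 1, '\x44');   // replace 1 char at pos with 1 copy of 0x44
    // buf[pos] = '\x44';             // simpler alternative for a single character
    return buf;
}
```

The string keeps its length: `put_0x44_at("abc", 1)` has size 3 and its middle byte has the value 0x44.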

What does '\0' mean?

I can't understand what the '\0' in the two different place mean in the following code:
string x = "hhhdef\n";
cout << x << endl;
x[3]='\0';
cout << x << endl;
cout<<"hhh\0defef\n"<<endl;
Result:
hhhdef
hhhef
hhh
Can anyone give me some pointers?
C++ std::strings are "counted" strings, i.e., their length is stored as an integer, and they can contain any character. When you replace the fourth character (index 3) with a \0 nothing special happens: it's printed as if it was any other character (in particular, your console simply ignores it).
In the last line, instead, you are printing a C string, whose end is determined by the first \0 that is found. In such a case, cout goes on printing characters until it finds a \0, which, in your case, is after the third h.
C++ has two string types:
The built-in C-style null-terminated strings, which are really just byte arrays, and the C++ standard library std::string class, which stores its length and does not rely on a null terminator.
Printing a null-terminated string prints everything up until the first null character. Printing a std::string prints the whole string, regardless of null characters in its middle.
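The difference can be measured by writing to a string stream instead of cout and counting the bytes that arrive. This is only a sketch, and both helper names are made up.

```cpp
#include <cassert>   // for the checks in the usage note
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: bytes written when streaming a std::string
// that has a '\0' in the middle.
std::size_t bytes_written_from_std_string() {
    std::string x = "hhhdef";
    x[3] = '\0';
    std::ostringstream out;
    out << x;                 // writes all x.size() characters, '\0' included
    return out.str().size();
}

// Hypothetical helper: bytes written when streaming a C string literal
// that has a '\0' in the middle.
std::size_t bytes_written_from_c_string() {
    std::ostringstream out;
    out << "hhh\0defef";      // treated as const char*: stops at the first '\0'
    return out.str().size();
}
```

The first helper returns 6 and the second returns 3.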
\0 is the null character (NUL); you can find it in an ASCII table, where it has the value 0.
It is used to mark the end of C-style strings.
However, the C++ class std::string stores its size as an integer and thus does not rely on it.
You're representing strings in two different ways here, which is why the behaviour differs.
The second one is easier to explain; it's a C-style raw char array. In a C-style string, '\0' denotes the null terminator; it's used to mark the end of the string. So any functions that process/display strings will stop as soon as they hit it (which is why your last string is truncated).
The first example is creating a fully-formed C++ std::string object. These don't assign any special meaning to '\0' (they don't have null terminators).
The \0 is the null character. It is used to mark the end of a string in C.
In C, a string is a pointer to an array of characters with \0 at the end. So the following are valid representations of strings in C.
const char *c = "Hello"; // it is actually Hello\0
char d[] = {'Y', 'o', '\0'};
One application of '\0' is determining where a string ends, e.g. when finding the length of a string.
The \0 is basically a null terminator, which C uses to mark the end of a string. In simple words, it is the character whose value is zero, and it tells code that reads the string that this is where the string ends.
Let me give you an example:
As we write printf("Hello World"); /* stored as Hello World\0 */
Here we can clearly see that \0 acts as the terminator, though printing the string shown in the comment would give the same output.