In the following example:
cout<<"\n"[a==N];
I have no clue about what the [] option does in cout, but it does not print a newline when the value of a is equal to N.
I have no clue about what the [] option does in cout
This is actually not a cout option, what is happening is that "\n" is a string literal. A string literal has the type array of n const char, the [] is simply an index into an array of characters which in this case contains:
\n\0
note \0 is appended to all string literals.
The == operator results in either true or false, so the index will be:
0 if false, if a does not equal N resulting in \n
1 if true, if a equals N resulting in \0
This is rather cryptic and could have been replaced with a simple if.
For reference the C++14 standard(Lightness confirmed the draft matches the actual standard) with the closest draft being N3936 in section 2.14.5 String literals [lex.string] says (emphasis mine):
string literal has type “array of n const char”, where n is the
size of the string as defined below, and has static storage duration
(3.7).
and:
After any necessary concatenation, in translation phase 7 (2.2),
’\0’ is appended to every string literal so that programs that scan a string can find its end.
section 4.5 [conv.prom] says:
A prvalue of type bool can be converted to a prvalue of type int, with
false becoming zero and true becoming one.
Writing a null character to a text stream
The claim was made that writing a null character(\0) to a text stream is undefined behavior.
As far as I can tell this is a reasonable conclusion, cout is defined in terms of C stream, as we can see from 27.4.2 [narrow.stream.objects] which says:
The object cout controls output to a stream buffer associated with the object stdout, declared in
<cstdio> (27.9.2).
and the C11 draft standard in section 7.21.2 Streams says:
[...]Data read in from a text stream will necessarily compare equal to the data
that were earlier written out to that stream only if: the data consist only of printing
characters and the control characters horizontal tab and new-line;
and printing characters are covered in 7.4 Character handling <ctype.h>:
[...]the term control character
refers to a member of a locale-specific set of characters that are not printing
characters.199) All letters and digits are printing characters.
with footnote 199 saying:
In an implementation that uses the seven-bit US ASCII character set, the printing characters are those
whose values lie from 0x20 (space) through 0x7E (tilde); the control characters are those whose
values lie from 0 (NUL) through 0x1F (US), and the character 0x7F (DEL).
and finally we can see that the result of sending a null character is not specified and we can see this is undefined behavior from section 4 Conformance which says:
[...]Undefined behavior is otherwise
indicated in this International Standard by the words ‘‘undefined behavior’’ or by the
omission of any explicit definition of behavior.[...]
We can also look to the C99 rationale which says:
The set of characters required to be preserved in text stream I/O are those needed for writing C
programs; the intent is that the Standard should permit a C translator to be written in a maximally
portable fashion. Control characters such as backspace are not required for this purpose, so their
handling in text streams is not mandated.
cout<<"\n"[a==N];
I have no clue about what the [] option does in cout
In C++ operator Precedence table, operator [] binds tighter than operator <<, so your code is equivalent to:
cout << ("\n"[a==N]); // or cout.operator <<("\n"[a==N]);
Or in other words, operator [] does nothing directly with cout. It is used only for indexing of string literal "\n"
For example for(int i = 0; i < 3; ++i) std::cout << "abcdef"[i] << std::endl; will print characters a, b and c on consecutive lines on the screen.
Because string literals in C++ are always terminated with null character('\0', L'\0', char16_t(), etc), a string literal "\n" is a const char[2] holding the characters '\n' and '\0'
In memory layout this looks like:
+--------+--------+
| '\n' | '\0' |
+--------+--------+
0 1 <-- Offset
false true <-- Result of condition (a == n)
a != n a == n <-- Case
So if a == N is true (promoted to 1), expression "\n"[a == N] results in '\0' and '\n' if result is false.
It is functionally similar (not same) to:
char anonymous[] = "\n";
int index;
if (a == N) index = 1;
else index = 0;
cout << anonymous[index];
valueof "\n"[a==N] is '\n' or '\0'
typeof "\n"[a==N] is const char
If the intention is to print nothing (Which may be different from printing '\0' depending on platform and purpose), prefer the following line of code:
if(a != N) cout << '\n';
Even if your intention is to write either '\0' or '\n' on the stream, prefer a readable code for example:
cout << (a == N ? '\0' : '\n');
It's probably intended as a bizarre way of writing
if ( a != N ) {
cout<<"\n";
}
The [] operator selects an element from an array. The string "\n" is actually an array of two characters: a new line '\n' and a string terminator '\0'. So cout<<"\n"[a==N] will print either a '\n' character or a '\0' character.
The trouble is that you're not allowed to send a '\0' character to an I/O stream in text mode. The author of that code might have noticed that nothing seemed to happen, so he assumed that cout<<'\0' is a safe way to do nothing.
In C and C++, that is a very poor assumption because of the notion of undefined behavior. If the program does something that is not covered by the specification of the standard or the particular platform, anything can happen. A fairly likely outcome in this case is that the stream will stop working entirely — no more output to cout will appear at all.
In summary, the effect is,
"Print a newline if a is not equal to N. Otherwise, I don't know. Crash or something."
… and the moral is, don't write things so cryptically.
It is not an option of cout but an array index of "\n"
The array index [a==N] evaluates to [0] or [1], and indexes the character array represented by "\n" which contains a newline and a nul character.
However passing nul to the iostream will have undefined results, and it would be better to pass a string:
cout << &("\n"[a==N]) ;
However, the code in either case is not particularly advisable and serves no particular purpose other than to obfuscate; do not regard it as an example of good practice. The following is preferable in most instances:
cout << (a != N ? "\n" : "") ;
or just:
if( a != N ) cout << `\n` ;
Each of the following lines will generate exactly the same output:
cout << "\n"[a==N]; // Never do this.
cout << (a==N)["\n"]; // Or this.
cout << *((a==N)+"\n"); // Or this.
cout << *("\n"+(a==N)); // Or this.
As the other answers have specified, this has nothing to do with std::cout. It instead is a consequence of
How the primitive (non-overloaded) subscripting operator is implemented in C and C++.
In both languages, if array is a C-style array of primitives, array[42] is syntactic sugar for *(array+42). Even worse, there's no difference between array+42 and 42+array. This leads to interesting obfuscation: Use 42[array] instead of array[42] if your goal is to utterly obfuscate your code. It goes without saying that writing 42[array] is a terrible idea if your goal is to write understandable, maintainable code.
How booleans are transformed to integers.
Given an expression of the form a[b], either a or b must be a pointer expression and the other; the other must be an integer expression. Given the expression "\n"[a==N], the "\n" represents the pointer part of that expression and the a==N represents the integer part of the expression. Here, a==N is a boolean expression that evaluates to false or true. The integer promotion rules specify that false becomes 0 and true becomes 1 on promotion to an integer.
How string literals degrade into pointers.
When a pointer is needed, arrays in C and C++ readily degrade into a pointer that points to the first element of the array.
How string literals are implemented.
Every C-style string literal is appended with the null character '\0'. This means the internal representation of your "\n" is the array {'\n', '\0'}.
Given the above, suppose a==N evaluates to false. In this case, the behavior is well-defined across all systems: You'll get a newline. If, on the other hand, a==N evaluates to true, the behavior is highly system dependent. Based on comments to answers to the question, Windows will not like that. On Unix-like systems where std::cout is piped to the terminal window, the behavior is rather benign. Nothing happens.
Just because you can write code like that doesn't mean you should. Never write code like that.
Related
In C++ Primer 5th Edition I saw this
when I tried to use it---
At this time it didn't work, but the program's output did give a weird symbol, but signed is totally blank And also they give some warnings when I tried to compile it. But C++ primer and so many webs said it should work... So I don't think they give the wrong information did I do something wrong?
I am newbie btw :)
But C++ primer ... said it should work
No it doesn't. The quote from C++ primer doesn't use std::cout at all. The output that you see doesn't contradict with what the book says.
So I don't think they give the wrong information
No1.
did I do something wrong?
It seems that you've possibly misunderstood what the value of a character means, or possibly misunderstood how character streams work.
Character types are integer types (but not all integer types are character types). The values of unsigned char are 0..255 (on systems where size of byte is 8 bits). Each2 of those values represent some textual symbol. The mapping from a set of values to a set of symbols is called a "character set" or "character encoding".
std::cout is a character stream. << is stream insertion operator. When you insert a character into a stream, the behaviour is not to show the numerical value. Instead, the behaviour to show the symbol that the value is mapped to3 in the character set that your system uses. In this case, it appears that the value 255 is mapped to whatever strange symbol you saw on the screen.
If you wish to print the numerical value of a character, what you can do is convert to a non-character integer type and insert that to the character stream:
int i = c;
std::cout << i;
1 At least, there's no wrong information regarding your confusion. The quote is a bit inaccurate and outdated in case of c2. Before C++20, the value was "implementation defined" rather than "undefined". Since C++20, the value is actually defined, and the value is 0 which is the null terminator character that signifies end of a string. If you try to print this character, you'll see no output.
2 This was bit of a lie for simplicity's sake. Some characters are not visible symbols. For example, there is the null terminator charter as well as other control characters. The situation becomes even more complex in the case of variable width encodings such as the ubiquitous Unicode, where symbols may consist of a sequence of several char. In such encoding, and individual char cannot necessarily be interpreted correctly without other char that are part of such sequence.
3 And this behaviour should feel natural once you grok the purpose of character types. Consider following program:
unsigned char c = 'a';
std::cout << c;
It would be highly confusing if the output would be a number that is the value of the character (such as 97 which may be the value of the symbol 'a' on the system) rather than the symbol 'a'.
For extra meditation, think about what this program might print (and feel free to try it out):
char c = 57;
std::cout << c << '\n';
int i = c;
std::cout << i << '\n';
c = '9';
std::cout << c << '\n';
i = c;
std::cout << i << '\n';
This is due to the behavior of the << operator on the char type and the character stream cout. Note, the << is known as formatted output means it does some implicit formatting.
We can say that the value of a variable is not the same as its representation in certain contexts. For example:
int main() {
bool t = true;
std::cout << t << std::endl; // Prints 1, not "true"
}
Think of it this way, why would we need char if it would still behave like a number when printed, why not to use int or unsigned? In essence, we have different types so to have different behaviors which can be deduced from these types.
So, the underlying numeric value of a char is probably not what we looking for, when we print one.
Check this for example:
int main() {
unsigned char c = -1;
int i = c;
std::cout << i << std::endl; // Prints 255
}
If I recall correctly, you're somewhat close in the Primer to the topic of built-in types conversions, it will bring in clarity when you'll get to know these rules better. Anyway, I'm sure, you will benefit greatly from looking into this article. Especially the "Printing chars as integers via type casting" part.
I'm trying to understand what std::string::size() returns.
According to https://en.cppreference.com/w/cpp/string/basic_string/size it's the "number of CharT elements in the string", but I'm not sure how that relates to the number of printed characters, especially if string termination characters are involved somehow.
This code
int main()
{
std::string str0 = "foo" "\0" "bar";
cout << str0 << endl;
cout << str0.size() << endl;
std::string str1 = "foo0bar";
str1[3] = '\0';
cout << str1 << endl;
cout << str1.size() << endl;
return 0;
}
prints
foo
3
foobar
7
In the case of str0, the size matches the number of printed characters. I assume the constructor iterates on the characters of the string literal until it reaches \0, which is why only 'f', 'o' and 'o' are put in the std::string, i.e. 3 characters, and the string termination character is not put in the std::string.
In the case of str1, the size doesn't match the number of printed characters. I assume the same went on as what I described above, but that I broke something by assigning a character. According to cppreference.com, "the behavior is undefined if this character is modified to any value other than CharT()", so I assume I've walked into undefined behavior here.
My question is this: outside of undefined behavior, is it possible that the size of a std::string doesn't match the number of printed characters, or is it actually something guaranteed by the standard?
(note: if the answer to that question changed between versions of the standard I'm interested in knowing that too)
In the case of str1 ... the behavior is undefined if this character is modified to any value other than CharT(), so I assume I've walked into undefined behavior here.
Your assumption is wrong. There is no UB for two reasons:
You did assign the element to '\0' which happens to be same as CharT() and thus it would be well defined to assign that value to str1[str1.size()].
Furthermore, str1.size() is 7 as you demonstrated and 3 is less than 7 and is therefore within bounds and it would be well defined to assign any value to that element.
is it possible that the size of a std::string doesn't match the number of printed characters
Yes, it is possible. std::string can contain non-printable characters as well, and thus the size is not necessarily the same as the number of printed characters. Your example str1 has no undefined behaviour and demonstrates how size can be different from number of printed characters.
Besides non-printable characters, in some character encodings - notably in unicode - grapheme clusters may consist of multiple graphemes which may consist of multiple code points which may consist of multiple code units (code unit is a single char object). The size of the string is the number of chars i.e. the number of code units. Thus, one should not expect the size of the string to match the number of printed characters.
or is it actually something guaranteed by the standard?
No such guarantee exists.
if the answer to that question changed between versions of the standard I'm interested in knowing that too
There has been no change regarding this.
std::string has several constructors, one of which receives const char* and that's the one that constructs str0. Because there's no length information provided, the string will just be initialized until the null termination character is found
In case of str1 then the string length is really 7 characters. When you replace str1[3] with '\0' then the string doesn't change its length, but the content is now "foo\0bar". Unlike C string, std::string can contain embedded null because it has the length information. Therefore when you cout << str1 << endl; exactly 7 bytes are printed out. It's just that you don't see the byte '\0' in the output because it's ASCII NUL which isn't a printable character
It's recommended to use the s suffix to construct the std::string faster and with the ability to construct from a string with embedded null directly without resorting to another constructor. Try auto str0 = "foo\0bar"s; and see
Will cout function of c++ stop printing if it comes across a null character(positioned in middle of array) in character array?
Firstly, std::cout isn't a function. It's an object (specifically an output stream object).
Its << operator is overloaded. The overloads which are particularly relevant are those which accept strings.
If we pass a std::string, then << will output every character in the string.
If we pass a C-style string (char*), then << will output every character up to, but not including, the first NUL character encountered. Note that for malformed input (containing no NUL), this can lead to undefined behaviour, due to reading beyond the bounds of the array.
The other (unformatted) output operation is write(), which accepts a character array and a length. That outputs exactly the number of characters specified, including any NULs encountered.
This is trivial, probably silly, but I need to understand what state cout is left in after you try to print the contents of a character pointer initialized to '\0' (or 0). Take a look at the following snippet:
const char* str;
str = 0; // or str = '\0';
cout << str << endl;
cout << "Welcome" << endl;
On the code snippet above, line 4 wont print "Welcome" to the console after the attempt to print str on line 3. Is there some behavior I should be aware of? If I substitute line 1-3 with cout << '\0' << endl; the message "Welcome" on the following line will be successfully printed to the console.
NOTE: Line 4 just silently fails to print. No warning or error message or anything (at least not using MinGW(g++) compiler). It spewed an exception when I compiled the same code using MS cl compiler.
EDIT: To dispel the notion that the code fails only when you assign str to '\0', I modified the code to assign to 0 - which was previously commented
If you insert a const char* value to a standard stream (basic_ostream<>), it is required that it not be null. Since str is null you violate this requirement and the behavior is undefined.
The relevant paragraph in the standard is at §27.7.3.6.4/3.
The reason it works with '\0' directly is because '\0' is a char, so no requirements are broken. However, Potatoswatter has convinced me that printing this character out is effectively implementation-defined, so what you see might not quite be what you want (that is, perform your own checks!).
Don't use '\0' when the value in question isn't a "character"
(terminator for a null terminated string or other). That is, I think,
the source of your confusion. Something like:
char const* str = "\0";
std::cout << str << std::endl;
is fine, where str points to a string which contains a '\0' (in this
case, two '\0'). Something like:
char const* str = NULL;
std::cout << str << std::endl;
is undefined behavior; anything can happen.
For historical reasons (dating back to C), '\0' and 0 will convert
implicitly to any pointer type, resulting in a null pointer.
A char* that points to a null character is simply a zero-length string. No harm in printing that.
But a char* whose value is null is a different story. Trying to print that would mean dereferencing a null pointer, which is undefined behavior. A crash is likely.
Assigning '\0' to a pointer isn't really correct, by the way, even if it happens to work: you're assigning a character value to a pointer variable. Use 0 or NULL, or nullptr in C++11, when assigning to a pointer.
Just regarding the cout << '\0' part…
"Terminating the string" of a file or stream in text mode has an undefined effect on its contents. The C++ standard defers to the C standard on matters of text semantics (C++11 27.9.1.1/2), and C is pretty draconian (C99 §7.19.2/2):
Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character.
Since '\0' is a control character and cout is a text stream, the resulting output may not read as you wrote it.
Take a look at this example:
http://ideone.com/8MHGH
The main problem you have is that str is pointer to a char not a char, so you should assign it to a string: str = "\0";
When you assign it to char, it remains 0 and then the fail bit of cout becomes true and you can no longer print to it. Here is another example where this is fixed:
http://ideone.com/c4LPh
Why is it that you can insert a '\0' char in a std::basic_string and the .length() method is unaffected but if you call char_traits<char>::length(str.c_str()) you get the length of the string up until the first '\0' character?
e.g.
string str("abcdefgh");
cout << str.length(); // 8
str[4] = '\0';
cout << str.length(); // 8
cout << char_traits<char>::length(str.c_str()); // 4
Great question!
The reason is that a C-style string is defined as a sequence of bytes that ends with a null byte. When you use .c_str() to get a C-style string out of a C++ std::string, then you're getting back the sequence the C++ string stores with a null byte after it. When you pass this into strlen, it will scan across the bytes until it hits a null byte, then report how many characters it found before that. If the string contains a null byte, then strlen will report a value that's smaller than the whole length of the string, since it will stop before hitting the real end of the string.
An important detail is that strlen and char_traits<char>::length are NOT the same function. However, the C++ ISO spec for char_traits<charT>::length (§21.1.1) says that char_traits<charT>::length(s) returns the smallest i such that char_traits<charT>::eq(s[i], charT()) is true. For char_traits<char>, the eq function just returns if the two characters are equal by doing a == comparison, and constructing a character by writing char() produces a null byte, and so this is equal to saying "where is the first null byte in the string?" It's essentially how strlen works, though the two are technically different functions.
A C++ std::string, however, it a more general notion of "an arbitrary sequence of characters." The particulars of its implementation are hidden from the outside world, though it's probably represented either by a start and stop pointer or by a pointer and a length. Because this representation does not depend on what characters are being stored, asking the std::string for its length tells you how many characters are there, regardless of what those characters actually are.
Hope this helps!