Sign & Unsigned Char is not working in C++

In C++ Primer 5th Edition I saw this
when I tried to use it---
It didn't work as I expected: the program's output showed a weird symbol, and the signed one was totally blank. The compiler also gave some warnings when I compiled it. But C++ Primer and many websites say it should work, so I don't think they are giving wrong information. Did I do something wrong?
I am a newbie, by the way :)

"But C++ primer ... said it should work"
No, it doesn't. The quote from C++ Primer doesn't use std::cout at all. The output that you see doesn't contradict what the book says.
"So I don't think they give the wrong information"
No. [1]
"did I do something wrong?"
It seems that you've possibly misunderstood what the value of a character means, or possibly misunderstood how character streams work.
Character types are integer types (but not all integer types are character types). The values of unsigned char are 0..255 (on systems where the size of a byte is 8 bits). Each [2] of those values represents some textual symbol. The mapping from a set of values to a set of symbols is called a "character set" or "character encoding".
std::cout is a character stream. << is the stream insertion operator. When you insert a character into a stream, the behaviour is not to show the numerical value. Instead, the behaviour is to show the symbol that the value is mapped to [3] in the character set that your system uses. In this case, it appears that the value 255 is mapped to whatever strange symbol you saw on the screen.
If you wish to print the numerical value of a character, what you can do is convert it to a non-character integer type and insert that into the character stream:
int i = c;
std::cout << i;
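As a minimal sketch of the difference, the two helpers below format the same unsigned char once as a symbol and once as a number. The names as_symbol and as_number are made up for illustration, and the values in the comments assume an ASCII-compatible character set.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: inserts the character itself, so the stream
// writes the symbol that the value maps to.
std::string as_symbol(unsigned char c) {
    std::ostringstream out;
    out << c;
    return out.str();
}

// Hypothetical helper: converts to int first, so the stream
// formats the numerical value instead.
std::string as_number(unsigned char c) {
    std::ostringstream out;
    out << static_cast<int>(c);
    return out.str();
}
```

With an ASCII-compatible character set, as_symbol('a') gives "a" while as_number('a') gives "97".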
[1] At least, there's no wrong information regarding your confusion. The quote is a bit inaccurate and outdated in the case of c2. Before C++20, the value was "implementation defined" rather than "undefined". Since C++20, the value is actually defined, and the value is 0, which is the null terminator character that signifies the end of a string. If you try to print this character, you'll see no output.
[2] This was a bit of a lie for simplicity's sake. Some characters are not visible symbols. For example, there is the null terminator character as well as other control characters. The situation becomes even more complex in the case of variable width encodings such as UTF-8, the ubiquitous encoding of Unicode, where symbols may consist of a sequence of several char. In such an encoding, an individual char cannot necessarily be interpreted correctly without the other char that are part of the sequence.
[3] And this behaviour should feel natural once you grok the purpose of character types. Consider the following program:
unsigned char c = 'a';
std::cout << c;
It would be highly confusing if the output were a number that is the value of the character (such as 97, which may be the value of the symbol 'a' on the system) rather than the symbol 'a'.
For extra meditation, think about what this program might print (and feel free to try it out):
char c = 57;
std::cout << c << '\n';
int i = c;
std::cout << i << '\n';
c = '9';
std::cout << c << '\n';
i = c;
std::cout << i << '\n';

This is due to the behavior of the << operator on the char type and the character stream cout. Note that << performs what is known as formatted output, which means it does some implicit formatting.
We can say that the value of a variable is not the same as its representation in certain contexts. For example:
#include <iostream>

int main() {
    bool t = true;
    std::cout << t << std::endl; // Prints 1, not "true"
}
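To illustrate that this is only a matter of formatting, here is a small sketch using the standard std::boolalpha manipulator, which switches the textual representation without changing the value. The helper name show_bool is invented for this example, and the text output assumes the default "C" locale.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: formats a bool either as 0/1 (the default)
// or as text when std::boolalpha is set on the stream.
std::string show_bool(bool value, bool as_text) {
    std::ostringstream out;
    if (as_text) {
        out << std::boolalpha;
    }
    out << value;
    return out.str();
}
```

The value of the variable is the same in both calls; only the representation chosen by the stream differs.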
Think of it this way: why would we need char if it still behaved like a number when printed? Why not just use int or unsigned? In essence, we have different types in order to have different behaviors, which can be deduced from those types.
So the underlying numeric value of a char is probably not what we are looking for when we print one.
Check this, for example:
#include <iostream>

int main() {
    unsigned char c = -1;
    int i = c;
    std::cout << i << std::endl; // Prints 255
}
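A small sketch contrasting the signed and unsigned cases: unary plus promotes any character type to int, so the stream formats a number. The helper name value_text is hypothetical, and the "255" result assumes 8-bit bytes.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: +c applies integral promotion, turning a
// signed char or unsigned char into an int before insertion.
template <typename Char>
std::string value_text(Char c) {
    std::ostringstream out;
    out << +c;
    return out.str();
}
```

Storing -1 in an unsigned char wraps around to the largest value, while a signed char keeps -1, so the two calls produce different text for what looks like the same initializer.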
If I recall correctly, you're close in the Primer to the topic of built-in type conversions; things will become clearer once you know those rules better. In any case, I'm sure you will benefit greatly from looking into this article, especially the "Printing chars as integers via type casting" part.

Related

what does cout << "\n"[a==N]; do?

In the following example:
cout<<"\n"[a==N];
I have no clue about what the [] option does in cout, but it does not print a newline when the value of a is equal to N.

"I have no clue about what the [] option does in cout"
This is actually not a cout option. What is happening is that "\n" is a string literal. A string literal has the type "array of n const char", and the [] is simply an index into an array of characters, which in this case contains:
\n\0
Note that \0 is appended to all string literals.
The == operator results in either true or false, so the index will be:
0 if false (a does not equal N), resulting in \n
1 if true (a equals N), resulting in \0
This is rather cryptic and could have been replaced with a simple if.
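The indexing described above can be sketched in isolation, without any stream involved. The function name newline_unless is made up for illustration.

```cpp
// The literal "\n" is an array of two chars: '\n' and the appended '\0'.
static_assert(sizeof("\n") == 2, "newline plus null terminator");

// Hypothetical helper mirroring "\n"[a==N]: the bool is converted
// to 0 or 1 and used as an index into the literal.
char newline_unless(bool equal) {
    return "\n"[equal];  // index 0 -> '\n', index 1 -> '\0'
}
```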
For reference, the C++14 standard (Lightness confirmed the draft matches the actual standard), with the closest draft being N3936, says in section 2.14.5 String literals [lex.string] (emphasis mine):
string literal has type “array of n const char”, where n is the
size of the string as defined below, and has static storage duration
(3.7).
and:
After any necessary concatenation, in translation phase 7 (2.2),
’\0’ is appended to every string literal so that programs that scan a string can find its end.
section 4.5 [conv.prom] says:
A prvalue of type bool can be converted to a prvalue of type int, with
false becoming zero and true becoming one.
Writing a null character to a text stream
The claim was made that writing a null character (\0) to a text stream is undefined behavior.
As far as I can tell this is a reasonable conclusion: cout is defined in terms of a C stream, as we can see from 27.4.2 [narrow.stream.objects], which says:
The object cout controls output to a stream buffer associated with the object stdout, declared in
<cstdio> (27.9.2).
and the C11 draft standard in section 7.21.2 Streams says:
[...]Data read in from a text stream will necessarily compare equal to the data
that were earlier written out to that stream only if: the data consist only of printing
characters and the control characters horizontal tab and new-line;
and printing characters are covered in 7.4 Character handling <ctype.h>:
[...]the term control character
refers to a member of a locale-specific set of characters that are not printing
characters.199) All letters and digits are printing characters.
with footnote 199 saying:
In an implementation that uses the seven-bit US ASCII character set, the printing characters are those
whose values lie from 0x20 (space) through 0x7E (tilde); the control characters are those whose
values lie from 0 (NUL) through 0x1F (US), and the character 0x7F (DEL).
Finally, the result of sending a null character is not specified, and we can see that this is undefined behavior from section 4 Conformance, which says:
[...]Undefined behavior is otherwise
indicated in this International Standard by the words ‘‘undefined behavior’’ or by the
omission of any explicit definition of behavior.[...]
We can also look to the C99 rationale which says:
The set of characters required to be preserved in text stream I/O are those needed for writing C
programs; the intent is that the Standard should permit a C translator to be written in a maximally
portable fashion. Control characters such as backspace are not required for this purpose, so their
handling in text streams is not mandated.

cout<<"\n"[a==N];
"I have no clue about what the [] option does in cout"
In the C++ operator precedence table, operator [] binds tighter than operator <<, so your code is equivalent to:
cout << ("\n"[a==N]); // or cout.operator <<("\n"[a==N]);
Or in other words, operator [] does nothing directly with cout. It is used only for indexing the string literal "\n".
For example, for(int i = 0; i < 3; ++i) std::cout << "abcdef"[i] << std::endl; will print the characters a, b and c on consecutive lines on the screen.
Because string literals in C++ are always terminated with a null character ('\0', L'\0', char16_t(), etc.), the string literal "\n" is a const char[2] holding the characters '\n' and '\0'.
In memory, the layout looks like this:
+--------+--------+
|  '\n'  |  '\0'  |
+--------+--------+
     0        1      <-- Offset
   false    true     <-- Result of condition (a == N)
  a != N   a == N    <-- Case
So if a == N is true (promoted to 1), the expression "\n"[a == N] results in '\0'; if it is false (promoted to 0), it results in '\n'.
It is functionally similar (not the same) to:
char anonymous[] = "\n";
int index;
if (a == N) index = 1;
else index = 0;
cout << anonymous[index];
The value of "\n"[a==N] is '\n' or '\0'.
The type of "\n"[a==N] is const char.
If the intention is to print nothing (which may be different from printing '\0', depending on platform and purpose), prefer the following line of code:
if(a != N) cout << '\n';
Even if your intention is to write either '\0' or '\n' on the stream, prefer readable code, for example:
cout << (a == N ? '\0' : '\n');

It's probably intended as a bizarre way of writing
if ( a != N ) {
    cout<<"\n";
}
The [] operator selects an element from an array. The string "\n" is actually an array of two characters: a new line '\n' and a string terminator '\0'. So cout<<"\n"[a==N] will print either a '\n' character or a '\0' character.
The trouble is that you're not allowed to send a '\0' character to an I/O stream in text mode. The author of that code might have noticed that nothing seemed to happen, so he assumed that cout<<'\0' is a safe way to do nothing.
In C and C++, that is a very poor assumption because of the notion of undefined behavior. If the program does something that is not covered by the specification of the standard or the particular platform, anything can happen. A fairly likely outcome in this case is that the stream will stop working entirely — no more output to cout will appear at all.
In summary, the effect is,
"Print a newline if a is not equal to N. Otherwise, I don't know. Crash or something."
… and the moral is, don't write things so cryptically.

It is not an option of cout but an array index into "\n".
The array index [a==N] evaluates to [0] or [1], and indexes the character array represented by "\n" which contains a newline and a nul character.
However passing nul to the iostream will have undefined results, and it would be better to pass a string:
cout << &("\n"[a==N]) ;
However, the code in either case is not particularly advisable and serves no particular purpose other than to obfuscate; do not regard it as an example of good practice. The following is preferable in most instances:
cout << (a != N ? "\n" : "") ;
or just:
if( a != N ) cout << '\n' ;
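To see the difference between the two approaches, here is a sketch that writes into a std::ostringstream instead of cout, so the bytes can be inspected afterwards. The helper names via_char and via_string are invented for this example. Note that it only shows what is inserted into the stream; it says nothing about what a terminal or a text-mode file does with a nul byte.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: the cryptic form, which inserts a single char
// ('\n' or '\0') into the stream.
std::string via_char(bool equal) {
    std::ostringstream out;
    out << "\n"[equal];
    return out.str();
}

// Hypothetical helper: the string form, which inserts either "\n"
// or an empty string, so no nul byte is ever written.
std::string via_string(bool equal) {
    std::ostringstream out;
    out << (equal ? "" : "\n");
    return out.str();
}
```

When the condition is true, via_char produces a one-byte string holding '\0', whereas via_string produces an empty string.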

Each of the following lines will generate exactly the same output:
cout << "\n"[a==N]; // Never do this.
cout << (a==N)["\n"]; // Or this.
cout << *((a==N)+"\n"); // Or this.
cout << *("\n"+(a==N)); // Or this.
As the other answers have specified, this has nothing to do with std::cout. It is instead a consequence of the following:
How the primitive (non-overloaded) subscripting operator is implemented in C and C++.
In both languages, if array is a C-style array of primitives, array[42] is syntactic sugar for *(array+42). Even worse, there's no difference between array+42 and 42+array. This leads to interesting obfuscation: Use 42[array] instead of array[42] if your goal is to utterly obfuscate your code. It goes without saying that writing 42[array] is a terrible idea if your goal is to write understandable, maintainable code.
How booleans are transformed to integers.
Given an expression of the form a[b], one of a or b must be a pointer expression and the other must be an integer expression. Given the expression "\n"[a==N], the "\n" represents the pointer part of that expression and the a==N represents the integer part. Here, a==N is a boolean expression that evaluates to false or true. The integral promotion rules specify that false becomes 0 and true becomes 1 on promotion to an integer.
How string literals decay into pointers.
When a pointer is needed, arrays in C and C++ readily decay into a pointer that points to the first element of the array.
How string literals are implemented.
Every C-style string literal is appended with the null character '\0'. This means the internal representation of your "\n" is the array {'\n', '\0'}.
Given the above, suppose a==N evaluates to false. In this case, the behavior is well-defined across all systems: You'll get a newline. If, on the other hand, a==N evaluates to true, the behavior is highly system dependent. Based on comments to answers to the question, Windows will not like that. On Unix-like systems where std::cout is piped to the terminal window, the behavior is rather benign. Nothing happens.
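The equivalence of the four spellings can be checked with a short sketch. It uses a named array (text, an assumption made here to keep compilers from warning about arithmetic on string literals) in place of the literal itself; the function name all_forms_agree is also made up.

```cpp
// Hypothetical check: subscripting is defined as *(a + b), and
// addition commutes, so all four expressions read the same element.
bool all_forms_agree(bool b) {
    static const char text[] = "\n";  // {'\n', '\0'}
    const char c1 = text[b];
    const char c2 = b[text];
    const char c3 = *(b + text);
    const char c4 = *(text + b);
    return c1 == c2 && c2 == c3 && c3 == c4;
}
```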
Just because you can write code like that doesn't mean you should. Never write code like that.

Odd behavior in boost::format hex

I'm trying to format a binary array: char* memblock to a hex string.
When I use the following:
fprintf(out, "0x%02x,", memblock[0]);
I get the following output:
0x7f,
When I try to use boost::format on an ofstream like so:
std::ofstream outFile (path, std::ios::out); //also tried with binary
outFile << (boost::format("0x%02x")%memblock[0]);
I get a weird output like this (seen in Vi): 0x0^?.
What gives?

Given that the character with value 0x7f is DEL, which Vi shows as ^? (CTRL-?), it looks like it's outputting memblock[0] as a character rather than as a hex value, despite your format string.
This actually makes sense based on what I've read in the documentation. Boost::format is a type-safe library where the format specifiers dictate how a variable will be output, but limited by the actual type of said variable, which takes precedence.
The documentation states (my bold):
Legacy printf format strings: %spec where spec is a printf format specification.
spec passes formatting options, like width, alignment, numerical base used for formatting numbers, as well as other specific flags. But the classical type-specification flag of printf has a weaker meaning in format.
It merely sets the appropriate flags on the internal stream, and/or formatting parameters, but does not require the corresponding argument to be of a specific type. e.g. : the specification 2$x, meaning "print argument number 2, which is an integral number, in hexa" for printf, merely means "print argument 2 with stream basefield flags set to hex" for format.
And presumably, having the field flag set to hex doesn't make a lot of sense when you're printing a char, so it's ignored. Additionally from that documentation (though paraphrased a little):
The type-char does not impose the concerned argument to be of a restricted set of types, but merely sets the flags that are associated with this type specification. A type-char of p or x means hexadecimal output but simply sets the hex flag on the stream.
This is also verified more specifically by the text from this link:
My colleagues and I have found, though, that when a %d descriptor is used to print a char variable the result is as though a %c descriptor had been used - printf and boost::format don't produce the same result.
The Boost documentation linked to above also explains that the zero-padding 0 modifier works on all types, not just integral ones, which is why you're getting the second 0 in 0x0^? (the ^? is a single character).
In many ways, this is similar to the problem of trying to output a const char * in C++ so that you see a pointer. The following code:
#include <iostream>

int main() {
    const char *xyzzy = "plugh";
    std::cout << xyzzy << '\n';
    std::cout << (void*)xyzzy << '\n';
    return 0;
}
will produce something like:
plugh
0x4009a0
because the standard libraries know that C-style strings are a special case but, if you tell them it's a void pointer, they'll give you a pointer as output.
A solution in your specific case may be just to cast your char to an int or some other type that intelligently handles the %x format specifier:
outFile << (boost::format("0x%02x") % static_cast<int>(memblock[0]));
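The same effect can be reproduced with plain iostreams, which is what Boost.Format sets its flags on. The sketch below uses only the standard library; the helper names hex_ignored_for_char and hex_byte are made up. One assumption worth flagging: since memblock is a char*, a byte above 0x7f may be negative where char is signed, so this version goes through unsigned char before widening to int to avoid sign extension.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: std::hex has no effect on a char, which is
// still inserted as a single character.
std::string hex_ignored_for_char(char byte) {
    std::ostringstream out;
    out << std::hex << byte;
    return out.str();
}

// Hypothetical helper: widen to int so the hex flag applies.
std::string hex_byte(char byte) {
    std::ostringstream out;
    out << "0x" << std::hex << std::setw(2) << std::setfill('0')
        << static_cast<int>(static_cast<unsigned char>(byte));
    return out.str();
}
```

For the byte from the question, hex_byte('\x7f') yields "0x7f", matching the fprintf output.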

A story of stringstream, hexadecimal and characters

I have a string containing hexadecimal values (two characters representing a byte). I would like to use std::stringstream to make the conversion as painless as possible, so I came up with the following code:
std::string a_hex_number = "e3";
{
    unsigned char x;
    std::stringstream ss;
    ss << std::hex << a_hex_number;
    ss >> x;
    std::cout << x << std::endl;
}
To my great surprise, this prints out "e"... Of course I don't give up so easily, so I modified the code to be:
{
    unsigned short y;
    std::stringstream ss;
    ss << std::hex << a_hex_number;
    ss >> y;
    std::cout << y << std::endl;
}
This, as expected, prints out 227 ...
I looked at http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/ and http://www.cplusplus.com/reference/ios/hex/ but I just could not find a reference that tells me more about why this behaviour occurs (yes, I feel that it is right, because when extracting a character it should take one character, but I am a little confused that std::hex is ignored for characters). Is this situation mentioned somewhere?
(http://ideone.com/YHt7Fz)
Edit: I am specifically interested in whether this behaviour is mentioned in the C++ standard.

If I understand correctly, you're trying to convert a string in
hex to an unsigned char. So for starters, since this is
"input", you should be using std::istringstream:
std::istringstream ss( a_hex_number );
ss >> std::hex >> variable;
Beyond that, you want the input to parse the string as an
integral value. Streams do not consider character types as
numeric values; they read a single character into them (after
skipping leading white space). To get a numeric value, you
should input to an int, and then convert that to unsigned
char. Characters don't have a base, so std::hex is
irrelevant for them. (The same thing holds for strings, for
example, and even for floating point.)
With regards to the page you cite: the page doesn't mention
inputting into a character type (strangely enough, because it
does talk about all other types, including some very special
cases). The documentation for the std::hex manipulator is
also weak: in the running text, it only says that "extracted
values are also expected to be in hexadecimal base", which isn't
really correct; in the table, however, it clearly talks about
"integral values". In the standard, this is documented in
§27.7.2.2.3. (The >> operators for character types are not
member functions, but free functions, so are defined in
a different section.) What we are missing, however, is a good
document which synthesizes these sorts of things: whether the
>> operator is a member or a free function doesn't really
affect the user much; you want to see all of the >> available,
with their semantics, in one place.
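Putting that advice together, a minimal sketch might look like the following. The function name parse_hex_byte and the choice to throw std::invalid_argument on bad input are assumptions of this example, not something the question asked for.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse the text as a hexadecimal integer first,
// then narrow the result to unsigned char.
unsigned char parse_hex_byte(const std::string& text) {
    std::istringstream in(text);
    unsigned int value = 0;
    if (!(in >> std::hex >> value) || value > 0xFF) {
        throw std::invalid_argument("not a hexadecimal byte: " + text);
    }
    return static_cast<unsigned char>(value);
}
```

Because the extraction target is an unsigned int, std::hex applies and "e3" is read as 227; extracting directly into an unsigned char would read only the single character 'e'.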

Let's put it simply: the variable type is 'stronger' than 'hex'. That's why 'hex' is ignored for a 'char' variable.
Longer story:
'Hex' modifies the internal state of the stringstream object, telling it how to treat subsequent operations on integers. However, this does not apply to chars.

When you print out a character (i.e. unsigned char), it's printed as a character, not as a number.

Displaying char array in gcc does not work

I wrote a piece of code and tested it with the gcc compiler:
#include <iostream>

int main()
{
    char arr[ 1000 ];
    for( int index( 0 ); index < 1000; ++index )
    {
        std::cout << arr[ index ] << std::endl;
    }
    return 0;
}
I was hoping it would print the garbage values, but to my surprise, it did not print anything. When I simply changed the datatype of arr from char to int, it displayed the garbage values as expected. Could somebody please explain this to me?

The overloads for << for character types do not treat them as
integral types, but as characters. If the garbage value
corresponds to a printable character (e.g. 97, which corresponds
to 'a'), you will see it. If it doesn't (e.g. 0), you won't.
And if the garbage values correspond to some escape sequence
which causes your terminal to use a black foreground on a black background, you won't see anything else, period.
If you want to see the actual numerical values of a char (or
any character type), just convert the variable to int before
outputting it:
std::cout << static_cast<int>( arr[index] ) << std::endl;

What you're trying to do has undefined behavior. Some compilers will clear out the memory for you, others will leave it as it was before the creation of your buffer.
Overall, this is a useless test.

Some platforms may choose, for example for security purposes, to fill the uninitialized char array with zeroes, even though it's not static and wasn't explicitly initialized.
That is why no garbage is showing up: your char array was just automatically initialized.

On your platform garbage characters don't print. On another platform it might be different.
As an experiment try this
std::cout << '|' << arr[ index ] << '|' << std::endl;
See if anything appears between the || characters.
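Going one step further than the experiment above, a sketch like this makes every byte visible by always printing its numerical value and only showing the symbol when std::isprint says it is printable. The helper name describe is made up, the results assume the default "C" locale with an ASCII-compatible character set, and the helper should be fed initialized data rather than the uninitialized array from the question.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: shows the symbol only for printable characters
// and always shows the numerical value in parentheses.
std::string describe(char c) {
    std::ostringstream out;
    // std::isprint requires a value representable as unsigned char.
    const unsigned char byte = static_cast<unsigned char>(c);
    if (std::isprint(byte)) {
        out << '\'' << c << "' ";
    }
    out << '(' << static_cast<int>(byte) << ')';
    return out.str();
}
```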

You're getting undefined behaviour because you're attempting to use values from an uninitialised array. You can't expect anything in particular to happen. Maybe every character happens to be a non-printing character. Maybe it just decided that it didn't want to print anything because it doesn't like your little games. Anything goes.

Union struct produces garbage and general question about struct nomenclature

I read about unions the other day (today) and tried the sample code that came with them. Easy enough, but the result was clear and utter garbage.
The first example is:
union Test
{
    int Int;
    struct
    {
        char byte1;
        char byte2;
        char byte3;
        char byte4;
    } Bytes;
};
where an int is assumed to have 32 bits. After I set a value with Test t; t.Int = 7; and then print the individual bytes with
cout << t.Bytes.byte1 << etc...
nothing is displayed, but my computer beeps. Which is fairly odd, I guess.
The second example gave me even worse results.
union SwitchEndian
{
    unsigned short word;
    struct
    {
        unsigned char hi;
        unsigned char lo;
    } data;
} Switcher;
Looks a little wonky in my opinion. Anyway, according to the description, this should automatically store the result in a high/little endian format when I set the value like
Switcher.word = 7656; and then print it with cout << Switcher.data.hi << endl
The result of this was symbols not even defined in the ASCII chart. Not sure why those are showing up.
Finally, I had an error when I tried correcting the example by, instead of placing Bytes at the end of the struct, positioning it right next to it. So instead of
struct {} Bytes;
I wanted to write
struct Bytes {};
This tossed me a big ol' error. What's the difference between these? Since C++ cannot have unnamed structs it seemed, at the time, pretty obvious that the Bytes positioned at the beginning and at the end are the things that name it. Except no, that's not the entire answer I guess. What is it then?

The beeps and weird symbols appear because you are printing the bytes as characters rather than as numbers, and in this case those characters are ASCII control characters. In your first example (the beeps), you are printing ASCII 7, which is the bell character.
You can cast your data to int to print out the actual decimal representation, e.g.:
cout << (int)t.Bytes.byte1 << endl << (int)t.Bytes.byte2 << endl << (int)t.Bytes.byte3 << endl << (int)t.Bytes.byte4 << endl;
You can do something similar for your second example to see the decimal representation of those unsigned char values in memory.
The reason for the difference is that the type of cout, basic_ostream, has multiple overloads for operator<< for various basic types.
For your last issue, what compiler error are you getting? Both struct definitions compile fine for me when using VS2008.

Note that, technically, reading from a member of a union other than the member that was last written to results in undefined behavior, so if you last assigned a value to Int, you cannot read a value from Bytes (there's some discussion of this on StackOverflow, for example, in this answer to another question).
Chris Schmich gives a good explanation of why you are hearing beeps and seeing control characters, so I won't repeat that.
For your final question, struct {} Bytes; declares an instance named Bytes of an unnamed struct. It is similar to saying:
struct BytesType {};
BytesType Bytes;
except that you cannot refer to BytesType elsewhere. struct Bytes {}; defines a struct named Bytes but declares no instances of it.
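On the undefined-behaviour point at the start of this answer: one commonly used way to look at the bytes of an integer without reading an inactive union member is to copy them out with std::memcpy. This is a sketch under the assumption of a 32-bit unsigned value with 8-bit bytes; the function name bytes_of is made up, and the order of the bytes depends on the platform's endianness.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the object representation of value
// into an array of unsigned char, lowest address first.
std::array<unsigned char, sizeof(std::uint32_t)> bytes_of(std::uint32_t value) {
    std::array<unsigned char, sizeof(std::uint32_t)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}
```

Each element can then be printed as a number with static_cast<int>(bytes[i]), which avoids both the union question and the bell character.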
