std::cout deals with uint8_t as a character - C++

If I run this code:
std::cout << static_cast<uint8_t>(65);
It will output:
A
Which is the ASCII equivalent of the number 65.
This is because uint8_t is simply defined as:
typedef unsigned char uint8_t;
Is this behavior standard?
Shouldn't there be a better way to define uint8_t that is guaranteed to be treated as a number, not a character?
I cannot understand the logic: if I want to print the value of a uint8_t variable, why would it be printed as a character?
P.S. I am using MSVS 2013.

Is this behavior standard?
The behavior is standard in the sense that if uint8_t is a typedef of unsigned char, then it will always print as a character: std::ostream has an overload for unsigned char and prints out the contents of the variable as a character.
Shouldn't there be a better way to define uint8_t that is guaranteed to be treated as a number, not a character?
In order to do this the C++ committee would have had to introduce a new fundamental type. Currently the only types that have a sizeof() equal to 1 are char, signed char, and unsigned char. They could possibly have used bool, but bool is not required to have a size of 1, and even then you would be in the same boat, since
#include <iostream>

int main()
{
    bool foo = 42;
    std::cout << foo << '\n';
}
will print 1, not 42, as any non-zero value is true and true is printed as 1 by default.
I'm not saying it can't be done, but it is a lot of work for something that can be handled with a cast or a function.
C++17 introduces std::byte, which is defined as enum class byte : unsigned char {};. So it is one byte wide, but it is not a character type. Unfortunately, since it is an enum class it comes with its own limitations. The bitwise operators have been defined for it, but there are no built-in stream operators for it, so you would need to define your own to input and output it. That means you are still converting it, but at least you won't conflict with the built-in operators for unsigned char. That gives you something like
#include <cstddef>
#include <iostream>

std::ostream& operator <<(std::ostream& os, std::byte b)
{
    return os << std::to_integer<unsigned int>(b);
}

std::istream& operator >>(std::istream& is, std::byte& b)
{
    unsigned int temp;
    is >> temp;
    b = static_cast<std::byte>(temp);
    return is;
}

int main()
{
    std::byte foo{10};
    std::cout << foo;
}

Posting an answer as there is some misinformation in comments.
uint8_t may or may not be a typedef for unsigned char. It is also possible for it to be an extended integer type (and so, not a character type).
Compilers may offer other integer types besides the minimum set required by the standard (short, int, long, etc). For example some compilers offer a 128-bit integer type.
This would not "conflict with C" either, since C and C++ both allow for extended integer types.
So, your code has to allow for both possibilities. The suggestion in the comments of using unary + would work.
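For example, a minimal sketch of the unary + approach (integral promotion turns the uint8_t into an int, so the number is printed):
#include <cstdint>
#include <iostream>

int main()
{
    std::uint8_t id = 65;
    // unary + promotes id to int, so the stream prints a number, not a character
    std::cout << +id << '\n'; // prints 65
}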
Personally I think it would make more sense if the standard required uint8_t to not be a character type, as the behaviour you have noticed is unintuitive.

It's indirectly standard behavior: std::ostream has an overload for unsigned char, and uint8_t is a typedef for unsigned char on your system.
§27.7.3.1 [output.streams.ostream] gives:
template<class traits>
basic_ostream<char,traits>& operator<<(basic_ostream<char,traits>&, unsigned char);
I couldn't find anywhere in the standard that explicitly stated that uint8_t and unsigned char had to be the same, though. It's just that it's reasonable that they both occupy 1 byte in nearly all implementations.
std::cout << std::boolalpha << std::is_same<uint8_t, unsigned char>::value << std::endl; // prints true
To get the value to print as an integer, you need a type that is not unsigned char (or one of the other character overloads). Probably a simple cast to uint16_t is adequate, because the standard doesn't list an overload for it:
uint8_t a = 65;
std::cout << static_cast<uint16_t>(a) << std::endl; // prints 65

std::format behaving differently between signed char / unsigned char

Is this behavior expected, or as per the standard (using the VC compiler)?
Example 1 (signed char):
char s = 'R';
std::cout << s << std::endl; // Prints R.
std::cout << std::format("{}\n", s); // Prints R.
Example 2 (unsigned char):
unsigned char u = 'R';
std::cout << u << std::endl; // Prints R.
std::cout << std::format("{}\n", u); // Prints 82.
In the second example with std::format, u is printed as 82 instead of R. Is this a bug or expected behavior?
Without std::format, using just std::cout, I get R in both examples.
This is intentional and specified as such in the standard.
Both char and unsigned char are fundamentally numeric types. Normally only char has the additional meaning of representing a character. For example, there are no unsigned char string literals. If unsigned char is used, often via the alias std::uint8_t, it is normally supposed to represent a numeric value (or a raw byte of memory, although std::byte is a better choice for that).
So it makes sense to choose a numeric interpretation for unsigned char and a character interpretation for char by default. In both cases that can be overridden, with {:c} as the specifier for a character interpretation and {:d} for a numeric interpretation.
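For example (a minimal C++20 sketch, assuming a compiler with <format> support):
#include <format>
#include <iostream>

int main()
{
    unsigned char u = 'R';
    std::cout << std::format("{}\n", u);   // default for unsigned char is numeric: 82
    std::cout << std::format("{:c}\n", u); // force character interpretation: R
    std::cout << std::format("{:d}\n", u); // force numeric interpretation: 82
}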
I think operator<<'s behavior is the non-intuitive one, but that has been around for much longer and probably can't be changed.
Also note that signed char is a completely distinct type from both char and unsigned char, and that it is implementation-defined whether char is a signed or unsigned integer type (but char is always distinct from both signed char and unsigned char).
If you used signed char it would also be interpreted as numeric by default for the same reason as unsigned char is.
In the second example, with std::format, it's printed as 82 instead of 'R'.
Is this an issue, or is it standard behavior?
This is behavior defined by the standard, according to [format.string.std]:
Type   Meaning
...    ...
c      Copies the character static_cast<charT>(value) to the output. Throws format_error if value is not in the range of representable values for charT.
d      to_chars(first, last, value).
...    ...
none   The same as d. [Note 8: If the formatting argument type is charT or bool, the default is instead c or s, respectively. — end note]
For integer types, if no presentation type is specified, then d is the default (except when the formatting argument type is charT or bool, where the defaults are c and s, respectively). Since unsigned char is an integer type, it will be interpreted as an integer, and its value will be converted to text as if by std::to_chars.
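As a rough illustration of what the default d presentation does (a sketch using C++17's <charconv>):
#include <charconv>
#include <iostream>

int main()
{
    unsigned char u = 'R';
    char buf[8];
    // roughly what the "d" presentation type does: convert the value with to_chars
    auto result = std::to_chars(buf, buf + sizeof buf, static_cast<unsigned int>(u));
    std::cout.write(buf, result.ptr - buf) << '\n'; // prints 82
}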

What decides whether an integral type is signed or unsigned by default?

Except for bool and the extended character types, the integral types may be signed or unsigned. (C++ Primer, 5th ed., p. 34)
The "may be" confuses me. However, please don't answer with the difference between, for example, int and unsigned int when they are written explicitly in a declaration; that is not what I'm asking. I would like to know, for the types char, short, int, long, and long long, under what conditions each is signed or unsigned.
I've written a simple test, compiled with the GNU compiler on my Mac, and it shows that char is signed:
#include <iostream>
#include <limits>
using namespace std;

int main( int argc, char * argv[] )
{
    int minChar = numeric_limits<char>::min();
    int maxChar = numeric_limits<char>::max();
    cout << minChar << endl; // prints -128
    cout << maxChar << endl; // prints 127
    return 0;
}
The same mechanism, applied to all of the signable integral types, gives the results shown below.
minOfChar: -128
maxOfChar: 127
minOfShort: -32768
maxOfShort: 32767
minOfInt: -2147483648
maxOfInt: 2147483647
minOfLong: 0 // This is interesting, 0
maxOfLong: -1 // and -1 :p
minOfLongLong: 0 // shouldn't use int to hold max/min of long/long long, as Bathsheba answered below
maxOfLongLong: -1 // I'll leave this error unfixed; it's a typical pitfall for newbies like me, and good for learning :)
The results tell me that char, short, int, long, and long long, when compiled by g++ on a Mac, are signed integers by default.
So the question is as the title says:
What decides whether an integral type is signed or unsigned?
Aside from char, the signedness of the integral types is specified in the C and C++ standards, either explicitly, or by a simple corollary of the ranges that the types are required to implement.
The signedness of char is determined by the particular implementation of C and C++; i.e. it's typically up to the compiler. And the choice will be made to best suit the hardware.
Note that char, signed char, and unsigned char are all distinct types, much in the same way that int and long are distinct types even if they have the same size and complementing scheme.
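A quick way to check this (a sketch using <type_traits>):
#include <iostream>
#include <type_traits>

int main()
{
    std::cout << std::boolalpha
              << std::is_same<char, signed char>::value << '\n'   // always false
              << std::is_same<char, unsigned char>::value << '\n' // always false
              << std::is_signed<char>::value << '\n';             // implementation-defined
}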
It's also not a particularly good idea to assign, for example,
numeric_limits<long>::min();
to an int value; the result of that conversion is implementation-defined and almost certainly not what you want. Why not use
auto foo = numeric_limits<whatever>::min();
instead?
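For instance, with long (one of the types from the question):
#include <iostream>
#include <limits>

int main()
{
    // auto deduces long, so the value is never squeezed through an int
    auto minLong = std::numeric_limits<long>::min();
    auto maxLong = std::numeric_limits<long>::max();
    std::cout << minLong << '\n' << maxLong << '\n';
}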

Improper printing of uint8_t variable [duplicate]

I am trying to read a small integer value (less than 10) into a uint8_t variable. I do it like this:
uint8_t myID = atoi(argv[5]);
However, when I do this
std::cout << "My ID is "<< myID <<std::endl;
It prints some non-alphanumeric character. There is no issue when myID is of type int. I tried casting explicitly by doing
uint8_t myID = (uint8_t)atoi(argv[5]);
But the results are the same. Could anyone explain why this is the case and if there is any possible solution?
uint8_t is not a separate data type. On systems that provide it, the actual type is an alias for some standard data type, most commonly unsigned char.
operator<< provides an overload that takes unsigned char and prints it as a character. To print your uint8_t variable as a number, cast it to an int:
std::cout << "My ID is "<< int(myID) <<std::endl;
// ^^^^^
That's because on your platform, uint8_t is a typedef for an unsigned char.
And the ostream overload of << for an unsigned char outputs a character, rather than a number, since the clever C++ folk thought that to be sensible. It normally is.
You can fix this by casting to an int, which will always be able to hold a uint8_t value.
(Note that prior to C++20, a signed char could use ones' complement or signed-magnitude representation, so an 8-bit signed type could behave differently from uint8_t.)

When do we need to mention/specify the type of integer for number literals?

I came across a code like below:
#define SOME_VALUE 0xFEDCBA9876543210ULL
This SOME_VALUE is assigned to some unsigned long long later.
Questions:
Is there a need for a suffix like ULL in this case?
In what situations do we need to specify the type of an integer literal?
Do C and C++ behave differently in this case ?
In C, a hexadecimal literal with no suffix gets the first of the types int, unsigned int, long, unsigned long, long long, or unsigned long long that can represent its value. I wouldn't be surprised if C++ has the same rules.
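(C++ does in fact use the same list for unsuffixed hexadecimal literals.) For example, assuming a platform with 32-bit int:
#include <iostream>
#include <type_traits>

int main()
{
    // on a platform with 32-bit int, 0xFFFFFFFF does not fit in int,
    // so the literal takes the next candidate type: unsigned int
    std::cout << std::boolalpha
              << std::is_same<decltype(0xFFFFFFFF), unsigned int>::value << '\n';
}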
You would need a suffix if you want to give a literal a larger type than it would have by default, or if you want to force its signedness; consider for example
1 << 43;
Without a suffix, that is (almost certainly) undefined behaviour, but 1LL << 43; for example would be fine.
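A sketch of the difference (assuming 32-bit int and 64-bit long long):
#include <iostream>

int main()
{
    // 1 << 43 would shift past the width of a (typically 32-bit) int: undefined behaviour
    long long big = 1LL << 43; // the LL suffix makes the shift happen in long long
    std::cout << big << '\n';  // prints 8796093022208
}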
I think not, but maybe that was required for that compiler.
For example, with printf("%ld", SOME_VALUE);, if SOME_VALUE's integer type is not pinned down by a suffix, the format specifier might not match the argument's type, and you might end up with the wrong output.
A good example for the use of specifying a suffix in C++ is overloaded functions. Take the following for example:
#include <iostream>

void consumeInt(unsigned int x)
{
    std::cout << "UINT" << std::endl;
}

void consumeInt(int x)
{
    std::cout << "INT" << std::endl;
}

void consumeInt(unsigned long long x)
{
    std::cout << "ULL" << std::endl;
}

int main(int argc, const char * argv[])
{
    consumeInt(5);
    consumeInt(5U);
    consumeInt(5ULL);
    return 0;
}
Results in:
INT
UINT
ULL
You do not need suffixes if your only intent is to get the right value of the number; C automatically chooses a type in which the value fits.
The suffixes are important if you want to force the type of the expression, e.g. for how it interacts in larger expressions. Making it long, or long long, may be needed when you're going to perform an arithmetic operation that would overflow a smaller type (for example, 1ULL<<n or x*10LL), and making it unsigned is useful when you want the expression as a whole to have unsigned semantics (for example, c-'0'<10U, or n%2U).
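As a sketch, the c-'0'<10U idiom checks both bounds in one comparison:
#include <iostream>

int main()
{
    char c = '7';
    // if c were below '0', c - '0' would be negative; comparing against 10U
    // converts it to a huge unsigned value, so the test fails, effectively
    // checking 0 <= c - '0' < 10 in a single comparison
    if (c - '0' < 10U)
        std::cout << "decimal digit with value " << c - '0' << '\n';
}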
When you don't write any suffix, the compiler deduces the type of the integral literal itself (an int, if the value fits). Since some integral literals would overflow if their type were deduced as int, you add a suffix to tell the compiler to deduce some other type. That is what you do when you write 0xFEDCBA9876543210ULL.
You can also use a suffix when you write a floating-point number: 1.2 is a double, while 1.2f is a float.
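A small sketch of the floating-point suffixes:
#include <iostream>

int main()
{
    auto d = 1.2;  // no suffix: double
    auto f = 1.2f; // f suffix: float
    std::cout << sizeof d << ' ' << sizeof f << '\n'; // typically prints 8 4
}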

Cross-platform C++: determining the maximum integer value (no headers)

Write C++ functions or structs/classes (using metaprogramming) that determine the maximum value for signed and unsigned types, according to the compiler's architecture: one for signed and a second for unsigned numbers.
Requirements:
no header files
self adjusting to variable sizes (no stdint.h)
no compiler warnings about possible overflow
Clarification:
After the comments I am surprised at the reaction to a non-typical C++ problem. I've learned it's good to stress that the problem is not homework and not from the moon, but from a practical domain.
For all interested in the application of this stuff: first of all, it is not homework :). It's a practical, answerable question based on actual problems that I face, as the SO FAQ suggests. Thank you for the tips about climits etc., but I am looking for a "smart piece of code". For sure climits and limits are well-tested, good pieces of code, but they are huge and not necessarily "smart, tricky". We are looking here for smart solutions (not "huge-any" solutions), aren't we? Even so, the climits suggestions are OK as a starting point. For those interested in areas where including header files is not allowed and the size of the source code is relevant, there are a few: experiments with compilers, program transformations, preparing problem sets for programming contests, etc. Actually, three of them are relevant to problems I am currently struggling with. So I don't think it's (SO.FAQ) too localized, and I think it's, for sure, a question for (SO.FAQ) enthusiast programmers. If you think that even with all of this there is something wrong with this question, please let me know; I don't want to make the mistake again. If it's OK, please let me know what I could do better so it doesn't get downvoted.
Under reasonable assumptions for two's complement representation:
#include <iostream>

template<typename T> struct maxval;

template<> struct maxval<unsigned char>
{
    static const unsigned char value = (unsigned char) ~0;
};

template<> struct maxval<signed char>
{
    static const signed char value = ((unsigned char) ~0) >> 1;
};

template<> struct maxval<unsigned short>
{
    static const unsigned short value = (unsigned short) ~0;
};

template<> struct maxval<short>
{
    static const short value = ((unsigned short) ~0) >> 1;
};

int main()
{
    std::cout << (int)maxval<signed char>::value << std::endl;
}
Likewise for the rest of the types.
You need to distinguish between signed and unsigned types when determining the max value. The easy way is to enumerate all of them, as in the above example.
Perhaps it can be done with a combination of enable_if and std::is_unsigned, but reimplementing them (no headers!) would still require enumerating all types.
For unsigned types, it's simple: T(-1) will always be the maximum for that type (the value -1 is reduced modulo 2^N, where N is the type's width, which always gives the maximum for the type).
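A brief sketch of that:
#include <iostream>

int main()
{
    // -1 converted to an unsigned type always yields that type's maximum value
    unsigned char maxByte = static_cast<unsigned char>(-1);
    unsigned int maxUInt = static_cast<unsigned int>(-1);
    std::cout << +maxByte << '\n' // prints 255
              << maxUInt << '\n'; // typically 4294967295
}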
For signed integer types, the job is almost as easy, at least in practice: take the maximum unsigned value, shift right one bit, and cast to signed. For C99 or C++11 that will work because only three representations for integers (1's complement, signed magnitude and 2's complement) are allowed (and it gives the correct result for all three). In theory, for C89/90 and/or C++98/03, it might be possible to design a conforming signed type for which it would fail (e.g., a biased representation where the bias was not range/2).
For those, and for floating point types (which have no unsigned counterparts), the job is rather more difficult. There's a reason these are provided in a header instead of being left for you to compute on your own...
Edit: As far as how to implement this in C++, most of the difficulty is in specializing a template for an unsigned type. The most obvious way to do that is probably to use SFINAE, with an expression that is only legal for a signed type (or only for an unsigned type). The usual approach is an array whose size is something like T()-1>0. This yields false for a signed type, which converts to 0; since you can't create a zero-sized array, that attempted substitution fails. For an unsigned type, the -1 "wraps" to the maximum value, so it gives a size of 1, which is allowed.
Since this seems to be homework, I'm not going to show an actual, working implementation for that though.
This works for unsigned types:
template <typename t>
constexpr t max_val() { // constexpr is C++11; you can remove it for C++03
    return ~(t(0));
}
The signed maximum can't be found portably, as you can't assume the number of bits or the encoding.
Signedness could be determined at compile-time if you wish to merge maxSigned with maxUnsigned.
#include <iostream>
#include <cstddef> // for ptrdiff_t, size_t
#include <cstdint> // for intptr_t, uintptr_t

template <typename T> static inline bool is_signed() {
    return ~T(0) < T(1);
}

template <typename T> static inline T min_value() {
    return is_signed<T>() ? ~T(0) << (sizeof(T)*8-1) : T(0); // assumes 8-bit bytes
}

template <typename T> static inline T max_value() {
    return ~min_value<T>();
}

#define REPORT(type) do{ std::cout\
    << "<" #type "> is " << (is_signed<type>() ? "signed" : "unsigned")\
    << ", with lower limit " << min_value<type>()\
    << " and upper limit " << max_value<type>()\
    << std::endl; }while(false)

int main(int argc, char* argv[]) {
    REPORT(char); // min, max print as characters, not numbers
    REPORT(int);
    REPORT(unsigned);
    REPORT(long long);
    REPORT(unsigned long long);
    REPORT(ptrdiff_t);
    REPORT(size_t);
    REPORT(uintptr_t);
    REPORT(intptr_t);
}
Here is my answer. I was not expecting either enumeration of all types (as in Chill's answer) or overflow (as I stated, I don't want compiler warnings), which some of the previous answers involved. Here is what I've found.
As Pubby has shown, the unsigned case is simple:
template <class T>
T maxUnsigned(){ return ~T(0); }
As Chill mentioned:
Under reasonable assumptions for two's complement representation
Here is my metaprogramming solution for the signed case (the metaprogramming is there to avoid overflow compiler warnings):
template<class T, int N> struct SignedMax {
    static const T value = (T(1)<<N) + SignedMax<T, N - 1>::value;
};

template<class T> struct SignedMax<T, 0> {
    static const T value = 1;
};

template<class T>
T maxSigned(){
    return SignedMax<T, sizeof(T)*8-2>::value;
}
And an example of this working:
#include <iostream>
using std::cout;
using std::endl;
//(...)

#define PSIGNED(T) std::cout << #T "\t" << maxSigned<T>() << std::endl

int main(){
    cout << maxSigned<short int>() << endl;
    cout << maxSigned<int>() << endl;
    cout << maxSigned<long long>() << endl;
    cout << maxUnsigned<unsigned short int>() << endl;
    cout << maxUnsigned<unsigned int>() << endl;
    cout << maxUnsigned<unsigned long long>() << endl;
    return 0;
}