Get number of bits in char - c++

How do I get the number of bits in type char?
I know about CHAR_BIT from climits. This is described as »The macro yields the maximum value for the number of bits used to represent an object of type char.« at Dikumware's C Reference. I understand that means the number of bits in a char, doesn't it?
Can I get the same result with std::numeric_limits somehow? std::numeric_limits<char>::digits returns 7 correctly but unfortunately, because this value respects the signedness of the 8-bit char here…

CHAR_BIT is, by definition, number of bits in the object representation of type [signed/unsigned] char.
numeric_limits<>::digits is the number of non-sign bits in the value representation of the given type.
Which one do you need?
If you are looking for number of bits in the object representation, then the correct approach is to take the sizeof of the type and multiply it by CHAR_BIT (of course, there's no point in multiplying by sizeof in specific case of char types, since their size is always 1, and since CHAR_BIT by definition alredy contains what you need).
If you are talking about value representation then numeric_limits<> is the way to go.
For unsigned char type the bit-size of object representation (CHAR_BIT) is guaranteed to be the same as bit-size of value representation, so you can use numeric_limits<unsigned char>::digits and CHAR_BIT interchangeably, but this might be questionable from the conceptual point of view.

If you want to be overly specific, you can do this:
sizeof(char) * CHAR_BIT
If you know you are definitely going to do the sizeof char, it's a bit overkill as sizeof(char) is guaranteed to be 1.
But if you move to a different type such as wchar_t, that will be important.

Looking at the snippets archive for this code here, here's an adapted version, I do not claim this code:
int countbits(char ch){
int n = 0;
if (ch){
do n++;
while (0 != (ch = ch&(ch-1)));
}
return n;
}
Hope this helps,
Best regards,
Tom.

A non-efficient way:
char c;
int bits;
for ( c = 1, bits = 0; c; c <<= 1, bits++ )
;
printf( "bits = %d\n", bits );

no reputation here yet so I'm not allowed to reply to #t0mm13b 's answer, but wanted to point out that there's a problem with the code:
int countbits(char ch){
int n = 0;
if (ch){
do n++;
while (0 != (ch = ch&(ch-1)));
}
return n;
}
The above won't count the number of bits in a character, it will count the number of set bits (1 bits).
For example, the following call will return 4:
char c = 'U';
countbits(c);
The code:
ch = ch & (ch - 1)
Is a trick to strip off the right most (least significant) bit that's set to 1. So, it glosses over any bits set to 0 and doesn't count them.

Related

Does an integer smaller than 255 perform as fast as an unsigned character?

I know that an unsigned character's size is 8 bits, and the size of an integer is 32 bits.
But I want to know if I perform an operation between two integers below 255, is it safe to say it is as fast as performing the same operation on two unsigned characters of the same value of that integers?
Example:
int Int2 = 0x10;
int Int1 = 0xff;
unsigned char Char0 = 0x10;
unsigned char Char1 = 0xff;
Int1 + Int2 ; // Is calculating this
Char0 + Char1; // Faster than this??
Update:
Let's put this in the context as someone suggested
for (unsigned char c=0;c!=256;c++){ // does this loop
std::cout<<c; // dont mind this line it can be any statement
}
for (int i=0;i!=256;i++){ // perform faster than this one??
std::cout<<i;// this too
}
I know that an unsigned character's size is 8 bits
This is not necessarily always the case in C++. But it may be true in particular implementation of C++.
and the size of an integer is 32 bits.
There are several integer types in C++. In fact, character types are integer types as well.
Int1 + Int2 ; // Is calculating this
Char0 + Char1; // Faster than this??
Integers of lower rank than int are promoted to int (or unsigned int in rare cases) when used as operand of most binary operators. Both operators in the example operate on int after the promotion. You don't use the result at all, so there's no need for the compiler to produce any code, so they should be equally fast in this trivial example.
Whether one piece of code is faster than the other depends on many factors. It's not possible to accurately guess which way it would go without context.

why declare "score[11] = {};" and "grade" as "unsigned" instead of "int'

I'm new to C++ and is trying to learn the concept of array. I saw this code snippet online. For the sample code below, does it make any difference to declare:
unsigned scores[11] = {};
unsigned grade;
as:
int scores[11] = {};
int grade;
I guess there must be a reason why score[11] = {}; and grade is declared as unsigned, but what is the reason behind it?
int main() {
unsigned scores[11] = {};
unsigned grade;
while (cin >> grade) {
if (0 <= grade <= 100) {
++scores[grade / 10];
}
}
for (int i = 0; i < 11; i++) {
cout << scores[i] << endl;
}
}
unsigned means that the variable will not hold a negative values (or even more accurate - It will not care about the sign-). It seems obvious that scores and grades are signless values (no one scores -25). So, it is natural to use unsigned.
But note that: if (0 <= grade <= 100) is redundant. if (grade <= 100) is enough since no negative values are allowed.
As Blastfurnace commented, if (0 <= grade <= 100) is not right even. if you want it like this you should write it as:
if (0 <= grade && grade <= 100)
Unsigned variables
Declaring a variable as unsigned int instead of int has 2 consequences:
It can't be negative. It provides you a guarantee that it never will be and therefore you don't need to check for it and handle special cases when writing code that only works with positive integers
As you have a limited size, it allows you to represent bigger numbers. On 32 bits, the biggest unsigned int is 4294967295 (2^32-1) whereas the biggest int is 2147483647 (2^31-1)
One consequence of using unsigned int is that arithmetic will be done in the set of unsigned int. So 9 - 10 = 4294967295 instead of -1 as no negative number can be encoded on unsigned int type. You will also have issues if you compare them to negative int.
More info on how negative integer are encoded.
Array initialization
For the array definition, if you just write:
unsigned int scores[11];
Then you have 11 uninitialized unsigned int that have potentially values different than 0.
If you write:
unsigned int scores[11] = {};
Then all int are initialized with their default value that is 0.
Note that if you write:
unsigned int scores[11] = { 1, 2 };
You will have the first int intialized to 1, the second to 2 and all the others to 0.
You can easily play a little bit with all these syntax to gain a better understanding of it.
Comparison
About the code:
if(0 <= grade <= 100)
as stated in the comments, this does not do what you expect. In fact, this will always evaluate to true and therefore execute the code in the if. Which means if you enter a grade of, say, 20000, you should have a core dump. The reason is that this:
0 <= grade <= 100
is equivalent to:
(0 <= grade) <= 100
And the first part is either true (implicitly converted to 1) or false (implicitly converted to 0). As both values are lower than 100, the second comparison is always true.
unsigned integers have some strange properties and you should avoid them unless you have a good reason. Gaining 1 extra bit of positive size, or expressing a constraint that a value may not be negative, are not good reasons.
unsigned integers implement arithmetic modulo UINT_MAX+1. By contrast, operations on signed integers represent the natural arithmetic that we are familiar with from school.
Overflow semantics
unsigned has well defined overflow; signed does not:
unsigned u = UINT_MAX;
u++; // u becomes 0
int i = INT_MAX;
i++; // undefined behaviour
This has the consequence that signed integer overflow can be caught during testing, while an unsigned overflow may silently do the wrong thing. So use unsigned only if you are sure you want to legalize overflow.
If you have a constraint that a value may not be negative, then you need a way to detect and reject negative values; int is perfect for this. An unsigned will accept a negative value and silently overflow it into a positive value.
Bit shift semantics
Bit shift of unsigned by an amount not greater than the number of bits in the data type is always well defined. Until C++20, bit shift of signed was undefined if it would cause a 1 in the sign bit to be shifted left, or implementation-defined if it would cause a 1 in the sign bit to be shifted right. Since C++20, signed right shift always preserves the sign, but signed left shift does not. So use unsigned for some kinds of bit twiddling operations.
Mixed sign operations
The built-in arithmetic operations always operate on operands of the same type. If they are supplied operands of different types, the "usual arithmetic conversions" coerce them into the same type, sometimes with surprising results:
unsigned u = 42;
std::cout << (u * -1); // 4294967254
std::cout << std::boolalpha << (u >= -1); // false
What's the difference?
Subtracting an unsigned from another unsigned yields an unsigned result, which means that the difference between 2 and 1 is 4294967295.
Double the max value
int uses one bit to represent the sign of the value. unsigned uses this bit as just another numerical bit. So typically, int has 31 numerical bits and unsigned has 32. This extra bit is often cited as a reason to use unsigned. But if 31 bits are insufficient for a particular purpose, then most likely 32 bits will also be insufficient, and you should be considering 64 bits or more.
Function overloading
The implicit conversion from int to unsigned has the same rank as the conversion from int to double, so the following example is ill formed:
void f(unsigned);
void f(double);
f(42); // error: ambiguous call to overloaded function
Interoperability
Many APIs (including the standard library) use unsigned types, often for misguided reasons. It is sensible to use unsigned to avoid mixed-sign operations when interacting with these APIs.
Appendix
The quoted snippet includes the expression 0 <= grade <= 100. This will first evaluate 0 <= grade, which is always true, because grade can't be negative. Then it will evaluate true <= 100, which is always true, because true is converted to the integer 1, and 1 <= 100 is true.
Yes it does make a difference. In the first case you declare an array of 11 elements a variable of type "unsigned int". In the second case you declare them as ints.
When the int is on 32 bits you can have values from the following ranges
–2,147,483,648 to 2,147,483,647 for plain int
0 to 4,294,967,295 for unsigned int
You normally declare something unsigned when you don't need negative numbers and you need that extra range given by unsigned. In your case I assume that that by declaring the variables unsigned, the developer doesn't accept negative scores and grades. You basically do a statistic of how many grades between 0 and 10 were introduced at the command line. So it looks like something to simulate a school grading system, therefore you don't have negative grades. But this is my opinion after reading the code.
Take a look at this post which explains what unsigned is:
what is the unsigned datatype?
As the name suggests, signed integers can be negative and unsigned cannot be. If we represent an integer with N bits then for unsigned the minimum value is 0 and the maximum value is 2^(N-1). If it is a signed integer of N bits then it can take the values from -2^(N-2) to 2^(N-2)-1. This is because we need 1-bit to represent the sign +/-
Ex: signed 3-bit integer (yes there are such things)
000 = 0
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1
But, for unsigned it just represents the values [0,7]. The most significant bit (MSB) in the example signifies a negative value. That is, all values where the MSB is set are negative. Hence the apparent loss of a bit in its absolute values.
It also behaves as one might expect. If you increment -1 (111) we get (1 000) but since we don't have a fourth bit it simply "falls off the end" and we are left with 000.
The same applies to subtracting 1 from 0. First take the two's complement
111 = twos_complement(001)
and add it to 000 which yields 111 = -1 (from the table) which is what one might expect. What happens when you increment 011(=3) yielding 100(=-4) is perhaps not what one might expect and is at odds with our normal expectations. These overflows are troublesome with fixed point arithmetic and have to be dealt with.
One other thing worth pointing out is the a signed integer can take one negative value more than it can positive which has a consequence for rounding (when using integer to represent fixed point numbers for example) but am sure that's better covered in the DSP or signal processing forums.

Shift with 64 bit int

I have a __int64 variable x = 0x8000000000000000.
I try to shift it right by byte : x >> 4
I`ve thought that the result should be 0x0800000000000000, but unfortunately I get 0xf800000000000000.
I use VS10. Why is it so? And how can I solve that?
try to use __uint64 variable x = 0x8000000000000000
I think you can declare it this way as well:
u64 x = 0x8000000000000000;
x >> 4 you will give you:
0x0800000000000000
see for more info where the F came from in the MSBs.
The reason is because shifting signed numbers is only defined by the language if the left operand is at least 0. In your case I assume it's a twos-complement representation and your number is negative making the result unspecified (or implementation-defined, I don't have the reference at hand right now). Typically you would either get a logical shift or an arithmetic shift.
If you can get away with making your variable unsigned that would solve your problem.
Well, that is an easy one. :)
Your number is signed. With the first bit being set, it is negative. Thus, in order to keep it negative, it is filled with 1s.
I would assume casting it to unsigned and then shifting it should fix your problem.
This is because you have a negative number. Use (or convert to) unsigned before the shift.
Signed values are shifted right by sign bit and unsigned by zero:
int _tmain(int argc, _TCHAR* argv[])
{
__int64 signedval;
unsigned __int64 unsignedval;
signedval=0x8000000000000000 >> 4;
unsignedval=0x8000000000000000 >> 4;
printf(" Signed %x , unsigned %x \n", signedval, unsignedval);
getchar();
return 0;
}
and the output is:
Signed 0 , unsigned 8000000

Checksum calculation - two’s complement sum of all bytes

I have instructions on creating a checksum of a message described like this:
The checksum consists of a single byte equal to the two’s complement sum of all bytes starting from the “message type” word up to the end of the message block (excluding the transmitted checksum). Carry from the most significant bit is ignored.
Another description I found was:
The checksum value contains the twos complement of the modulo 256 sum of the other words in the data message (i.e., message type, message length, and data words). The receiving equipment may calculate the modulo 256 sum of the received words and add this sum to the received checksum word. A result of zero generally indicates that the message was correctly received.
I understand this to mean that I sum the value of all bytes in message (excl checksum), get modulo 256 of this number. get twos complement of this number and that is my checksum.
But I am having trouble with an example message example (from design doc so I must assume it has been encoded correctly).
unsigned char arr[] = {0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30,0x2,0x8,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe};
So the last byte, 0xE, is the checksum. My code to calculate the checksum is as follows:
bool isMsgValid(unsigned char arr[], int len) {
int sum = 0;
for(int i = 0; i < (len-1); ++i) {
sum += arr[i];
}
//modulo 256 sum
sum %= 256;
char ch = sum;
//twos complement
unsigned char twoscompl = ~ch + 1;
return arr[len-1] == twoscompl;
}
int main(int argc, char* argv[])
{
unsigned char arr[] = {0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30,0x2,0x8,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe};
int arrsize = sizeof(arr) / sizeof(arr[0]);
bool ret = isMsgValid(arr, arrsize);
return 0;
}
The spec is here:= http://www.sinet.bt.com/227v3p5.pdf
I assume I have misunderstood the algorithm required. Any idea how to create this checksum?
Flippin spec writer made a mistake in their data example. Just spotted this then came back on here and found others spotted too. Sorry if I wasted your time. I will study responses because it looks like some useful comments for improving my code.
You miscopied the example message from the pdf you linked. The second parameter length is 9 bytes, but you used 0x08 in your code.
The document incorrectly states "8 bytes" in the third column when there are really 9 bytes in the parameter. The second column correctly states "00001001".
In other words, your test message should be:
{0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30, // param1
0x2,0x9,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe} // param2
^^^
With the correct message array, ret == true when I try your program.
Agree with the comment: looks like the checksum is wrong. Where in the .PDF is this data?
Some general tips:
Use an unsigned type as the accumulator; that gives you well-defined behavior on overflow, and you'll need that for longer messages. Similarly, if you store the result in a char variable, make it unsigned char.
But you don't need to store it; just do the math with an unsigned type, complement the result, add 1, and mask off the high bits so that you get an 8-bit result.
Also, there's a trick here, if you're on hardware that uses twos-complement arithmetic: just add all of the values, including the checksum, then mask off the high bits; the result will be 0 if the input was correct.
The receiving equipment may calculate the modulo 256 sum of the received words and add this sum to the received checksum word.
It's far easier to use this condition to understand the checksum:
{byte 0} + {byte 1} + ... + {last byte} + {checksum} = 0 mod 256
{checksum} = -( {byte 0} + {byte 1} + ... + {last byte} ) mod 256
As the others have said, you really should use unsigned types when working with individual bits. This is also true when doing modular arithmetic. If you use signed types, you leave yourself open to a rather large number of sign-related mistakes. OTOH, pretty much the only mistake you open yourself up to using unsigned numbers is things like forgetting 2u-3u is a positive number.
(do be careful about mixing signed and unsigned numbers together: there are a lot of subtleties involved in that too)

Is it legal to use char overflow in C++ code

Good day, colleagues!
I need to obtain cyclic series on successive numbers from 0 to 255. Is it legal to use unsigned char overflow like this:
unsigned char test_char = 0;
while (true) {
std::cout << test_char++ << " ";
}
Or will be more safely to use this code:
int test_int = 0;
while (true) {
std::cout << test_int++ % 256 << " ";
}
Of course, in real code there will be reasonable condition instead of while (true).
3.9.1/4 "Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer"
"This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type"
So, yes it is legal. And the second form is preferred, since it's more readable.
Even though sizeof(char) will always be 1, it is not necessary that a char will be exactly 8 bits. (I am guessing unsigned char will be similar).
So of the two, if given a choice, I would prefer the latter as the former might not even be correct.
btw, You probably intended unsigned int instead of int for the latter? Modulus with negative numbers could get tricky (after the int overflows, as Jimmy noted). If I recollect correctly, I believe it is compiler dependent.
unsigned char, like all other unsigned integral types, follows modulo 2n arithmetic, so basically both your methods are equivalent. Use the first
There is no such thing as unsigned overflow, per 3.9.1/4 as quoted by Erik. However, as Moron says, it is possible that the modulus of the unsigned char number system is greater than 256.
Note that your expression does not store the result of % 256 back to test_int. The safe way to do this is
test_int = ( test_int + 1 ) % 256;
std::cout << test_int << " ";
the output of the 2 samples is completely different.
the first one will print characters (a b c d e f g h ...)
the second one will print integers (0 1 2 3 4 ... 255 0 ...)
anyway it depends if you have a a control for overflowexception (.NET) otherwise in old C++ it the value is always valid and goes from 0 to 255