When I cast 433 to char I get -79.
How does 433 equal -79, when the ASCII codes for '4' and '3' are 52 and 51 respectively, according to this table?
The decimal number 433 is 0x1b1. It is an int, which is usually 32 bits long. When you cast it to a char (which usually has 8 bits), all but the lowest 8 bits are thrown away, leaving you with 0xb1, which is -79 as a signed two's-complement 8-bit integer.
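A minimal sketch reproducing this, assuming char is a signed 8-bit type on your platform (whether plain char is signed is implementation-defined):
#include <iostream>

int main()
{
    int value = 433;                          // 0x1b1
    char c = static_cast<char>(value);        // only the lowest 8 bits survive: 0xb1
    std::cout << static_cast<int>(c) << "\n"; // prints -79 where char is a signed 8-bit type
}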
The bits read from the DHT22 sensor are as follows:
0000000111010001000000001110111111000001
If we calculate the checksum of them using the formula given in the datasheet:
Reference: https://cdn-shop.adafruit.com/datasheets/Digital+humidity+and+temperature+sensor+AM2302.pdf
If you convert each octet in their example to decimal, sum them, and compare the result with the last 8 bits (the checksum), they are equal.
Binary: Decimal:
00000001 1 //First 8 bits
11010001 209 //Second 8 bits
00000000 0 //Third 8 bits
11101111 239 //Fourth 8 bits
---------------------------------->
Summed: 449
------------ Not equal ----------->
11000001 193 //Check sum
When the 16 bits for the humidity and the 16 bits for the temperature are converted, they show correct results consistent with other popular DHT22 libraries, but the checksum is not valid.
I misunderstood the formula that they have given.
By "last 8 bits" they mean the last 8 bits of the sum of the 4 octets:
Binary: Decimal:
00000001 1 //First 8 bits
11010001 209 //Second 8 bits
00000000 0 //Third 8 bits
11101111 239 //Fourth 8 bits
---------------------------------->
Summed: 449
449 as Binary: 111000001
449's last 8 bits: 11000001
11000001 as decimal: 193
----------------------------- Equal ----------->
11000001 193 //Check sum
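In code, the fix is just to truncate the sum to 8 bits before comparing. A minimal sketch, assuming the five octets have already been read from the sensor:
#include <cstdint>
#include <iostream>

int main()
{
    // The four data octets and the checksum octet from the example frame.
    const uint8_t data[4] = {0b00000001, 0b11010001, 0b00000000, 0b11101111};
    const uint8_t checksum = 0b11000001;

    // The sum (449 here) can exceed 255, so keep only its last 8 bits.
    unsigned sum = data[0] + data[1] + data[2] + data[3];
    uint8_t low8 = static_cast<uint8_t>(sum);  // 449 & 0xFF == 193

    std::cout << (low8 == checksum ? "checksum valid" : "checksum mismatch") << "\n";
}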
I hope this helps someone who has the same problem, because it took me hours of experimenting to find what was causing it.
In their example the sum is also lower than 255, which makes the mistake even harder to catch, because you don't need to discard any bits: the value is no bigger than 8 bits.
Recently, I have been interested in using bit shifts on floating-point numbers to do some fast calculations.
To make them more generic, I would like my functions to work with different floating-point types, probably through templates, not limited to float and double but also including "half-width" or "quadruple-width" floating-point numbers and so on.
Then I noticed:
- Half --- 5 exponent bits --- 10 significand bits
- Float --- 8 exponent bits --- 23 significand bits
- Double --- 11 exponent bits --- 52 significand bits
So far I thought exponent bits = 3 * log2(total bytes) + 2,
which would mean a 128-bit float should have 14 exponent bits and a 256-bit float should have 17 exponent bits.
However, then I learned:
- Quad --- 15 exponent bits --- 112 significand bits
- Octuple --- 19 exponent bits --- 236 significand bits
So, is there a formula for it at all? Or is there a way to obtain it through some built-in functions?
C or C++ is preferred, but I am open to other languages.
Thanks.
Characteristics Provided Via Built-In Functions
C++ provides this information via the std::numeric_limits template:
#include <iostream>
#include <limits>
#include <cmath>

template<typename T> void ShowCharacteristics()
{
    int radix = std::numeric_limits<T>::radix;
    std::cout << "The floating-point radix is " << radix << ".\n";

    std::cout << "There are " << std::numeric_limits<T>::digits
              << " base-" << radix << " digits in the significand.\n";

    int min = std::numeric_limits<T>::min_exponent;
    int max = std::numeric_limits<T>::max_exponent;

    std::cout << "Exponents range from " << min << " to " << max << ".\n";
    std::cout << "So there must be " << std::ceil(std::log2(max-min+1))
              << " bits in the exponent field.\n";
}

int main()
{
    ShowCharacteristics<double>();
}
Sample output:
The floating-point radix is 2.
There are 53 base-2 digits in the significand.
Exponents range from -1021 to 1024.
So there must be 11 bits in the exponent field.
C also provides the information, via macro definitions like DBL_MANT_DIG defined in <float.h>. However, the standard defines the names only for the types float (prefix FLT), double (DBL), and long double (LDBL), so the names in a C implementation that supported additional floating-point types would not be predictable.
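For reference, a small sketch printing those standard macros (reachable from C++ through <cfloat>):
#include <cfloat>
#include <cstdio>

int main()
{
    // Only float, double, and long double have standard-defined macro names.
    std::printf("float:       %d significand digits, exponents %d..%d\n",
                FLT_MANT_DIG, FLT_MIN_EXP, FLT_MAX_EXP);
    std::printf("double:      %d significand digits, exponents %d..%d\n",
                DBL_MANT_DIG, DBL_MIN_EXP, DBL_MAX_EXP);
    std::printf("long double: %d significand digits, exponents %d..%d\n",
                LDBL_MANT_DIG, LDBL_MIN_EXP, LDBL_MAX_EXP);
}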
Note that the exponent as specified in the C and C++ standards is one off from the usual exponent described in IEEE-754: It is adjusted for a significand scaled to [½, 1) instead of [1, 2), so it is one greater than the usual IEEE-754 exponent. (The example above shows the exponent ranges from −1021 to 1024, but the IEEE-754 exponent range is −1022 to 1023.)
Formulas
IEEE-754 does provide formulas for recommended field widths, but it does not require IEEE-754 implementations to conform to these, and of course the C and C++ standards do not require C and C++ implementations to conform to IEEE-754. The interchange format parameters are specified in IEEE 754-2008 3.6, and the binary parameters are:
For a floating-point format of 16, 32, 64, or 128 bits, the significand width (including leading bit) should be 11, 24, 53, or 113 bits, and the exponent field width should be 5, 8, 11, or 15 bits.
Otherwise, for a floating-point format of k bits, k should be a multiple of 32, and the significand width should be k−round(4•log2k)+13, and the exponent field should be round(4•log2k)−13.
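As a quick check, here is a small sketch applying the general rule (the widths for the 16-, 32-, 64-, and 128-bit formats are listed explicitly in the standard, so the loop starts at 128):
#include <cmath>
#include <iostream>

// Exponent field width recommended by IEEE 754-2008 (3.6) for a k-bit binary
// interchange format, k a multiple of 32 and k >= 128.
int exponentBits(int k)
{
    return static_cast<int>(std::round(4 * std::log2(k))) - 13;
}

int main()
{
    for (int k = 128; k <= 1024; k *= 2)
        std::cout << "binary" << k << ": " << exponentBits(k) << " exponent bits, "
                  << k - exponentBits(k) << " significand bits (including the leading bit)\n";
}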
The answer is no.
How many bits to use (or even which representation to use) is decided by compiler implementers and committees. And there's no way to guess what a committee decided (and no, it's not the "best" solution for any reasonable definition of "best"... it's just what happened that day in that room: a historical accident).
If you really want to get down to that level, you need to actually test your code on the platforms you want to deploy to and add some #ifdef macrology (or ask the user) to find out which kind of system your code is running on.
Also beware that, in my experience, one area in which compilers are extremely aggressive (to the point of being obnoxious) about type aliasing is floating-point numbers.
I want to see if there's a formula, so that if a 512-bit float were standardized, my code would automatically work with it without needing any changes.
I don't know of a published standard that guarantees the bit allocation for future formats (*). Past history shows that several considerations factor into the final choice; see for example the answer and links at Why do higher-precision floating point formats have so many exponent bits?.
(*) EDIT: see the note added at the end.
For a guessing game, the existing 5 binary formats defined by IEEE-754 hint that the number of exponent bits grows slightly faster than linearly. One (random) formula that fits these 5 data points could be, for example (in WA notation), exponent_bits = round( (log2(total_bits) - 1)^(3/2) ).
This would foresee that a hypothetical binary512 format would assign 23 bits to the exponent, though of course IEEE is not bound in any way by such second-guesses.
The above is just an interpolation formula that happens to match the 5 known exponents, and it is certainly not the only such formula. For example, searching for the sequence 5, 8, 11, 15, 19 on OEIS finds 18 listed integer sequences that contain it as a subsequence.
[ EDIT ] As pointed out in @EricPostpischil's answer, IEEE 754-2008 does in fact list the formula exponent_bits = round( 4 * log2(total_bits) - 13 ) for total_bits >= 128 (the formula actually holds for total_bits = 64, too, though it does not for 32 or 16).
The empirical formula above matches the reference IEEE one for 128 <= total_bits <= 1472, in particular IEEE also gives 23 exponent bits for binary512 and 27 exponent bits for binary1024.
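For what it's worth, a small sketch comparing the interpolation above with the IEEE 754-2008 rule over a few widths; this is just a curiosity check, nothing normative:
#include <cmath>
#include <iostream>

int main()
{
    for (int bits = 16; bits <= 1024; bits *= 2)
    {
        // Empirical fit: round((log2(bits) - 1)^(3/2)).
        int fit = static_cast<int>(std::round(std::pow(std::log2(bits) - 1, 1.5)));
        // IEEE 754-2008 rule, intended for bits >= 128 (it also happens to hold at 64).
        int ieee = static_cast<int>(std::round(4 * std::log2(bits) - 13));
        std::cout << bits << "-bit: fit = " << fit << ", IEEE rule = " << ieee << "\n";
    }
}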
UPDATE: I've now combined this into a single unified function that lines up exactly with the official formula, fills in the proper exponents for the 16- and 32-bit formats, and reports how the bits are split between the sign bit, exponent bits, and mantissa bits.
Inputs can be a # of bits, e.g. 64, a ratio like "2x", or even case-insensitive single letters:
- "S" for 1x single, "D" for 2x double,
- "Q" for 4x quadruple, "O" for 8x "octuple",
- "X" for 16x he"X", "T" for 32x "T"hirty-two;
- all other, missing, or invalid inputs default to 0.5x half-precision.
gcat <( jot 20 | mawk '$!(_=NF)=(_+_)^($_)' ) \
<( jot - -1 8 | mawk '$!NF =(++_+_)^$(_--)"x"' ) |
{m,g}awk '
function _754(__,_,___) {
return \
(__=(__==___)*(_+=_+=_^=_<_) ? _--^_++ : ">"<__ ? \
(_+_)*(_*_/(_+_))^index("SDQOXT", toupper(__)) : \
__==(+__ "") ? +__ : _*int(__+__)*_)<(_+_) \
\
? "_ERR_754_INVALID_INPUT_" \
: "IEEE-754-fp:" (___=__) "-bit:" (_^(_<_)) "_s:"(__=int(\
log((_^--_)^(_+(__=(log(__)/log(--_))-_*_)/_-_)*_^(\
-((++_-__)^(--_<__) ) ) )/log(_))) "_e:" (___-++__) "_m"
}
function round(__,_) {
return \
int((++_+_)^-_+__)
}
function _4xlog2(_) {
return (log(_)/log(_+=_^=_<_))*_*_
}
BEGIN { CONVFMT = OFMT = "%.250g"
}
( $++NF = _754(_=$!__) ) \
( $++NF = "official-expn:" \
+(_=round(_4xlog2(_=_*32^(_~"[0-9.]+[Xx]")))-13) < 11 ? "n/a" :_) |
column -s':' -t | column -t | lgp3 5
.
2 _ERR_754_INVALID_INPUT_ n/a
4 _ERR_754_INVALID_INPUT_ n/a
8 IEEE-754-fp 8-bit 1_s 2_e 5_m n/a
16 IEEE-754-fp 16-bit 1_s 5_e 10_m n/a
32 IEEE-754-fp 32-bit 1_s 8_e 23_m n/a
64 IEEE-754-fp 64-bit 1_s 11_e 52_m 11
128 IEEE-754-fp 128-bit 1_s 15_e 112_m 15
256 IEEE-754-fp 256-bit 1_s 19_e 236_m 19
512 IEEE-754-fp 512-bit 1_s 23_e 488_m 23
1024 IEEE-754-fp 1024-bit 1_s 27_e 996_m 27
2048 IEEE-754-fp 2048-bit 1_s 31_e 2016_m 31
4096 IEEE-754-fp 4096-bit 1_s 35_e 4060_m 35
8192 IEEE-754-fp 8192-bit 1_s 39_e 8152_m 39
16384 IEEE-754-fp 16384-bit 1_s 43_e 16340_m 43
32768 IEEE-754-fp 32768-bit 1_s 47_e 32720_m 47
65536 IEEE-754-fp 65536-bit 1_s 51_e 65484_m 51
131072 IEEE-754-fp 131072-bit 1_s 55_e 131016_m 55
262144 IEEE-754-fp 262144-bit 1_s 59_e 262084_m 59
524288 IEEE-754-fp 524288-bit 1_s 63_e 524224_m 63
1048576 IEEE-754-fp 1048576-bit 1_s 67_e 1048508_m 67
0.5x IEEE-754-fp 16-bit 1_s 5_e 10_m n/a
1x IEEE-754-fp 32-bit 1_s 8_e 23_m n/a
2x IEEE-754-fp 64-bit 1_s 11_e 52_m 11
4x IEEE-754-fp 128-bit 1_s 15_e 112_m 15
8x IEEE-754-fp 256-bit 1_s 19_e 236_m 19
16x IEEE-754-fp 512-bit 1_s 23_e 488_m 23
32x IEEE-754-fp 1024-bit 1_s 27_e 996_m 27
64x IEEE-754-fp 2048-bit 1_s 31_e 2016_m 31
128x IEEE-754-fp 4096-bit 1_s 35_e 4060_m 35
256x IEEE-754-fp 8192-bit 1_s 39_e 8152_m 39
===============================================
Similar to the concept mentioned above, here's an alternative formula (just re-arranging some terms) that calculates the unsigned integer range of the exponent ([32, 256, 2048, 32768, 524288], i.e. 2 raised to [5, 8, 11, 15, 19]) without needing to call the round function:
uint_range = ( 64 ** ( 1 + (k=log2(bits)-4)/2) )
*
( 2 ** -( (3-k)**(2<k) ) )
(a) x ** y means x raised to the power y
(b) 2 < k is a boolean condition that evaluates to 0 or 1
The formula is accurate from 16-bit to 256-bit, at least. Beyond that, it yields exponent sizes of
– 512-bit : 23
– 1024-bit : 27
– 2048-bit : 31
– 4096-bit : 35
(Beyond 256-bit these may be inaccurate. Even a 27-bit-wide exponent allows exponents of roughly +/- 67 million, spanning over 40 million decimal digits once you calculate 2 to that power.)
From there, getting the IEEE-754 exponent width is just a matter of taking log2(uint_range).
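Here is a small C++ sketch of that alternative formula, shown only to verify the arithmetic; ** becomes std::pow and the boolean 2 < k becomes a 0/1 ternary:
#include <cmath>
#include <iostream>

int main()
{
    for (int bits = 16; bits <= 256; bits *= 2)
    {
        double k = std::log2(bits) - 4;
        // uint_range = 64^(1 + k/2) * 2^(-((3 - k)^(2 < k)))
        double uint_range = std::pow(64.0, 1 + k / 2)
                          * std::pow(2.0, -std::pow(3 - k, (2 < k) ? 1 : 0));
        std::cout << bits << "-bit: uint_range = " << uint_range
                  << ", exponent bits = " << std::log2(uint_range) << "\n";
    }
}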
I am trying to receive some data over the network using UDP and parse it.
Here is the code:
char recvline[1024];
int n = recvfrom(sockfd, recvline, 1024, 0, NULL, NULL);
for (int i = 0; i < n; i++)
    cout << hex << static_cast<short int>(recvline[i]) << " ";
It printed the output:
19 ffb0 0 0 ff88 d 38 19 48 38 0 0 2 1 3 1 ff8f ff82 5 40 20 16 6 6 22 36 6 2c 0 0 0 0 0 0 0 0
But I was expecting output like:
19 b0 0 0 88 d 38 19 48 38 0 0 2 1 3 1 8f 82 5 40 20 16 6 6 22 36 6 2c 0 0 0 0 0 0 0 0
The ff shouldn't be there in the printed output.
Actually I have to parse this data character by character, like:
parseCommand(recvline);
and the parsing code looks like this:
void parseCommand(char *msg){
    int commId = *(msg + 1);
    switch (commId) {
    case 0xb0: // do some operation
        break;
    case 0x20: // do another operation
        break;
    }
}
And while debugging I see commId = -80 in the watch window.
Note:
On Linux I get the expected output with this code; note that I used unsigned char instead of char for the receive buffer.
unsigned char recvline[1024];
int n = recvfrom(sockfd, recvline, 1024, 0, NULL, NULL);
Whereas on Windows, recvfrom() does not accept the second argument as unsigned char and gives a build error, so I chose char.
Looks like you might be getting the correct values, but your cast to short int during printing sign-extends your char value, causing ff to be propagated to the top byte if the top bit of your char is 1 (i.e. it is negative). You should first cast it to an unsigned type, then extend to int, so you need 2 casts:
cout << hex << static_cast<short int>(static_cast<uint8_t>(recvline[i]))<<" ";
I have tested this and it behaves as expected.
In response to your update: the data read is fine; it is a matter of how you interpret it. To parse it correctly you should do:
uint8_t commId = static_cast<uint8_t>(*(msg + 1));
switch (commId) {
case 0xb0: // do some operation
    break;
case 0x20: // do another operation
    break;
}
As you store your data in a signed data type, conversion/promotion to bigger data types will first sign-extend the value (filling the high-order bits with the value of the MSB), even if it then gets converted to an unsigned data type.
One solution is to define recvline as uint8_t[] in the first place and cast it to char* when passing it to the recvfrom function. That way, you only have to cast it once, and you are using the same code in your Windows and Linux versions. Also, uint8_t[] is (at least to me) a clear indication that you are using the array as raw memory instead of as a string of some kind.
Another possibility is to simply perform a bitwise AND: (recvline[i] & 0xff). Thanks to automatic integral promotion this doesn't even require a cast.
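For instance, the print loop from the question could then be written as follows (a sketch reusing recvline and n from the question):
for (int i = 0; i < n; i++)
    cout << hex << (recvline[i] & 0xff) << " ";  // & 0xff keeps only the low 8 bits after integral promotion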
Personal Note:
It is really annoying that the C and C++ standards don't provide a separate type for raw memory (yet), but with any luck we'll get a byte type in a future standard revision.
I'm trying to understand why INT_MIN is equal to -2^31 and not -(2^31 - 1).
My understanding is that an int is 4 bytes = 32 bits. Of these 32 bits, I assume 1 bit is used for the +/- sign, leaving 31 bits for the actual value. As such, INT_MAX is equal to 2^31 - 1 = 2147483647. On the other hand, why is INT_MIN equal to -2^31 = -2147483648? Wouldn't this exceed the '4 bytes' allotted for int? Based on my logic, I would have expected INT_MIN to equal -(2^31 - 1) = -2147483647.
Most modern systems use two's complement to represent signed integer data types. In this representation, one state on the non-negative side is used up to represent zero, so there is one fewer positive value than there are negative values. In fact, this is one of the prime advantages this system has over the sign-magnitude system, where zero has two representations, +0 and -0. Since zero has only one representation in two's complement, the state freed up is used to represent one more number.
Let's take a small data type, say 4 bits wide, to understand this better. The number of possible states for this toy integer type would be 2⁴ = 16. Using two's complement to represent signed numbers, we would have 8 negative numbers, 7 positive numbers, and zero; in a sign-magnitude system, we'd get two zeros, 7 positive, and 7 negative numbers.
Bin Dec
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = -8
1001 = -7
1010 = -6
1011 = -5
1100 = -4
1101 = -3
1110 = -2
1111 = -1
I think you are confused because you are imagining that sign-magnitude representation is used for signed numbers; although this is also allowed by the language standards, that system is far less likely to be implemented, as two's complement is a significantly better representation.
As of C++20, only two's complement is allowed for signed integers; source.
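A minimal sketch showing the asymmetric limits, assuming the common 32-bit two's-complement int:
#include <climits>
#include <iostream>

int main()
{
    std::cout << "INT_MIN = " << INT_MIN << "\n";  // -2147483648, bit pattern 0x80000000
    std::cout << "INT_MAX = " << INT_MAX << "\n";  //  2147483647, bit pattern 0x7FFFFFFF
    // Both extremes fit in the same 32 bits; the asymmetry comes from zero
    // occupying one of the 2^31 non-negative bit patterns.
}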
I have the following code for self learning:
#include <iostream>
using namespace std;
struct bitfields{
    unsigned field1 : 3;
    unsigned field2 : 4;
    unsigned int k : 4;
};

int main(){
    bitfields field;
    field.field1 = 8;
    field.field2 = 17;
    field.k = 18;
    cout << field.k << endl;
    cout << field.field1 << endl;
    cout << field.field2 << endl;
    return 0;
}
I know that unsigned int k : 4 means that k is 4 bits wide, with a maximum value of 15, and the result is the following:
2
0
1
For example, field1 can hold values from 0 to 7 (inclusive), and field2 and k from 0 to 15. So why this result? Shouldn't they all be zero?
You're overflowing your fields. Let's take k as an example: it's 4 bits wide, so it can hold values, as you say, from 0 to 15. In binary representation this is
0 -> 0000
1 -> 0001
2 -> 0010
3 -> 0011
...
14 -> 1110
15 -> 1111
So when you assign 18, having binary representation
18 -> 1 0010 (space added between 4th and 5th bit for clarity)
k can only hold the lower four bits, so
k = 0010 = 2.
The equivalent holds true for the rest of your fields as well.
You have these results because the assignments overflowed each bitfield.
The variable field1 is 3 bits wide, but 8 takes 4 bits to represent (1000). The lower three bits are all zero, so field1 is zero.
For field2, 17 is represented by 10001, but field2 is only four bits wide. The lower four bits represent the value 1.
Finally, for k, 18 is represented by 10010, but k is only four bits wide. The lower four bits represent the value 2.
I hope that helps clear things up.
In C++ any unsigned type wraps around when you exceed its maximum value[1]. When you define a bitfield of 4 bits, every value you store is wrapped around too. The possible values for a bitfield of size 4 are 0-15. If you store 17, it wraps to 1; for 18 you go one more, to 2.
Mathematically, the wrapped value is the original value modulo the number of possible values for the destination type:
For the bitfield of size 4 (2**4 possible values):
18 % 16 == 2
17 % 16 == 1
For the bitfield of size 3 (2**3 possible values):
8 % 8 == 0.
[1] This is not true for signed types: signed arithmetic overflow is undefined behaviour, and storing an out-of-range value into a signed type is implementation-defined.
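A tiny sketch checking the modulo formula against the values from the question (ordinary integer expressions rather than bitfields, used only to verify the arithmetic):
#include <iostream>

int main()
{
    std::cout << ( 8 % 8)  << "\n";  // field1: 3 bits, 2^3 = 8 possible values  -> 0
    std::cout << (17 % 16) << "\n";  // field2: 4 bits, 2^4 = 16 possible values -> 1
    std::cout << (18 % 16) << "\n";  // k:      4 bits, 2^4 = 16 possible values -> 2
}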