Very strangely, I found that in awk, big integers appear to have only 53 bits. Here is my example:
function bits2str(bits, data, mask)
{
    if (bits == 0)
        return "0"
    mask = 1
    for (; bits != 0; bits = rshift(bits, 1))
        data = (and(bits, mask) ? "1" : "0") data
    while ((length(data) % 8) != 0)
        data = "0" data
    return data
}
BEGIN {
    print 32, "\tlshift 48:\t", lshift(32, 48), "\t", bits2str(lshift(32, 48))
    print 429, "\tlshift 48:\t", lshift(429, 48), "\t", bits2str(lshift(429, 48))
}
and the output is:
32 lshift 48: 0 0
429 lshift 48: 3659174697238528 00001101000000000000000000000000000000000000000000000000
but in C++, the output is:
32 lshift 48: 9007199254740992
429 lshift 48: 120752765008871424
After comparing the two outputs, I found that awk's results have only 53 bits,
and then I looked at the source code of gawk (starting from line 3021 of the file builtin.c, gawk 4.1.1, http://ftp.gnu.org/gnu/gawk/), but I found no special handling of ints.
So, what causes this? Why is it like this?
In AWK, all numbers are stored in floating point.
From the gawk manual's Bitwise Functions section:
For all of these functions, first the double precision floating-point value is converted to the widest C unsigned integer type, then the bitwise operation is performed. If the result cannot be represented exactly as a C double, leading nonzero bits are removed one by one until it can be represented exactly. The result is then converted back into a C double.
Assuming IEEE-754 is used, doubles can only represent integers exactly up to 2^53.
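To see the 2^53 boundary outside awk, here is a minimal C++ sketch (nothing gawk-specific, just IEEE-754 doubles):

#include <cstdio>

int main()
{
    double limit = 9007199254740992.0;          // 2^53
    // 2^53 is exactly representable, but 2^53 + 1 is not: it rounds
    // back down to 2^53, so the comparison below is true.
    std::printf("%.0f\n", limit);               // 9007199254740992
    std::printf("%d\n", limit == limit + 1.0);  // 1
}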
If you use gawk, you need to add the -M option for arbitrary-precision (big number) arithmetic.
kent$ awk 'BEGIN{print lshift(32,48)}'
0
kent$ awk -M 'BEGIN{print lshift(32,48)}'
9007199254740992
This is an example in "A Complete Guide to Programming in C++" (Ulla Kirch-Prinz & Peter Prinz)
Example:
cout << dec << -1 << " " << hex << -1;
This statement causes the following output on a 32-bit system:
-1 ffffffff
Could anyone please explain why the second output is ffffffff?
I have trouble with the explanation in the book that says:
When octal or hexadecimal numbers are output, the bits of the number to be output are always interpreted as unsigned! In other words, the output shows the bit pattern of a number in octal or hexadecimal format.
That's because most modern machines use two's complement signed integer representation.
In two's complement, the highest bit is used as a sign bit. If it is set, the number is considered negative, and to get its absolute (positive) value you subtract the stored value from 2^N, i.e. take its two's complement.
If you had an 8-bit number 00000001, its two's complement would be 100000000 - 00000001 = 11111111 (or 0xFF in hex). So -1 is represented as all 1's in binary.
It's a very convenient system because you can perform arithmetic as if the numbers were unsigned (letting them overflow), then simply interpret the result as signed, and it will be correct.
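A small sketch of that idea, assuming the usual 32-bit two's complement int:

#include <iostream>

int main()
{
    // The bit pattern of -1, viewed as unsigned, is all ones.
    std::cout << std::hex << static_cast<unsigned>(-1) << '\n';  // ffffffff

    // Do the arithmetic on unsigned values, letting them wrap around...
    unsigned diff = 5u - 7u;  // wraps to 0xfffffffe
    // ...then read the result back as signed (implementation-defined before
    // C++20, but -2 on ordinary two's complement platforms).
    std::cout << std::dec << static_cast<int>(diff) << '\n';     // -2
}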
Compilers implement negative numbers for signed variables in the following way: if the highest bit is set, the number is interpreted as (stored value - VALUE_RANGE), so its magnitude is (VALUE_RANGE - stored value).
Here is an example with 8-bit numbers; I hope you can extend it to wider types.
char    0   1   2  ...  10  ... 126  127  -128  -127  -126  ...  -10  ...  -2   -1
uchar   0   1   2  ...  10  ... 126  127   128   129   130  ...  246  ... 254  255
hex     0   1   2  ...   A  ...  7E   7F    80    81    82  ...   F6  ...  FE   FF
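To see the last column of that table in running code (signed char is spelled out here because plain char's signedness is implementation-defined):

#include <iostream>

int main()
{
    unsigned char raw = 0xFF;  // the bit pattern 11111111
    // Converting 255 to signed char is well defined since C++20 and gives -1
    // on two's complement systems.
    signed char s = static_cast<signed char>(raw);
    std::cout << static_cast<int>(s) << '\n';                // -1  (char row)
    std::cout << static_cast<int>(raw) << '\n';              // 255 (uchar row)
    std::cout << std::hex << static_cast<int>(raw) << '\n';  // ff  (hex row)
}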
The text you've highlighted is saying that the output is equivalent to
cout << dec << -1 << " " << hex << (unsigned)-1;
In a 2's complement system (which any desktop PC is these days), the bit pattern for -1 has all bits set to 1.
For a 32 bit int therefore, the output will be ffffffff.
Finally, note that if int (and therefore unsigned) is 32 bits, the type of the literal 0xffffffff is unsigned int.
References:
http://en.cppreference.com/w/cpp/language/integer_literal
https://en.wikipedia.org/wiki/Two%27s_complement
I saw the following line of code here in C.
int mask = ~0;
I have printed the value of mask in C and C++. It always prints -1.
So I do have some questions:
Why assign the value ~0 to the mask variable?
What is the purpose of ~0?
Can we use -1 instead of ~0?
It's a portable way to set all the binary bits in an integer to 1 bits without having to know how many bits are in the integer on the current architecture.
C and C++ allow 3 different signed integer formats: sign-magnitude, one's complement and two's complement.
~0 will produce all-one bits regardless of the sign format the system uses, so it is more portable than -1.
You can add the U suffix (i.e. -1U) to generate an all-one bit pattern portably [1]. However, ~0 indicates the intention more clearly: invert all the bits in the value 0, whereas -1 shows that a value of minus one is needed, not its binary representation.
[1] Because unsigned operations are always reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
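A quick check of both spellings, assuming the common case of a 32-bit two's complement machine:

#include <iostream>

int main()
{
    unsigned int mask1 = ~0;   // ~0 is the int -1; converting it to unsigned
                               // reduces it modulo 2^32, giving all ones
    unsigned int mask2 = -1U;  // -1U is already reduced modulo 2^32
    std::cout << std::hex << mask1 << ' ' << mask2 << '\n';  // ffffffff ffffffff
}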
That, on a two's complement platform (which is assumed), gives you -1, but writing -1 directly is forbidden by the rules (only the integer constants 0..255, unary ! and ~, and binary &, ^, |, +, << and >> are allowed).
You are studying a coding challenge with a number of restrictions on operators and language constructions to perform given tasks.
The first problem is to return the value -1 without using the - operator.
On machines that represent negative numbers with two's complement, the value -1 is represented with all bits set to 1, so ~0 evaluates to -1:
/*
 * minusOne - return a value of -1
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 2
 *   Rating: 1
 */
int minusOne(void) {
    // ~0 = 111...111 = -1
    return ~0;
}
Other problems in the file are not always implemented correctly. The second problem, returning a boolean value representing the fact that an int value would fit in a 16-bit signed short, has a flaw:
/*
 * fitsShort - return 1 if x can be represented as a
 *   16-bit, two's complement integer.
 *   Examples: fitsShort(33000) = 0, fitsShort(-32768) = 1
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 8
 *   Rating: 1
 */
int fitsShort(int x) {
    /*
     * After shifting left by 16 and then right by 16, the upper 16 bits of x
     * become all 0s or all 1s (sign extension). If x is unchanged by this
     * round trip, it can be represented as a 16-bit value.
     */
    return !(((x << 16) >> 16) ^ x);
}
Left-shifting a negative value, or a value whose shifted result is beyond the range of int, has undefined behavior, and right-shifting a negative value is implementation-defined, so the above solution is incorrect (although it is probably the expected solution).
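For comparison, here is a sketch of a check with no undefined behavior, written outside the puzzle's operator restrictions by working on the bit pattern as unsigned (the helper name is mine, and a 32-bit int is assumed):

#include <cstdint>

int fitsShortWellDefined(int x)
{
    // Sign-extend the low 16 bits using unsigned arithmetic only, then
    // compare against the original bit pattern.
    std::uint32_t u     = static_cast<std::uint32_t>(x);
    std::uint32_t low16 = u & 0xFFFFu;
    std::uint32_t ext   = (low16 & 0x8000u) ? (low16 | 0xFFFF0000u) : low16;
    return ext == u;
}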
Loooong ago, this was how you saved memory on extremely limited equipment such as the 1K ZX80 or ZX81 computer. In BASIC, you would write
LET X = NOT PI
rather than
LET X = 0
Since numbers were stored as 4-byte floating point values, the latter takes 2 bytes more than the NOT PI alternative, where each of NOT and PI takes up a single byte.
There are multiple ways of encoding numbers across computer architectures. When two's complement is used, ~0 == -1 will always be true. On the other hand, some computers use one's complement to encode negative numbers, and for those the above is untrue, because there ~0 == -0. Yes, one's complement has a negative zero, and that is why it is not very intuitive.
So, to your questions:
~0 is assigned to mask so that all the bits in mask are 1, which makes mask & x == x for any x.
~0 is used to make all bits equal to 1 regardless of the platform used.
You can use -1 instead of ~0 if you are sure that your computer platform uses two's complement number encoding.
My personal thought: make your code as platform-independent as you can. The cost is relatively small and the code becomes fail-proof.
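A minimal illustration of the first point (the variable names are just for the example):

#include <cassert>

int main()
{
    int mask = ~0;            // all bits set
    int x = 0x12345678;
    assert((mask & x) == x);  // ANDing with an all-ones mask leaves x unchanged
}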
If we use a seven-bit two's complement binary representation for integers, what are:
The number of integers (things) that can be represented in this way?
The smallest (most negative) integer that can be represented in this way?
The largest positive integer that can be represented in this way?
This is a CS homework question that I am having trouble answering and explaining. Any help would be appreciated.
So it's really easy.
Counting in binary usually goes like:
>00000000 (8) = 0
>00000001 (8) = 1
>00000010 (8) = 2
>00000011 (8) = 3
>etc...etc.
In 7-bit, the last bit is what decides whether it's negative or positive,
1 being negative and 0 being positive
> 10000000 = -128
> 10000001 = -127
> 10000010 = -126
> On...and on...
> 11111111 = -1
> 00000000 = 0
> 00000001 = 1
> 01111111 = 127
So the lowest you can go is -128,
and the highest you can go is 127.
The approved answer is not correct.
With 7 bits of two's complement, the range is -64 to 63. (Traditionally, 7 bits can encode 2^7 = 128 distinct values, but the MSB is reserved for the sign, so 6 bits are left to represent the magnitude.
That gives 63 positive values, 64 negative values and zero, so the answer should be -64 and 63.)
No, because in two's complement, the most significant bit is the sign bit. 0000001 is +1, a positive number.
That is why the range of two's complement 7-bit numbers is -64 to 63, because 64 is not representable (it would be negative number otherwise).
The most negative number is 1000000. The leading 1 tells you it's negative, and to get the magnitude of the number you flip all the bits (0111111), then add one (1000000 = 64). So that number is -64, and the overall range is -64 through 63.
For the largest positive integer in n-bit two's complement, use the formula
2^(n-1) - 1
That is 2^(7-1) - 1 = 63.
For the smallest, use
-2^(n-1)
That is -2^(7-1) = -64.
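The same formulas for an arbitrary width n, as a small sketch (64-bit arithmetic keeps the shifts safe for n up to 62):

#include <cstdint>
#include <iostream>

int main()
{
    int n = 7;                                               // number of bits
    std::int64_t count = std::int64_t(1) << n;               // 2^n values in total
    std::int64_t min   = -(std::int64_t(1) << (n - 1));      // -2^(n-1)
    std::int64_t max   = (std::int64_t(1) << (n - 1)) - 1;   // 2^(n-1) - 1
    std::cout << count << ' ' << min << ' ' << max << '\n';  // 128 -64 63
}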
What is the difference between ~i and INT_MAX^i?
Both appear to give the same number in binary, but when we print them the output is different, as shown in the code below:
#include <bits/stdc++.h>
using namespace std;

void binary(int x)
{
    int i = 30;
    while (i >= 0)
    {
        if (x & (1 << i))
            cout << '1';
        else
            cout << '0';
        i--;
    }
    cout << endl;
}

int main() {
    int i = 31;
    int j = INT_MAX;
    int k = j ^ i;
    int g = ~i;
    binary(j);
    binary(i);
    binary(k);
    binary(g);
    cout << k << endl << g;
    return 0;
}
I get the output as
1111111111111111111111111111111
0000000000000000000000000011111
1111111111111111111111111100000
1111111111111111111111111100000
2147483616
-32
Why are k and g different?
k and g are different: the most significant bit differs. You do not display it, since you show only 31 bits. In k the most significant bit is 0 (the result of XOR of two 0s); in g it is 1, the result of negating the 0 that is the most significant bit of i.
Your test is flawed. If you output all of the integer's bits, you'll see that the values are not the same.
You'll also now see that NOT and XOR are not the same operation.
Try setting i = 31 in your binary function; it is not printing the whole number. You will then see that k and g are not the same; g has the 'negative' (sign) bit set at the top.
Integers use the 32nd bit to indicate if the number is positive or negative. You are only printing 31 bits.
~ is bitwise NOT; ~11100 = 00011
^ is bitwise XOR; a result bit is 1 only when exactly one of the two input bits is 1
~ is bitwise NOT; it will flip all the bits
Example
a: 010101
~a: 101010
^ is XOR; a result bit will be 1 iff one bit is 0 and the other is 1, otherwise it will be set to 0.
a: 010101
b: 001100
a^b: 011001
You want UINT_MAX, and you want to work with unsigned ints: INT_MAX does not have the sign bit set. ~ will flip all the bits, but ^ with INT_MAX will leave the sign bit alone because it is not set in INT_MAX.
This statement is false:
~i and INT_MAX^i ... Both give the same no. in binary
The reason it appears that they give the same number in binary
is because you printed out only 31 of the 32 bits of each number.
You did not print the sign bit.
The sign bit of INT_MAX is 0 (indicating a positive signed integer)
and it is not changed by INT_MAX^i
because the sign bit of i also is 0,
and the XOR of two zeros is 0.
The sign bit of ~i is 1 because the sign bit of i was 0 and the
~ operation flipped it.
If you printed all 32 bits you would see this difference in the binary output.
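A sketch of the print routine fixed to show all 32 bits (the loop starts at bit 31, and 1u is shifted so the top-bit shift stays well defined):

#include <iostream>

void binary(unsigned int x)
{
    // Print all 32 bits, most significant first.
    for (int i = 31; i >= 0; --i)
        std::cout << ((x & (1u << i)) ? '1' : '0');
    std::cout << '\n';
}

With this version, the outputs for k and g differ in their first (sign) bit.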
I'm interested in learning how to convert an integer value into IEEE single precision floating point format using bitwise operators only. However, I'm confused as to what can be done to know how many logical shifts left are needed when calculating for the exponent.
Given an int, say 15, we have:
Binary: 1111
-> 1.111 x 2^3 => After placing a decimal point after the first bit, we find that the 'e' value will be three.
E = Exp - Bias
Therefore, Exp = 130 = 10000010
And the significand will be: 111000000000000000000000
However, I knew that the 'e' value would be three because I was able to see that there are three bits after placing the decimal after the first bit. Is there a more generic way to code for this as a general case?
Again, this is for an int to float conversion, assuming that the integer is non-negative, non-zero, and is not larger than the max space allowed for the mantissa.
Also, could someone explain why rounding is needed for values greater than 23 bits?
Thanks in advance!
First, a paper you should consider reading, if you want to understand floating point foibles better: "What Every Computer Scientist Should Know About Floating Point Arithmetic," http://www.validlab.com/goldberg/paper.pdf
And now to some meat.
The following code is bare bones, and attempts to produce an IEEE-754 single precision float from an unsigned int in the range 0 < value < 2^24. That's the format you're most likely to encounter on modern hardware, and it's the format you seem to reference in your original question.
IEEE-754 single-precision floats are divided into three fields: a single sign bit, 8 bits of exponent, and 23 bits of significand (sometimes called a mantissa). IEEE-754 uses a hidden 1 significand, meaning that the significand is actually 24 bits total. The bits are packed left to right, with the sign bit in bit 31, the exponent in bits 30 .. 23, and the significand in bits 22 .. 0. (The Wikipedia article linked below has a diagram illustrating this layout.)
The exponent has a bias of 127, meaning that the actual exponent associated with the floating point number is 127 less than the value stored in the exponent field. An exponent of 0 therefore would be encoded as 127.
(Note: The full Wikipedia article may be interesting to you. Ref: http://en.wikipedia.org/wiki/Single_precision_floating-point_format )
Therefore, the IEEE-754 number 0x40000000 is interpreted as follows:
Bit 31 = 0: Positive value
Bits 30 .. 23 = 0x80: Exponent = 128 - 127 = 1 (i.e. 2^1)
Bits 22 .. 0 are all 0: Significand = 1.00000000_00000000_0000000. (Note I restored the hidden 1).
So the value is 1.0 x 2^1 = 2.0.
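You can check that decoding with a few lines of C++, using memcpy to sidestep the type-punning issues discussed below:

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    std::uint32_t bits = 0x40000000u;  // sign 0, exponent 0x80, significand 0
    float f;
    std::memcpy(&f, &bits, sizeof f);  // copy the bit pattern into a float
    std::cout << f << '\n';            // prints 2 on an IEEE-754 platform
}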
To convert an unsigned int in the limited range given above, then, to something in IEEE-754 format, you might use a function like the one below. It takes the following steps:
Aligns the leading 1 of the integer to the position of the hidden 1 in the floating point representation.
While aligning the integer, records the total number of shifts made.
Masks away the hidden 1.
Using the number of shifts made, computes the exponent and appends it to the number.
Using reinterpret_cast, converts the resulting bit-pattern to a float. This part is an ugly hack, because it uses a type-punned pointer. You could also do this by abusing a union. Some platforms provide an intrinsic operation (such as _itof) to make this reinterpretation less ugly.
There are much faster ways to do this; this one is meant to be pedagogically useful, if not super efficient:
float uint_to_float(unsigned int significand)
{
    // Only support 0 < significand < 1 << 24.
    if (significand == 0 || significand >= 1 << 24)
        return -1.0;  // or abort(); or whatever you'd like here.

    int shifts = 0;

    // Align the leading 1 of the significand to the hidden-1
    // position. Count the number of shifts required.
    while ((significand & (1 << 23)) == 0)
    {
        significand <<= 1;
        shifts++;
    }

    // The number 1.0 has an exponent of 0, and would need to be
    // shifted left 23 times. The number 2.0, however, has an
    // exponent of 1 and needs to be shifted left only 22 times.
    // Therefore, the exponent should be (23 - shifts). IEEE-754
    // format requires a bias of 127, though, so the exponent field
    // is given by the following expression:
    unsigned int exponent = 127 + 23 - shifts;

    // Now merge significand and exponent. Be sure to strip away
    // the hidden 1 in the significand.
    unsigned int merged = (exponent << 23) | (significand & 0x7FFFFF);

    // Reinterpret as a float and return. This is an evil hack.
    return *reinterpret_cast< float* >( &merged );
}
You can make this process more efficient using functions that detect the leading 1 in a number. (These sometimes go by names like clz for "count leading zeros", or norm for "normalize".)
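For example, with GCC or Clang you could replace the shift loop with the __builtin_clz intrinsic (a compiler-specific assumption, not standard C++); a sketch valid for 0 < significand < 2^24:

unsigned int biased_exponent(unsigned int significand)
{
    int msb = 31 - __builtin_clz(significand);  // index of the leading 1 bit
    return 127 + msb;                           // same as 127 + 23 - shifts
}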
You can also extend this to signed numbers by recording the sign, taking the absolute value of the integer, performing the steps above, and then putting the sign into bit 31 of the number.
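A sketch of that signed extension, reusing the uint_to_float function above (and keeping its restriction that the magnitude is nonzero and below 2^24):

#include <cstring>

float int_to_float(int value)
{
    unsigned int sign = (value < 0) ? 0x80000000u : 0u;
    // Take the magnitude with unsigned arithmetic so no signed overflow occurs.
    unsigned int magnitude = (value < 0) ? 0u - static_cast<unsigned int>(value)
                                         : static_cast<unsigned int>(value);

    float f = uint_to_float(magnitude);

    // Put the recorded sign into bit 31 of the result's bit pattern.
    unsigned int bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits |= sign;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}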
For integers >= 2^24, the entire integer does not fit into the significand field of the 32-bit float format. This is why you need to "round": you lose LSBs in order to make the value fit. Thus, multiple integers will end up mapping to the same floating point pattern. The exact mapping depends on the rounding mode (round toward -Inf, round toward +Inf, round toward zero, round toward nearest even). But the fact of the matter is that you can't shove more than 24 significant bits into a 24-bit field without some loss.
You can see this in terms of the code above. It works by aligning the leading 1 to the hidden-1 position. If a value were >= 2^24, the code would need to shift right, not left, and that necessarily shifts LSBs away. Rounding modes just tell you how to handle the bits that are shifted away.