If I implement BigInteger with a character array (in C++), in terms of powers of 10, what is my upper bound on a 32-bit system?
In other words,
-10^x < N <= 10^x
(the first character is reserved for the sign).
What is x on a 32-bit system?
Please ignore for now that some memory is reserved for the OS, and assume all 4 GB of memory is addressable by us.
An 8-bit byte can hold 2^8, or 256, unique values.
4 GB of memory is 2^32, or 4,294,967,296, bytes.
Or 4,294,967,295, if we subtract the one byte that you want to reserve for the sign.
That's 34,359,738,360 bits.
This many bits can hold 2^34359738360 unique values.
Taking the base-10 logarithm, x = 34,359,738,360 * log10(2), which is approximately 10,343,311,889, giving -10^10343311889 < N <= 10^10343311889 as the largest representable range.
So x is approximately 10,343,311,889.
(−(2^(n−1))) to (2^(n−1) − 1) calculates the range of a signed integer where n is the number of bits.[1]
Assuming you're referring to the whole 4 GB of memory being allocated, that is 2^32 (4,294,967,296) addressable bytes in a 32-bit memory space, which is 2^35 (34,359,738,368) bits.
Put that into the formula at the start and you get a range of -(2^(2^35 - 1)) to 2^(2^35 - 1) - 1.
This is assuming you use a single bit for the sign, instead of a whole byte. If you're going to use a whole byte for the sign, you should calculate the unsigned range of 2^35 - 8 bits, which is from 0 to 2^(2^35 - 8) - 1.
According to this page, to convert from an exponent of base 2 to an exponent of base 10, you use the formula x = m*ln(2)/ln(10), where you are converting from 2^m to 10^x.
Therefore, your answer is that the upper bound is 10^((2^35 - 8)*ln(2)/ln(10)). I'm not even going to attempt to turn that exponent into a decimal value.
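If you do want a rough numeric value for such an exponent, here is a minimal C++ sketch of that conversion (the names and structure are mine, not from either answer above):

#include <cmath>
#include <cstdio>

int main()
{
    // 2^35 bits of storage, minus 8 bits reserved for a sign byte.
    double bits = std::pow(2.0, 35) - 8.0;

    // Convert a base-2 exponent m to a base-10 exponent x: x = m * ln(2) / ln(10).
    double x = bits * std::log(2.0) / std::log(10.0);

    std::printf("%.0f\n", x);  // roughly 1.03e10, agreeing with the first answer
}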
Related
I want to know why, in the formula to calculate the range of any data type, i.e. 2^(n-1), it is n-1, where n is the number of bits occupied by the given data type.
Assuming that the type is unsigned, the maximum value is 2^n - 1, because there are 2^n values, and one of them is zero.
2^(n-1) is the value of the nth bit alone - bit 1 is 2^0, bit 2 is 2^1, and so on.
This is the same for any number base - in the decimal system, n digits can represent 10^n different values, with the maximum value being 10^n - 1, and the nth digit being "worth" 10^(n-1).
For example, the largest number with three decimal digits is 999 (that is, 10^3 - 1), and the third decimal digit, "the hundreds digit", is worth 10^2.
First, 2^(n-1) is not correct; the maximum (unsigned) number represented by the data type is:
max = 2^n - 1
So for an 8-bit data type, the maximum represented value is 255.
2^n tells you the number of values represented (256 for the 8-bit example), but because you want to include 0, the range is 0 to 255 and not 1 to 256.
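A quick way to check these maxima in C++ is std::numeric_limits; a minimal sketch:

#include <cstdint>
#include <cstdio>
#include <limits>

int main()
{
    // The maximum of an unsigned n-bit type is 2^n - 1.
    std::printf("%u\n", (unsigned)std::numeric_limits<std::uint8_t>::max());   // 255
    std::printf("%u\n", (unsigned)std::numeric_limits<std::uint16_t>::max());  // 65535
    std::printf("%u\n", (unsigned)std::numeric_limits<std::uint32_t>::max());  // 4294967295
}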
If I have a 4 byte unsigned int, I have a space of 2^32 values.
The largest value should be 4294967296 (2^32), so why is it 4294967295 (2^32 - 1) instead?
It's just basic math. Look at this example:
If you had a 1-bit long integer, what would the maximal value be?
According to what you say, it should be 2^1 = 2. How would you represent the value 2 with just one bit?
Because counting starts from 0.
For a 2-bit integer you can have 4 different values (0, 1, 2, 3), i.e. 0 to 2^2 - 1:
(00, 01, 10, 11)
Similarly, for a 32-bit integer the maximum value is 2^32 - 1.
2^1 occupies 2 bits.
2^32 occupies 33 bits.
But you only have 32 bits. So the maximum is 2^32 - 1.
Basic mathematics.
A bit can represent 2 values. A set of n bits can represent 2^n (using ^ as notation to represent "to the power of", not as a bitwise operation as is its default meaning in C++) distinct values.
A variable of unsigned integral type represents a sequence containing every consecutive integral value between 0 and the maximum value that type can represent. If the total number of sequential values is 2^n, and the first of them is zero, then the type can represent 2^n - 1 consecutive positive (non-zero) values. The maximum value must therefore be 2^n - 1.
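To see the boundary concretely, a small sketch (my own, not from the answers above) showing that the unsigned maximum is 2^n - 1 and that adding one wraps to zero:

#include <cstdio>
#include <limits>

int main()
{
    unsigned int max = std::numeric_limits<unsigned int>::max();
    std::printf("%u\n", max);      // 4294967295, i.e. 2^32 - 1 for a 32-bit unsigned int
    std::printf("%u\n", max + 1);  // 0: unsigned arithmetic wraps modulo 2^32
}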
Packets coming over a network have padding bytes added at the end for alignment. I want to skip these bytes; the packet size is variable but known. Given a number n, how do I round it up to the next 4-byte alignment?
For any integer n and any stride k (both positive), you can compute the smallest multiple of k that's not smaller than n via:
(n + k - 1) / k * k
This uses the fact that integral division truncates.
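As a C++ sketch of that formula (the function name is mine):

#include <cstdio>

// Round n up to the next multiple of k (both positive).
// Relies on integral division truncating.
unsigned int align_up(unsigned int n, unsigned int k)
{
    return (n + k - 1) / k * k;
}

int main()
{
    std::printf("%u\n", align_up(551, 4));  // 552
    std::printf("%u\n", align_up(552, 4));  // 552: already aligned, left unchanged
}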
Another version: n is the number you want to align and k the alignment (here 4). The formula would be n + k - n % k (where % is modulus). Note that this yields n + k rather than n when n is already a multiple of k; use n + (k - n % k) % k if aligned inputs must be left unchanged.
For example (in Unix bc calculator)
k=4
n=551
n+k-n%k
552
to check that it is aligned:
scale=4
552/4
138.0000
What kind of method does the compiler use to store numbers? One char is 0-255. Two chars side by side are 0-255255. In C++ one short is 2 bytes; its range is 0-65535. Now, how does the compiler get from 255255 to 65535, and what happens with the - in unsigned numbers?
The maximum value you can store in n bits (when the lowest value is 0 and the values represented are a continuous range), is 2ⁿ − 1. For 8 bits, this gives 255. For 16 bits, this gives 65535.
Your mistake is thinking that you can just concatenate 255 with 255 to get the maximum value in two chars - this is definitely wrong. Instead, to get from the range of 8 bits, which is 256, to the range of 16 bits, you would do 256 × 256 = 65536. Since our first value is 0, the maximum value is 65535, once again.
Note that a char is only guaranteed to have at least 8 bits and a short at least 16 bits (and must be at least as large as a char).
You have got the math totally wrong. Here's how it really is:
Since each bit can only take on one of two states (1 and 0), n bits as a whole can represent 2^n different quantities, not numbers. A standard short integer of 2 bytes can therefore represent values from 0 to 2^n - 1 (n = 16, so 65535), which are mapped to decimal numbers in real life for computational simplicity.
When dealing with 2 characters, they are two separate entities (a string is an array of characters). There are many ways to read the two characters as a whole; if you read them as a string, it would be the same as two separate characters side by side. Let me give you an example:
Remember, I will be using hexadecimal notation for simplicity!
If you have doubts mapping ASCII characters to hex, take a look at this: ascii to hex
For simplicity, let us assume the characters stored in two adjacent positions are both A.
Now the hex code for A is 0x41, so the memory would look like:
1st byte ....... 2nd byte
01000001 01000001
If you were to read this from memory as a string and print it out, the output would be
AA
If you were to read the whole 2 bytes as an integer, this would represent
0 * 2^15 + 1 * 2^14 + 0 * 2^13 + 0 * 2^12 + 0 * 2^11 + 0 * 2^10 + 0 * 2^9 + 1 * 2^8 + 0 * 2^7 + 1 * 2^6 + 0 * 2^5 + 0 * 2^4 + 0 * 2^3 + 0 * 2^2 + 0 * 2^1 + 1 * 2^0
= 16705
If unsigned integers were used, then the 2 bytes of data would be mapped to integers between
0 and 65535. But if the same bits represented a signed value then, though the number of distinct values remains the same, the biggest positive number that can be represented would be 32767; the values would lie between -32768 and 32767. This is because not all 16 bits can be used for the magnitude: the highest-order bit is left to determine the sign, with 1 representing negative and 0 representing positive.
You must also note that type conversion (two characters read as a single integer) might not always give you the desired results, especially when you narrow the range (for example, when a double-precision float is converted to an integer).
For more on that, see this answer: double to int
Hope this helps.
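A small sketch of both readings in C++ (my illustration, not from the answer above; memcpy is used to avoid aliasing problems):

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    unsigned char bytes[2] = { 0x41, 0x41 };  // two adjacent 'A' characters

    // Read as characters: prints AA
    std::printf("%c%c\n", bytes[0], bytes[1]);

    // Read the same two bytes as one 16-bit unsigned integer. Because both
    // bytes are equal, the result is 0x4141 = 16705 regardless of endianness.
    std::uint16_t value;
    std::memcpy(&value, bytes, sizeof value);
    std::printf("%u\n", (unsigned)value);  // 16705
}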
When using the decimal system, it's true that the range of one digit is 0-9 and the range of two digits is 0-99. The same thing applies when using the hexadecimal system, but you have to do the math in hexadecimal. The range of one hexadecimal digit is 0-Fh, and the range of two hexadecimal digits (one byte) is 0-FFh. The range of two bytes is 0-FFFFh, which translates to 0-65535 in the decimal system.
Decimal is a base-10 number system. This means that each successive digit from right-to-left represents an incremental power of 10. For example, 123 is 3 + (2*10) + (1*100). You may not think of it in these terms in day-to-day life, but that's how it works.
Now you take the same concept from decimal (base-10) to binary (base-2) and now each successive digit is a power of 2, rather than 10. So 1100 is 0 + (0*2) + (1*4) + (1*8).
Now let's take an 8-bit number (char); there are 8 digits in this number so the maximum value is 255 (2**8 - 1), or another way, 11111111 == 1 + (1*2) + (1*4) + (1*8) + (1*16) + (1*32) + (1*64) + (1*128).
When there are another 8 bits available to make a 16-bit value, you just continue counting powers of 2; you don't just "stick" the two 255s together to make 255255. So the maximum value is 65535, or another way, 1111111111111111 == 1 + (1*2) + (1*4) + (1*8) + (1*16) + (1*32) + (1*64) + (1*128) + (1*256) + (1*512) + (1*1024) + (1*2048) + (1*4096) + (1*8192) + (1*16384) + (1*32768).
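To make the "multiply, don't concatenate" point concrete, a one-file sketch (my own illustration): two bytes that each hold 255 combine as 255 * 256 + 255 = 65535, not as the digit-concatenation 255255.

#include <cstdio>

int main()
{
    unsigned high = 255, low = 255;

    // Two 8-bit values combine as high * 256 + low, or equivalently (high << 8) | low.
    unsigned combined = (high << 8) | low;
    std::printf("%u\n", combined);  // 65535
}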
It depends on the type: integral types must be stored as binary (or at least, appear so to a C++ program), so you have one bit per binary digit. With very few exceptions, all of the bits are significant (although this is not required, and there is at least one machine where there are extra bits in an int). On a typical machine, char will be 8 bits, and if it isn't signed, can store values in the range [0, 2^8); in other words, between 0 and 255 inclusive. unsigned short will be 16 bits (range [0, 2^16)), unsigned int 32 bits (range [0, 2^32)) and unsigned long either 32 or 64 bits.
For the signed values, you'll have to use at least one of the bits for the sign, reducing the maximum positive value. The exact representation of negative values can vary, but on most machines it will be 2's complement, so the ranges will be signed char: [-2^7, 2^7), and so on.
If you're not familiar with base two, I'd suggest you learn it very quickly; it's fundamental to how all modern machines store numbers. You should find out about 2's complement as well: the usual human representation is called sign plus magnitude, and is very rare in computers.
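As a quick illustration of 2's complement (my own sketch, not part of the answer above): converting -1 to unsigned shows that all bits are set.

#include <cstdio>

int main()
{
    int x = -1;

    // Converting to unsigned is defined as reduction modulo 2^32 (for 32-bit int),
    // and on a 2's-complement machine the bit pattern doesn't change: all bits set.
    unsigned u = (unsigned)x;
    std::printf("%u\n", u);  // 4294967295, i.e. 2^32 - 1
}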
I'm interested in learning how to convert an integer value into IEEE single-precision floating-point format using bitwise operators only. However, I'm confused as to how I can determine the number of logical shifts left that are needed when calculating the exponent.
Given an int, say 15, we have:
Binary: 1111
-> 1.111 x 2^3 => After placing a binary point after the first bit, we find that the 'e' value will be three.
E = Exp - Bias
Therefore, Exp = 130 = 10000010
And the significand will be: 111000000000000000000000
However, I knew that the 'e' value would be three because I could see that there are three bits after the point once it is placed after the first bit. Is there a more generic way to code this for the general case?
Again, this is for an int to float conversion, assuming that the integer is non-negative, non-zero, and is not larger than the max space allowed for the mantissa.
Also, could someone explain why rounding is needed for values greater than 23 bits?
Thanks in advance!
First, a paper you should consider reading, if you want to understand floating point foibles better: "What Every Computer Scientist Should Know About Floating Point Arithmetic," http://www.validlab.com/goldberg/paper.pdf
And now to some meat.
The following code is bare bones, and attempts to produce an IEEE-754 single-precision float from an unsigned int in the range 0 < value < 2^24. That's the format you're most likely to encounter on modern hardware, and it's the format you seem to reference in your original question.
IEEE-754 single-precision floats are divided into three fields: a single sign bit, 8 bits of exponent, and 23 bits of significand (sometimes called a mantissa). IEEE-754 uses a hidden 1 significand, meaning that the significand is actually 24 bits total. The bits are packed left to right, with the sign bit in bit 31, the exponent in bits 30 .. 23, and the significand in bits 22 .. 0. (The Wikipedia article linked below includes a diagram of this layout.)
The exponent has a bias of 127, meaning that the actual exponent associated with the floating point number is 127 less than the value stored in the exponent field. An exponent of 0 therefore would be encoded as 127.
(Note: The full Wikipedia article may be interesting to you. Ref: http://en.wikipedia.org/wiki/Single_precision_floating-point_format )
Therefore, the IEEE-754 number 0x40000000 is interpreted as follows:
Bit 31 = 0: Positive value
Bits 30 .. 23 = 0x80: Exponent = 128 - 127 = 1 (i.e., 2^1)
Bits 22 .. 0 are all 0: Significand = 1.00000000_00000000_0000000. (Note I restored the hidden 1.)
So the value is 1.0 x 2^1 = 2.0.
To convert an unsigned int in the limited range given above, then, to something in IEEE-754 format, you might use a function like the one below. It takes the following steps:
Aligns the leading 1 of the integer to the position of the hidden 1 in the floating point representation.
While aligning the integer, records the total number of shifts made.
Masks away the hidden 1.
Using the number of shifts made, computes the exponent and appends it to the number.
Using reinterpret_cast, converts the resulting bit-pattern to a float. This part is an ugly hack, because it uses a type-punned pointer. You could also do this by abusing a union. Some platforms provide an intrinsic operation (such as _itof) to make this reinterpretation less ugly.
There are much faster ways to do this; this one is meant to be pedagogically useful, if not super efficient:
float uint_to_float(unsigned int significand)
{
    // Only support 0 < significand < 1 << 24.
    if (significand == 0 || significand >= 1 << 24)
        return -1.0;  // or abort(); or whatever you'd like here.

    int shifts = 0;

    // Align the leading 1 of the significand to the hidden-1
    // position.  Count the number of shifts required.
    while ((significand & (1 << 23)) == 0)
    {
        significand <<= 1;
        shifts++;
    }

    // The number 1.0 has an exponent of 0, and would need to be
    // shifted left 23 times.  The number 2.0, however, has an
    // exponent of 1 and needs to be shifted left only 22 times.
    // Therefore, the exponent should be (23 - shifts).  IEEE-754
    // format requires a bias of 127, though, so the exponent field
    // is given by the following expression:
    unsigned int exponent = 127 + 23 - shifts;

    // Now merge significand and exponent.  Be sure to strip away
    // the hidden 1 in the significand.
    unsigned int merged = (exponent << 23) | (significand & 0x7FFFFF);

    // Reinterpret as a float and return.  This is an evil hack.
    return *reinterpret_cast<float*>(&merged);
}
You can make this process more efficient using functions that detect the leading 1 in a number. (These sometimes go by names like clz for "count leading zeros", or norm for "normalize".)
You can also extend this to signed numbers by recording the sign, taking the absolute value of the integer, performing the steps above, and then putting the sign into bit 31 of the number.
For integers >= 2^24, the entire integer does not fit into the significand field of the 32-bit float format. This is why you need to "round": you lose LSBs in order to make the value fit. Thus, multiple integers will end up mapping to the same floating-point pattern. The exact mapping depends on the rounding mode (round toward -Inf, round toward +Inf, round toward zero, round toward nearest even). But the fact of the matter is you can't shove more than 24 bits into a 24-bit field without some loss.
You can see this in terms of the code above. It works by aligning the leading 1 to the hidden-1 position. If a value were >= 2^24, the code would need to shift right, not left, and that necessarily shifts LSBs away. Rounding modes just tell you how to handle the bits shifted away.
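A quick way to observe this loss (my own sketch): 2^24 + 1 has 25 significant bits, so it cannot survive a round trip through float.

#include <cstdio>

int main()
{
    unsigned int n = (1u << 24) + 1;    // 16777217: 25 significant bits
    float f = (float)n;                 // the lowest bit must be dropped to fit

    std::printf("%u -> %.1f\n", n, f);  // 16777217 -> 16777216.0
}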