Why do negative numbers in variables have a larger limit than a positive number (signed variables)? - c++

As seen in the picture above, all of the variables have a negative limit that is one more than the positive limit. I was wondering how it is able to add that extra one. I know that the first digit in the variable is used to tell if it is negative (1) or if it is not (0). I also know that binary is based on the powers of 2. What I am confused about is how there is one extra when the positive itself can't go higher and the negative only has one digit changing. For example, a short can go up to 32,767 (01111111 11111111), or 16,384 + all of the decimal values of the binary digits below it. Negative numbers are the same thing except with a one at the beginning, right? So how do the negative numbers have a larger limit? Thanks to anyone who answers!

The reason is a scheme called "2's complement" used to represent signed integers.
You know that the most significant bit of a signed integer represents the sign. But what you may not know is that it also represents a value, a negative value.
Take a 4-bit 2's complement signed integer as an example:
   1     0     1     0
-2^3   2^2   2^1   2^0
This 4-bit integer is interpreted as:
1 * -2^3 + 0 * 2^2 + 1 * 2^1 + 0 * 2^0
= -8 + 0 + 2 + 0
= -6
With this scheme, the max of 4-bit 2's complement is 7.
   0     1     1     1
-2^3   2^2   2^1   2^0
And the min is -8.
   1     0     0     0
-2^3   2^2   2^1   2^0
Also, 0 is represented by 0000, 1 is 0001, and -1 is 1111. Comparing these three numbers, we can observe that zero has its "sign bit" set to 0, like the positive numbers, and there is no "negative zero" in the 2's complement scheme. In other words, half of the range consists only of negative numbers, while the other half includes zero and the positive numbers.
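To make the weighting concrete, here is a minimal C++ sketch (the function name is my own) that decodes a 4-bit pattern using exactly the weights above:

    #include <iostream>

    // Decode a 4-bit two's complement pattern: the top bit is worth -2^3,
    // the remaining bits are ordinary non-negative powers of 2.
    int decode4(unsigned bits) {
        int value = (bits & 0b1000) ? -8 : 0;  // bit 3 contributes -2^3
        if (bits & 0b0100) value += 4;         // bit 2 contributes  2^2
        if (bits & 0b0010) value += 2;         // bit 1 contributes  2^1
        if (bits & 0b0001) value += 1;         // bit 0 contributes  2^0
        return value;
    }

    int main() {
        std::cout << decode4(0b1010) << '\n';  // -6
        std::cout << decode4(0b0111) << '\n';  //  7 (max)
        std::cout << decode4(0b1000) << '\n';  // -8 (min)
        std::cout << decode4(0b1111) << '\n';  // -1
    }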

If integers are stored using two's complement, you get one extra negative value and a single zero. If they are stored using one's complement or sign-magnitude, you get two zeros and the same number of negative values as positive ones. Floating point numbers have their own storage scheme and, under IEEE formats, have an explicit sign bit.

I know that the first digit in the variable is used to tell if it is negative (1) or if is not (0).
The first binary digit (or bit), yes, assuming two's complement representation. Which basically answers your question: there are 32,768 numbers < 0 (-32,768 .. -1), and 32,768 numbers >= 0 (0 .. +32,767).
Also note that in binary the total number of possible representations (bit patterns) is even. You couldn't have the min and max values be equal in absolute value, since you'd end up with an odd number of possible values (counting 0). Thus, you'd have to waste or declare illegal at least one bit pattern.
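If you want to see these bounds without counting bits by hand, here is a minimal sketch using std::numeric_limits (assuming the usual 16-bit two's complement short):

    #include <iostream>
    #include <limits>

    // Print the asymmetric range of short: one more negative value
    // than positive ones, because zero sits on the non-negative side.
    int main() {
        std::cout << std::numeric_limits<short>::min() << '\n';  // -32768
        std::cout << std::numeric_limits<short>::max() << '\n';  //  32767
    }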

Related

How to calculate range of data type eg int

I want to know why, in the formula to calculate the range of any data type,
i.e. 2^(n-1), it is n-1, where n is the number of bits occupied by the given data type.
Assuming that the type is unsigned, the maximum value is 2^n - 1, because there are 2^n values, and one of them is zero.
2^(n-1) is the value of the n-th bit alone - bit 1 is 2^0, bit 2 is 2^1, and so on.
This is the same for any number base - in the decimal system, n digits can represent 10^n different values, with the maximum value being 10^n - 1, and the n-th digit is "worth" 10^(n-1).
For example, the largest number with three decimal digits is 999 (that is, 10^3 - 1), and the third decimal digit is "the hundreds digit", worth 10^2.
First, 2^(n-1) is not correct; the maximum (unsigned) number represented by the data type is:
max = 2^n - 1
So for an 8-bit data type, the maximum represented value is 255.
2^n tells you the number of values represented (256 for the 8-bit example), but because you want to include 0, the range is 0 to 255 and not 1 to 256.
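A quick sketch of that counting argument in C++ (using a 64-bit shift so 1 << 32 does not overflow):

    #include <iostream>

    // For an n-bit unsigned type there are 2^n bit patterns; since one
    // of those patterns is 0, the largest value is 2^n - 1.
    int main() {
        int widths[] = {8, 16, 32};
        for (int n : widths) {
            unsigned long long patterns = 1ULL << n;  // 2^n
            std::cout << n << " bits: " << patterns << " values, max "
                      << (patterns - 1) << '\n';
        }
    }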

Why is the max value of an unsigned int -> 2^n - 1 in C++

If I have a 4 byte unsigned int, I have a space of 2^32 values.
The largest value should be 4294967296 (2^32), so why is it 4294967295 (2^32 - 1) instead?
It's just basic math. Look at this example:
If you had a 1-bit long integer, what would the maximal value be?
According to what you say, it should be 2^1 = 2. How would you represent the value 2 with just one bit?
Because counting starts from 0.
For a 2-bit integer you can have 4 different values (0, 1, 2, 3), i.e. 0 to 2^2 - 1:
(00, 01, 10, 11)
Similarly, for a 32-bit integer the max value is 2^32 - 1.
2^1 occupies 2 bits.
2^32 occupies 33 bits.
But you have only 32 bits. So 2^32 - 1
Basic mathematics.
A bit can represent 2 values. A set of n bits can represent 2^n (using ^ as notation to represent "to the power of", not as a bitwise operation as is its default meaning in C++) distinct values.
A variable of unsigned integral type represents a sequence containing every consecutive integral value between 0 and the maximum value that type can represent. If the total number of sequential values is 2^n, and the first of them is zero, then the type can represent 2^n - 1 consecutive positive (non-zero) values. The maximum value must therefore be 2^n - 1.
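One consequence of the range being exactly 2^n consecutive values is that unsigned arithmetic in C++ is defined modulo 2^n; a small sketch, assuming a 32-bit unsigned int:

    #include <iostream>
    #include <limits>

    // Adding 1 to the maximum unsigned value wraps around to 0:
    // the 2^n values form a closed cycle, with no value left over.
    int main() {
        unsigned int max = std::numeric_limits<unsigned int>::max();
        std::cout << max << '\n';       // 4294967295 (2^32 - 1)
        std::cout << max + 1u << '\n';  // 0 (well-defined wraparound)
    }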

Seven-bit and two's complement

If we use seven-bit two's complement binary representation for integers, what is
The number of integers (things) that can be represented in this way?
The smallest (most) negative integer that can be represented in this way?
The largest positive integer that can be represented in this way?
This is a CS homework question that I am having trouble answering and explaining. Any help would be appreciated.
So it's really easy.
Counting in binary usually goes like
>00000000 (8) = 0
>00000001 (8) = 1
>00000010 (8) = 2
>00000011 (8) = 3
>etc...etc.
In 7-bit the leftmost (most significant) bit is what decides if it's negative or positive,
1 being negative and 0 being positive
> 10000000 = -128
> 10000001 = -127
> 10000010 = -126
> On...and on...
> 11111111 = -1
> 00000000 = 0
> 00000001 = 1
> 01111111 = 127
So lowest you can go is -128
Highest you can go is 127
The approved answer is not correct.
With 7-bit 2's complement, the range is -64 to 63. (Traditionally, 7 bits can represent 2^7 = 128 different values, but the MSB is reserved for the sign, so we have 6 bits to represent the magnitude.
We get 64 negative values and 63 positive values, plus zero, so the answer should be -64, 63.)
No, because in two's complement, the most significant bit is the sign bit. 0000001 is +1, a positive number.
That is why the range of two's complement 7-bit numbers is -64 to 63, because 64 is not representable (it would be negative number otherwise).
The most negative number is 1000000. The leading 1 tells you it's negative, and to get the magnitude of the number, you flip all the bits (0111111), then add one (1000000 = 64). So the most negative value is -64, and the resulting range is -64 through 63.
For the largest positive integer in 2's complement use the formula
2^(n-1) - 1
That is 2^(7-1) - 1 = 63
For the smallest use
-2^(n-1)
That is -2^(7-1) = -64
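A small C++ sketch (the helper name is my own) that applies these formulas by sign-extending a 7-bit pattern, so you can check the endpoints:

    #include <iostream>

    // Interpret the low 7 bits of 'bits' as a two's complement value:
    // if bit 6 (the sign bit) is set, the value is the pattern minus 2^7.
    int fromSevenBit(unsigned bits) {
        bits &= 0x7F;  // keep only 7 bits
        return (bits & 0x40) ? static_cast<int>(bits) - 128
                             : static_cast<int>(bits);
    }

    int main() {
        std::cout << fromSevenBit(0b1000000) << '\n';  // -64 (smallest)
        std::cout << fromSevenBit(0b0111111) << '\n';  //  63 (largest)
        std::cout << fromSevenBit(0b1111111) << '\n';  //  -1
    }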

Is it possible for "floor" to return an inaccurate result due to floating point rounding error?

I understand that floating point numbers can often include rounding errors.
When you take the floor or ceiling of a float (or double) in order to convert it to an integer, will the resultant value be exact or can the "floored" value still be an approximation?
Basically, is it possible for something like floor(3.14159265) to return a value which is essentially 2.999999, which would convert to 2 when you try to cast that to an int?
Is it possible for something like floor(3.14159265) to return a value which is essentially 2.999999?
The floor() function returns a floating-point value that is an exact integer. So the premise of your question is wrong to begin with.
Now, floor(x) returns the nearest integral value that is not greater than x. It is always true that
floor(x) <= x
and that there exists no integer i, greater than floor(x), such that i <= x.
Looking at floor(3.14159265), this returns 3.0. There's no debate about that. Nothing more to say.
Where it gets interesting is if you write floor(x) where x is the result of an arithmetic expression. Floating point precision and rounding can mean that x falls on the wrong side of an integer. In other words, the true value of the expression that yields x is greater than some integer, i, but that x when evaluated using floating point arithmetic is less than i.
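A classic illustration, as a small C++ sketch: 0.29 has no exact binary representation, so 0.29 * 100 evaluates to a value just below 29 before floor() ever sees it:

    #include <cmath>
    #include <cstdio>

    // floor() itself is exact; the problem is that its argument has
    // already landed on the wrong side of the integer 29.
    int main() {
        double x = 0.29 * 100.0;
        std::printf("%.17g\n", x);           // 28.999999999999996
        std::printf("%g\n", std::floor(x));  // 28, not 29
    }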
Small integers are representable exactly as floats, but big integers are not.
But, as others pointed out, once integers are too big to be represented exactly by a float, every representable value in that range is itself an integer, so floor() will never return a non-integer value. Thus the cast to (int), as long as it does not overflow, will be correct.
But how small is small? Copying shamelessly from this answer:
For float, it is 16,777,217 (2^24 + 1).
For double, it is 9,007,199,254,740,993 (2^53 + 1).
Note that the usual range of int (32 bits) is 2^31, so float is unable to represent all of them exactly. Use double if you need that.
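You can watch float run out of integers at exactly this boundary; a minimal sketch:

    #include <iostream>

    // 2^24 + 1 is the first integer a 32-bit IEEE float cannot hold:
    // the 24-bit significand runs out of digits and it rounds to 2^24.
    int main() {
        float f = 16777217.0f;  // 2^24 + 1
        std::cout.precision(10);
        std::cout << f << '\n';                   // 16777216
        std::cout << (f == 16777216.0f) << '\n';  // 1 (true)
    }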
Interestingly, floats can store a certain range of integers exactly, for example:
1 is stored as mantissa 1 (binary 1) * exponent 2^0
2 is stored as mantissa 1 (binary 1) * exponent 2^1
3 is stored as mantissa 1.5 (binary 1.1) * exponent 2^1
4 is stored as mantissa 1 * exponent 2^2
5 is stored as mantissa 1.25 (binary 1.01) * exponent 2^2
6 is stored as mantissa 1.5 (binary 1.1) * exponent 2^2
7 is stored as mantissa 1.75 (binary 1.11) * exponent 2^2
8 is stored as mantissa 1 (binary 1) * exponent 2^3
9 is stored as mantissa 1.125 (binary 1.001) * exponent 2^3
10 is stored as mantissa 1.25 (binary 1.01) * exponent 2^3
...
As you can see, the way the exponent increases works together with the fractional values the mantissa can represent perfectly.
You can get a good sense for this by putting numbers into this great online conversion site.
Once you cross a certain threshold, there aren't enough digits in the mantissa to divide the span of the increased exponent without skipping first every odd integer value, then three out of every four, then 7 out of 8, etc. For numbers over this threshold, the issue is not that they might differ from integer values by some tiny fractional amount; it's that all the representable values are integers, so not only can no fractional part be represented any more but, as above, some of the integers can't be either.
You can observe this in the calculator by considering:
Sign  Exponent  Mantissa                 Decimal
0     10010110  11111111111111111111111  16777215
0     10010111  00000000000000000000000  16777216
0     10010111  00000000000000000000001  16777218
See how at this stage, the smallest possible increment of the mantissa is actually "worth 2" in terms of the decimal value represented?
When you take the floor or ceiling of a float (or double) in order to convert it to an integer, will the resultant value be exact or can the "floored" value still be an approximation?
It's always exact. What floor is doing is effectively wiping out any '1's in the mantissa whose significance (their contribution to value) is fractional anyway.
Basically, is it possible for something like floor(3.14159265) to return a value which is essentially 2.999999, which would convert to 2 when you try to cast that to an int?
No.
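A sanity check you can compile yourself, confirming the answers above:

    #include <cmath>
    #include <iostream>

    // The result of floor() is an exact integer value, so the cast to
    // int is safe -- no hidden 2.999999... is possible here.
    int main() {
        double f = std::floor(3.14159265);
        std::cout << f << '\n';                    // 3
        std::cout << static_cast<int>(f) << '\n';  // 3
        std::cout << (f == 3.0) << '\n';           // 1 (exactly 3.0)
    }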

How are numbers stored? [closed]

What kind of method does the compiler use to store numbers? One char is 0-255. Two chars side by side are 0-255255. In C++ one short is 2 bytes. The size is 0-65535. Now, how does the compiler convert 255255 to 65535, and what happens with the - in the unsigned numbers?
The maximum value you can store in n bits (when the lowest value is 0 and the values represented are a continuous range), is 2ⁿ − 1. For 8 bits, this gives 255. For 16 bits, this gives 65535.
Your mistake is thinking that you can just concatenate 255 with 255 to get the maximum value in two chars - this is definitely wrong. Instead, to get from the range of 8 bits, which is 256, to the range of 16 bits, you would do 256 × 256 = 65536. Since our first value is 0, the maximum value is 65535, once again.
Note that a char is only guaranteed to have at least 8 bits and a short at least 16 bits (and must be at least as large as a char).
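The same arithmetic as a tiny C++ sketch, showing that two bytes combine by place value rather than by gluing decimal digits together:

    #include <iostream>

    // The high byte is worth 256 per unit, so the maximum of two bytes
    // is 255 * 256 + 255 = 65535, not "255255".
    int main() {
        unsigned high = 255, low = 255;
        unsigned combined = high * 256 + low;  // same as (high << 8) | low
        std::cout << combined << '\n';         // 65535
    }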
You have got the math totally wrong. Here's how it really is:
Since each bit can only take on either of two states (1 and 0), n bits as a whole can represent 2^n different values. When dealing with integers, a standard short integer size of 2 bytes can therefore represent values from 0 to 2^n - 1 (n = 16, so 65535), which are mapped to decimal numbers for computational simplicity.
When dealing with 2 characters, they are two separate entities (a string is an array of characters). There are many ways to read the two characters as a whole; if you read them as a string, it would be the same as two separate characters side by side. Let me give you an example:
Remember, I will be using hexadecimal notation for simplicity!
If you have doubts mapping ASCII characters to hex, take a look at an ASCII-to-hex table.
For simplicity, let us assume the characters stored in two adjacent positions are both A.
Now the hex code for A is 0x41, so the memory would look like
1st byte   2nd byte
01000001   01000001
If you were to read this from memory as a string and print it out, the output would be
AA
If you were to read the whole 2 bytes as an integer, this would represent
0 * 2^15 + 1 * 2^14 + 0 * 2^13 + 0 * 2^12 + 0 * 2^11 + 0 * 2^10 + 0 * 2^9 + 1 * 2^8 + 0 * 2^7 + 1 * 2^6 + 0 * 2^5 + 0 * 2^4 + 0 * 2^3 + 0 * 2^2 + 0 * 2^1 + 1 * 2^0
= 16705
If unsigned integers were used, the 2 bytes of data would be mapped to integers between
0 and 65535, but if the same bytes represented a signed value then, though the number of representable values remains the same, the biggest positive number that can be represented would be 32767. The values would lie between -32768 and 32767. This is because not all of the 16 bits can be used for the magnitude: the highest-order bit is left to determine the sign, with 1 representing negative and 0 representing positive.
You must also note that type conversion (two characters read as a single integer) might not always give you the desired results, especially when you narrow the range (for example, when a double precision float is converted to an integer).
For more on that see this answer: double to int.
Hope this helps.
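To make the string-versus-integer distinction above concrete, here is a small C++ sketch: the same two bytes read once as characters and once as a single 16-bit integer (std::memcpy avoids aliasing problems; since both bytes are 0x41, byte order does not change the result):

    #include <cstdint>
    #include <cstring>
    #include <iostream>

    // Two adjacent 'A' bytes (0x41 each), viewed two different ways.
    int main() {
        unsigned char bytes[2] = { 'A', 'A' };
        std::cout << bytes[0] << bytes[1] << '\n';  // AA

        std::uint16_t n;
        std::memcpy(&n, bytes, sizeof n);  // reinterpret the raw bytes
        std::cout << n << '\n';            // 16705 (0x4141)
    }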
When using decimal system it's true that range of one digit is 0-9 and range of two digits is 0-99. When using hexadecimal system the same thing applies, but you have to do the math in hexadecimal system. Range of one hexadecimal digit is 0-Fh, and range of two hexadecimal digits (one byte) is 0-FFh. Range of two bytes is 0-FFFFh, and this translates to 0-65535 in decimal system.
Decimal is a base-10 number system. This means that each successive digit from right-to-left represents an incremental power of 10. For example, 123 is 3 + (2*10) + (1*100). You may not think of it in these terms in day-to-day life, but that's how it works.
Now you take the same concept from decimal (base-10) to binary (base-2) and now each successive digit is a power of 2, rather than 10. So 1100 is 0 + (0*2) + (1*4) + (1*8).
Now let's take an 8-bit number (char); there are 8 digits in this number so the maximum value is 255 (2**8 - 1), or another way, 11111111 == 1 + (1*2) + (1*4) + (1*8) + (1*16) + (1*32) + (1*64) + (1*128).
When there are another 8 bits available to make a 16-bit value, you just continue counting powers of 2; you don't just "stick" the two 255s together to make 255255. So the maximum value is 65535, or another way, 1111111111111111 == 1 + (1*2) + (1*4) + (1*8) + (1*16) + (1*32) + (1*64) + (1*128) + (1*256) + (1*512) + (1*1024) + (1*2048) + (1*4096) + (1*8192) + (1*16384) + (1*32768).
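The same place-value argument as a small C++ sketch, summing what each of 16 bits is "worth":

    #include <iostream>

    // Bit i contributes 2^i, just as decimal digit i contributes
    // digit * 10^i; sixteen set bits add up to 65535.
    int main() {
        unsigned value = 0;
        for (int i = 0; i < 16; ++i)
            value += 1u << i;          // add 2^i
        std::cout << value << '\n';    // 65535
    }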
It depends on the type: integral types must be stored as binary (or at least, appear so to a C++ program), so you have one bit per binary digit. With very few exceptions, all of the bits are significant (although this is not required, and there is at least one machine where there are extra bits in an int). On a typical machine, char will be 8 bits, and if it isn't signed, can store values in the range [0, 2^8); in other words, between 0 and 255 inclusive. unsigned short will be 16 bits (range [0, 2^16)), unsigned int 32 bits (range [0, 2^32)) and unsigned long either 32 or 64 bits.
For the signed values, you'll have to use at least one of the bits for the sign, reducing the maximum positive value. The exact representation of negative values can vary, but on most machines it will be 2's complement, so the ranges will be signed char: [-2^7, 2^7 - 1], etc.
If you're not familiar with base two, I'd suggest you learn it very quickly; it's fundamental to how all modern machines store numbers. You should find out about 2's complement as well: the usual human representation is called sign plus magnitude, and is very rare in computers.
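A small sketch that asks the implementation for the ranges discussed here via std::numeric_limits (the printed values assume a typical 2's complement machine with 8-bit char, 16-bit short and 32-bit int):

    #include <iostream>
    #include <limits>

    // The unary + promotes signed char to int so it prints as a number.
    int main() {
        std::cout << +std::numeric_limits<signed char>::min() << " to "
                  << +std::numeric_limits<signed char>::max() << '\n';    // -128 to 127
        std::cout << std::numeric_limits<unsigned short>::max() << '\n';  // 65535
        std::cout << std::numeric_limits<unsigned int>::max() << '\n';    // 4294967295
    }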