How are numbers stored? [closed] - c++

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
What method does the compiler use to store numbers? One char holds 0-255. Two chars side by side would therefore seem to hold 0-255255. In C++ one short is 2 bytes, and its range is 0-65535. How does the compiler get from 255255 to 65535, and what happens to the minus sign in unsigned numbers?

The maximum value you can store in n bits (when the lowest value is 0 and the values represented are a continuous range), is 2ⁿ − 1. For 8 bits, this gives 255. For 16 bits, this gives 65535.
Your mistake is thinking that you can just concatenate 255 with 255 to get the maximum value in two chars - this is definitely wrong. Instead, to go from the number of values 8 bits can hold, which is 256, to the number 16 bits can hold, you multiply: 256 × 256 = 65536 values. Since the first value is 0, the maximum value is 65535, once again.
Note that a char is only guaranteed to have at least 8 bits and a short at least 16 bits (and must be at least as large as a char).

You have got the math wrong. Here's how it really works.
Since each bit can take on only one of two states (1 and 0), n bits as a whole can represent 2^n different bit patterns, which are mapped to numbers. A standard short of 2 bytes (n = 16) can therefore represent values from 0 to 2^16 - 1 = 65535 when unsigned.
Two chars are two separate entities (a string is an array of characters). There are several ways to read two adjacent characters as a whole: if you read them as a string, they are simply two separate characters side by side. Let me give you an example:
Remember, I will be using hexadecimal notation for simplicity!
If you have doubts about mapping ASCII characters to hex, take a look at this: ascii to hex.
For simplicity, let us assume the characters stored in two adjacent positions are both A.
Now, the hex code for A is 0x41, so the memory would look like
1st byte ....... 2nd byte
01000001 01000001
If you were to read this from memory as a string and print it out, then the output would be
AA
If you were to read the whole 2 bytes as a single (big-endian) integer, then this would represent
0 * 2^15 + 1 * 2^14 + 0 * 2^13 + 0 * 2^12 + 0 * 2^11 + 0 * 2^10 + 0 * 2^9 + 1 * 2^8 + 0 * 2^7 + 1 * 2^6 + 0 * 2^5 + 0 * 2^4 + 0 * 2^3 + 0 * 2^2 + 0 * 2^1 + 1 * 2^0
= 16384 + 256 + 64 + 1 = 16705
If unsigned integers were used, the 2 bytes of data would be mapped to integers between 0 and 65535. If the same bytes represented a signed value then, though the number of representable values remains the same, the biggest positive number that can be represented would be 32767: the values would lie between -32768 and 32767. This is because not all 16 bits can carry magnitude; the highest-order bit determines the sign (in two's complement, 1 represents negative and 0 represents non-negative).
You must also note that type conversion (two characters read as a single integer) might not always give you the desired results, especially when you narrow the range (for example, when a double-precision float is converted to an integer).
For more on that, see this answer: double to int.
Hope this helps.

When using the decimal system, it's true that the range of one digit is 0-9 and the range of two digits is 0-99. When using the hexadecimal system the same thing applies, but you have to do the math in hexadecimal. The range of one hexadecimal digit is 0-Fh, and the range of two hexadecimal digits (one byte) is 0-FFh. The range of two bytes is 0-FFFFh, which translates to 0-65535 in the decimal system.

Decimal is a base-10 number system. This means that each successive digit from right-to-left represents an incremental power of 10. For example, 123 is 3 + (2*10) + (1*100). You may not think of it in these terms in day-to-day life, but that's how it works.
Now you take the same concept from decimal (base-10) to binary (base-2) and now each successive digit is a power of 2, rather than 10. So 1100 is 0 + (0*2) + (1*4) + (1*8).
Now let's take an 8-bit number (char); there are 8 digits in this number so the maximum value is 255 (2**8 - 1), or another way, 11111111 == 1 + (1*2) + (1*4) + (1*8) + (1*16) + (1*32) + (1*64) + (1*128).
When there are another 8 bits available to make a 16-bit value, you just continue counting powers of 2; you don't just "stick" the two 255s together to make 255255. So the maximum value is 65535, or another way, 1111111111111111 == 1 + (1*2) + (1*4) + (1*8) + (1*16) + (1*32) + (1*64) + (1*128) + (1*256) + (1*512) + (1*1024) + (1*2048) + (1*4096) + (1*8192) + (1*16384) + (1*32768).

It depends on the type: integral types must be stored as binary (or at least, appear so to a C++ program), so you have one bit per binary digit. With very few exceptions, all of the bits are significant (although this is not required, and there is at least one machine where there are extra bits in an int). On a typical machine, char will be 8 bits, and if it isn't signed, can store values in the range [0, 2^8); in other words, between 0 and 255 inclusive. unsigned short will be 16 bits (range [0, 2^16)), unsigned int 32 bits (range [0, 2^32)) and unsigned long either 32 or 64 bits.
For the signed values, you'll have to use at least one of the bits for the sign, reducing the maximum positive value. The exact representation of negative values can vary, but on most machines it will be 2's complement, so the range for signed char is [-2^7, 2^7), i.e. -128 to 127, and correspondingly for the wider types.
If you're not familiar with base two, I'd suggest you learn it very quickly; it's fundamental to how all modern machines store numbers. You should find out about 2's complement as well: the usual human representation is called sign plus magnitude, and is very rare in computers.

Related

Can someone explain this bit wise operation

I am trying to understand what is happening in this bitwise operation that is used to set the size of a char array but I do not fully understand it.
unsigned long maxParams = 2;// or some other value, I just chose 2 arbitrarily
unsigned char m_uParamBitArray[(((maxParams) + ((8)-1)) & ~((8)-1))/8];
What size is this array set to based on the value of maxParams?
This counts the number of 8-bit chars needed to hold maxParams bits.
Adding 7 (that is, 8-1) rounds up to the next multiple of 8 before the division:
maxParams = 0 -> 0 + 7 = 7, which yields 0 chars
maxParams = 1 -> 1 + 7 = 8, which yields 1 char
The division is done by /8, and the bitwise AND with ~((8)-1) clears the last three bits, i.e. rounds down to a multiple of 8, before that division.
When maxParams is not divisible by eight, the first part of the formula rounds it up to the next multiple of eight; otherwise, it leaves the number unchanged. The second part of the formula divides the result by eight, which it can always do without getting a remainder.
In general, if you want to round up to N without going over it, you can add N-1 before the division:
(x + (N - 1)) / N
You can safely drop the & ~((8)-1) part of the expression, since integer division by 8 already discards those low bits:
(maxParams + (8-1))/8

How to calculate range of data type eg int

I want to know why, in the formula to calculate the range of any data type,
i.e. 2^(n-1), it is n-1, where n is the number of bits occupied by the given data type.
Assuming that the type is unsigned, the maximum value is 2^n - 1, because there are 2^n values, and one of them is zero.
2^(n-1) is the value of the nth bit alone - bit 1 is 2^0, bit 2 is 2^1, and so on.
This is the same for any number base - in the decimal system, n digits can represent 10^n different values, with the maximum value being 10^n - 1, and the nth digit is "worth" 10^(n-1).
For example, the largest number with three decimal digits is 999 (that is, 10^3 - 1), and the third decimal digit is "the hundreds digit", worth 10^2.
First, 2^(n-1) is not correct; the maximum (unsigned) number represented by the data type is:
max = 2^n - 1
So for an 8-bit data type, the maximum representable value is 255.
2^n tells you the number of distinct values represented (256 for the 8-bit example), but because you want to include 0, the range is 0 to 255 and not 1 to 256.

Why is the max value of an unsigned int -> 2^n - 1 in C++

If I have a 4 byte unsigned int, I have a space of 2^32 values.
The largest value should be 4294967296 (2^32), so why is it 4294967295 (2^32 - 1) instead?
It's just basic math. Look at this example:
If you had a 1-bit integer, what would the maximal value be?
According to your reasoning, it should be 2^1 = 2. But how would you represent the value 2 with just one bit?
You can't - because counting starts from 0, one bit covers only the values 0 and 1.
For a 2-bit integer you can have 4 different values (0, 1, 2, 3), i.e. 0 to 2^2 - 1:
(00, 01, 10, 11)
Similarly, for a 32-bit integer the maximum value is 2^32 - 1.
2^1 requires 2 bits to store. 2^32 requires 33 bits.
But you have only 32 bits, so the maximum is 2^32 - 1.
Basic mathematics.
A bit can represent 2 values. A set of n bits can represent 2^n (using ^ as notation to represent "to the power of", not as a bitwise operation as is its default meaning in C++) distinct values.
A variable of unsigned integral type represents a sequence containing every consecutive integral value between 0 and the maximum value that type can represent. If the total number of sequential values is 2^n, and the first of them is zero, then the type can represent 2^n - 1 consecutive positive (non-zero) values. The maximum value must therefore be 2^n - 1.

Why do negative numbers in variables have a larger limit than a positive number (signed variables)?

As seen in the picture above, all of the variables have a negative limit that is one more (in magnitude) than the positive limit. I was wondering how it is able to add that extra one. I know that the first digit in the variable is used to tell whether it is negative (1) or not (0). I also know that binary is based on powers of 2. What I am confused about is how there is one extra on the negative side when the positive can't go higher and the negative only has one digit changing. For example, a short can go up to 32,767 (01111111 11111111), i.e. 16,384 plus all of the values of the binary digits below it, minus one. Negative numbers are the same thing except with a one at the beginning, right? So how do the negative numbers have a larger limit? Thanks to anyone who answers!
The reason is a scheme called "2's complement" used to represent signed integers.
You know that the most significant bit of a signed integer represents the sign. But what you may not know is that it also represents a value - a negative value.
Take a 4-bit 2's complement signed integer as an example:
1 0 1 0
-2^3 2^2 2^1 2^0
This 4-bit integer is interpreted as:
1 * -2^3 + 0 * 2^2 + 1 * 2^1 + 0 * 2^0
= -8 + 0 + 2 + 0
= -6
With this scheme, the max of 4-bit 2's complement is 7.
0 1 1 1
-2^3 2^2 2^1 2^0
And the min is -8.
1 0 0 0
-2^3 2^2 2^1 2^0
Also, 0 is represented by 0000, 1 is 0001, and -1 is 1111. Comparing these three numbers, we can observe that zero has its "sign bit" positive, and there is no "negative zero" in 2's complement scheme. In other words, half of the range only consists of negative number, but the other half of the range includes zero and positive numbers.
If integers are stored using two's complement then you get one extra negative value and a single zero. If they are stored using one's complement or signed magnitude you get two zeros and the same number of negative values as positive ones. Floating point numbers have their own storage scheme, and the IEEE formats have an explicit sign bit.
I know that the first digit in the variable is used to tell if it is negative (1) or if is not (0).
The first binary digit (or bit), yes, assuming two's complement representation. Which basically answers your question. There are 32,768 numbers < 0 (-32,768 .. -1) , and 32,768 numbers >= 0 (0 .. +32,767).
Also note that in binary the total possible representations (bit patterns) are an even number. You couldn't have the min and max values equal in absolute values, since you'd end up with an odd number of possible values (counting 0). Thus, you'd have to waste or declare illegal at least one bit pattern.

Why can't my 64-bit computer store the double-precision number 2^1024 in C++?

My computer has an Intel® Core™2 Duo Processor, which I believe is a 64-bit processor. So I had thought that in C++ (or any other programming language), any double-precision number (which requires 64 bits) could be stored.
I read the following link: http://en.wikipedia.org/wiki/Double-precision_floating-point_format
Based on that, I know that a double-precision number has 1 bit for the sign, 11 bits for the exponent, and the remaining 52 bits for the fraction.
So, I thought I'd experiment with a double that has the following binary code:
0 11111111111 0000000000000000000000000000000000000000000000000000
Again, based on what I read in the above link, I think this bit pattern translates to the decimal number 2^1024, the exponent being equal to the following:
2^10 + 2^9 + 2^8 + 2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0 - 1023 = 2047 - 1023 = 1024
I then tried to print 2^1024 in C++, using the following code:
double x = pow(2.0, 1024);
cout << x << endl;
However, when I run the program, it prints '1.#INF'
Does anyone know why?
Some bit patterns are reserved for special use, like not-a-number (NaN), +infinity and -infinity. An all-ones exponent field is one of those reserved patterns, so the largest exponent available to a finite double is 1023, and the largest finite double is just under 2^1024. You have encountered the pattern used for +infinity.