Can someone explain this bit wise operation - c++

I am trying to understand what is happening in this bitwise operation that is used to set the size of a char array but I do not fully understand it.
unsigned long maxParams = 2;// or some other value, I just chose 2 arbitrarily
unsigned char m_uParamBitArray[(((maxParams) + ((8)-1)) & ~((8)-1))/8];
What size is this array set to based on the value of maxParams?

This counts the number of 8-bit chars needed to fit maxParams bits.
+7 is to round up to the next multiple of 8.
0 -> 7 will round to 0 char
1 -> 8 will round to 1 char
After division by 8. Which is done by /8
The bitwise and is to discard the last three bits before dividing by 8.

When maxParams is not divisible by eight, the first part of the formula formula rounds it up to the next multiple of eight; otherwise, it leaves the number unchanged. The second part of the formula divides the result by eight, which it can always do without getting a remainder.
In general, if you want to round up to N without going over it, you can add N-1 before the division:
(x + (N - 1)) / N
You can safely drop the & ~((8)-1) part of the expression:
(maxParams + (8-1))/8

Related

Why is the result of a bitwise shift unrecoverable if there is a mathematical equivalent of the same operation?

Take for example the number 91. That number in binary is 1011011. If you shift that number to the right by 5 bits, you would get 2 (10 in binary). According to a google search, bit shifting to the left or right by a certain amount of bits is the same as multiplying or dividing the number by 2 to the power of the number of bits to be shifted, respectively. so to get from 91 to 2 by bit shifting, the equation would look like this: 91 / 2^5, which is also 91 / 32. Now, of course if you did that in your calculator, there would be some decimal values, which aren't included when bit shifting. The resulting 2 is actually 2.84357. I'm sure you know that if you do a certain operation on a number and then you do the inverse, the result would be what you had in the first place. So does decimal precision have something to do with this?
There is a mathematical equivalent of shifting to the right... and the mathematical operation is UNRECOVERABLE.
You seem to think that shifting to the right is:
bit shifting to the left or right by a certain amount of bits is the same as multiplying or dividing the number by 2
This is what you will hear people casually say, but it is only half right. As it it is not the same but only similar.
The correct statement is:
shifting a base-2 number one digit to the right is THE SAME as dividing by two in the integer domain
If you have an integer calculator, if you did 91/32 you will get 2. You will not get ANY decimal point because we are operating in the integer domain.
For real numbers, the equivalent operation is:
FLOOR(91/32)
Which is also unrecoverable because it also results in 2.
The lesson here is be careful when listening to what people CASUALLY say. Casual speech is often imprecise and assumes the listener is familiar with the subject. You need to dig deeper what the statement is actually trying to say.
As for why it is unrecoverable? Division of integers give two results: the quotient (which is the main result) and the remainder. When we divide 91 by 32 we are doing this:
2
_____
32 ) 91
64
__
27
So we get the result of 2 and a remainder of 27. The reason you can't get 91 by multiplying 2*32 is because we threw away the remainder.
You can get the result back if you saved the remainder. However, calculating the remainder is not a matter of simple shifts. Here's an example of how to make it reversable in C:
int test () {
int a = 91;
int b = 32;
int result;
int remainder;
result = a / b; // result will be 2
remainder = a % b; // remainder will be 27
return (result * b) + remainder; // returns 91
}
You can only recover the result of an operation if it has a 1-1 mapping between the inputs and outputs, i.e. it has an inverse function. But not all mathematical functions have an inverse function
For example if f(x) = x >> n with >> is the shift operator then it'll be equivalent to
f(x) = ⌊x/2n⌋
with ⌊ ⌋ being the floor function. Since there are many inputs that lead to the same output, the relationship isn't 1-1 and there can't be an inverse function for it. This function works the same for both signed and unsigned right shift:
91 >> 5 == floor(91.0/32.0) == 2
-91 >> 5 == floor(-91.0/32.0) == -3
Similarly for an unsigned left shift function g(x) = x << n then the equivalent is
g(x) = (x * 2n) mod 2N
with N being the size in bits of x, because integer math in hardware, C and many other languages always reduce modulo 2N due to the limit of register size and the use of two's complement. And it's clear that the modulo function also isn't invertible/recoverable. The signed left shift is almost the same with some small modifications

Why do negative numbers in variables have a larger limit than a positive number (signed variables)?

As seen in the picture above all of the variables have a negative limit that is one more than the positive limit. I was how it is able to add that extra one. I know that the first digit in the variable is used to tell if it is negative (1) or if is not (0). I also know that binary is based on the powers of 2. What I am confused about is how there is one extra when the positive itself can't go higher and the negative only has one digit changing. For example, a short can go up to 32,767 (01111111 11111111) or 16,383 + all of the decimal values of the binary numbers below it. Negative numbers are the same thing except a one at the beginning, right? So how do the negative numbers have a larger limit? Thanks to anyone who answers!
The reason is a scheme called "2's complement" to represent signed integer.
You know that the most significant bit of a signed integer represent the sign. But what you don't know is, it also represent a value, a negative value.
Take a 4-bit 2's complement signed integer as an example:
1 0 1 0
-2^3 2^2 2^1 2^0
This 4-bit integer is interpreted as:
1 * -2^3 + 0 * 2^2 + 1 * 2^1 + 0 * 2^0
= -8 + 0 + 2 + 0
= -6
With this scheme, the max of 4-bit 2's complement is 7.
0 1 1 1
-2^3 2^2 2^1 2^0
And the min is -8.
1 0 0 0
-2^3 2^2 2^1 2^0
Also, 0 is represented by 0000, 1 is 0001, and -1 is 1111. Comparing these three numbers, we can observe that zero has its "sign bit" positive, and there is no "negative zero" in 2's complement scheme. In other words, half of the range only consists of negative number, but the other half of the range includes zero and positive numbers.
If integers are stored using two's complement then you get one extra negative value and a single zero. If they are stored using one's complement or signed magnitude you get two zeros and the same number of negative values as positive ones. Floating point numbers have their own storage scheme, and under IEEE formats use have an explicit sign bit.
I know that the first digit in the variable is used to tell if it is negative (1) or if is not (0).
The first binary digit (or bit), yes, assuming two's complement representation. Which basically answers your question. There are 32,768 numbers < 0 (-32,768 .. -1) , and 32,768 numbers >= 0 (0 .. +32,767).
Also note that in binary the total possible representations (bit patterns) are an even number. You couldn't have the min and max values equal in absolute values, since you'd end up with an odd number of possible values (counting 0). Thus, you'd have to waste or declare illegal at least one bit pattern.

How numbers are stored? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 9 years ago.
Improve this question
What kind of method does the compiler use to store numbers? One char is 0-255. Two chars side by side are 0-255255. In c++ one short is 2 bytes. The size is 0-65535. Now, how does the compiler convert 255255 to 65535 and what happens with the - in the unsigned numbers?
The maximum value you can store in n bits (when the lowest value is 0 and the values represented are a continuous range), is 2ⁿ − 1. For 8 bits, this gives 255. For 16 bits, this gives 65535.
Your mistake is thinking that you can just concatenate 255 with 255 to get the maximum value in two chars - this is definitely wrong. Instead, to get from the range of 8 bits, which is 256, to the range of 16 bits, you would do 256 × 256 = 65536. Since our first value is 0, the maximum value is 65535, once again.
Note that a char is only guaranteed to have at least 8 bits and a short at least 16 bits (and must be at least as large as a char).
You have got the math totally wrong. Here's how it really is
since each bit can only take on either of two states only(1 and 0) , n bits as a whole can represents 2^n different quantities not numbers. When dealing with integers a standard short integer size of 2 bytes can represent 2^n - 1 (n=16 so 65535)which are mapped to decimal numbers in real life for compuational simplicity.
When dealing with 2 character they are two seperate entities (string is an array of characters). There are many ways to read the two characters on a whole, if you read is at as a string then it would be same as two seperate characters side by side. let me give you an example :
remember i will be using hexadecimal notation for simplicity!
if you have doubt mapping ASCII characters to hex take a look at this ascii to hex
for simplicity let us assume the characters stored in two adjacent positions are both A.
Now hex code for A is 0x41 so the memory would look like
1 byte ....... 2nd byte
01000100 01000001
if you were to read this from the memory as a string and print it out then the output would be
AA
if you were to read the whole 2 bytes as an integer then this would represent
0 * 2^15 + 1 * 2^14 + 0 * 2^13 + 0 * 2^12 + 0 * 2^11 + 1 * 2^10 + 0 * 2^9 + 0 * 2^8 + 0 * 2^7 + 1 * 2^6 + 0 * 2^5 + 0 * 2^4 + 0 * 2^3 + 0 * 2^2 + 0 * 2^1 + 1 * 2^0
= 17537
if unsigned integers were used then the 2 bytes of data would me mapped to integers between
0 and 65535 but if the same represented a signed value then then , though the range remains the same the biggest positive number that can be represented would be 32767. the values would lie between -32768 and 32767 this is because all of the 2 bytes cannot be used and the highest order bit is left to determine the sign. 1 represents negative and 2 represents positive.
You must also note that type conversion (two characters read as single integer) might not always give you the desired results , especially when you narrow the range. (example a doble precision float is converted to an integer.)
For more on that see this answer double to int
hope this helps.
When using decimal system it's true that range of one digit is 0-9 and range of two digits is 0-99. When using hexadecimal system the same thing applies, but you have to do the math in hexadecimal system. Range of one hexadecimal digit is 0-Fh, and range of two hexadecimal digits (one byte) is 0-FFh. Range of two bytes is 0-FFFFh, and this translates to 0-65535 in decimal system.
Decimal is a base-10 number system. This means that each successive digit from right-to-left represents an incremental power of 10. For example, 123 is 3 + (2*10) + (1*100). You may not think of it in these terms in day-to-day life, but that's how it works.
Now you take the same concept from decimal (base-10) to binary (base-2) and now each successive digit is a power of 2, rather than 10. So 1100 is 0 + (0*2) + (1*4) + (1*8).
Now let's take an 8-bit number (char); there are 8 digits in this number so the maximum value is 255 (2**8 - 1), or another way, 11111111 == 1 + (1*2) + (1*4) + (1*8) + (1*16) + (1*32) + (1*64) + (1*128).
When there are another 8 bits available to make a 16-bit value, you just continue counting powers of 2; you don't just "stick" the two 255s together to make 255255. So the maximum value is 65535, or another way, 1111111111111111 == 1 + (1*2) + (1*4) + (1*8) + (1*16) + (1*32) + (1*64) + (1*128) + (1*256) + (1*512) + (1*1024) + (1*2048) + (1*4096) + (1*8192) + (1*16384) + (1*32768).
It depends on the type: integral types must be stored as binary
(or at least, appear so to a C++ program), so you have one bit
per binary digit. With very few exceptions, all of the bits are
significant (although this is not required, and there is at
least one machine where there are extra bits in an int). On
a typical machine, char will be 8 bits, and if it isn't
signed, can store values in the range [0,2^8); in other words,
between 0 and 255 inclusive. unsigned short will be 16 bits
(range [0,2^16)), unsigned int 32 bits (range [0,2^32))
and unsigned long either 32 or 64 bits.
For the signed values, you'll have to use at least one of the
bits for the sign, reducing the maximum positive value. The
exact representation of negative values can vary, but in most
machines, it will be 2's complement, so the ranges will be
signed char:[-2^7,2^7-1)`, etc.
If you're not familiar with base two, I'd suggest you learn it
very quickly; it's fundamental to how all modern machines store
numbers. You should find out about 2's complement as well: the
usual human representation is called sign plus magnitude, and is
very rare in computers.

What does this exactly mean?

Could anyone please explain me what exactly the following line of code produce?
i = 1<<(sizeof(n) * 8 - 1);
You can assume whatever value you want for 'n'. I am trying to implement an 8 bit multiplication program using Booths algorithm.
Let's break it down:
sizeof(n) delivers the size of the type of variable n. For an int variable n on a 32 bit system, this would e.g. be 4 (bytes). See the sizeof documentation e.g. here: http://en.cppreference.com/w/cpp/keyword/sizeof)
* 8 -> multiplication by the number of bits in one byte -> i.e. sizeof(n) * 8 delivers the number of bits necessary for n.
<< is the shiftleft operator. It will shift the first operand to the left by the amount of bits specified by the second operand (see here: http://en.wikipedia.org/wiki/Logical_shift); it's the logical shift, meaning that bits shifted in from the right are filled up with zeroes.
The full expression therefore delivers an expresssion with the highest bit representable by the variable n set to 1.
Example (assuming n now to be of type char, and assuming the size of char as the typical 1 byte):
sizeof(char) = 1
=> sizeof(char) * 8 - 1 = 7
=> 1 << 7 = 10000000

Counting number of bits: How does this line work ? n=n&(n-1); [duplicate]

This question already has answers here:
n & (n-1) what does this expression do? [duplicate]
(4 answers)
Closed 6 years ago.
I need some explanation how this specific line works.
I know that this function counts the number of 1's bits, but how exactly this line clears the rightmost 1 bit?
int f(int n) {
int c;
for (c = 0; n != 0; ++c)
n = n & (n - 1);
return c;
}
Can some explain it to me briefly or give some "proof"?
Any unsigned integer 'n' will have the following last k digits: One followed by (k-1) zeroes: 100...0
Note that k can be 1 in which case there are no zeroes.
(n - 1) will end in this format: Zero followed by (k-1) 1's: 011...1
n & (n-1) will therefore end in 'k' zeroes: 100...0 & 011...1 = 000...0
Hence n & (n - 1) will eliminate the rightmost '1'. Each iteration of this will basically remove the rightmost '1' digit and hence you can count the number of 1's.
I've been brushing up on bit manipulation and came across this. It may not be useful to the original poster now (3 years later), but I am going to answer anyway to improve the quality for other viewers.
What does it mean for n & (n-1) to equal zero?
We should make sure we know that since that is the only way to break the loop (n != 0).
Let's say n=8. The bit representation for that would be 00001000. The bit representation for n-1 (or 7) would be 00000111. The & operator returns the bits set in both arguments. Since 00001000 and 00000111 do not have any similar bits set, the result would be 00000000 (or zero).
You may have caught on that the number 8 wasn't randomly chosen. It was an example where n is power of 2. All powers of 2 (2,4,8,16,etc) will have the same result.
What happens when you pass something that is not an exponent of 2? For example, when n=6, the bit representation is 00000110 and n-1=5 or 00000101.The & is applied to these 2 arguments and they only have one single bit in common which is 4. Now, n=4 which is not zero so we increment c and try the same process with n=4. As we've seen above, 4 is an exponent of 2 so it will break the loop in the next comparison. It is cutting off the rightmost bit until n is equal to a power of 2.
What is c?
It is only incrementing by one every loop and starts at 0. c is the number of bits cut off before the number equals a power of 2.