Some random C questions (ascii magic and bitwise operators) - c++

I am trying to learn C programming, and while studying some source code I came across a few things I didn't understand, especially regarding bitwise operators. I read some sites on this and got a rough idea of what they do, but when I went back to the code, I could not understand why and how they were used.
My first question is not related to bitwise operators but rather some ascii magic:
Can somebody explain to me how the following code works?
char a = 3;
int x = a - '0';
I understand this is done to convert a char into an int, however I don't understand the logic behind it. Why/How does it work?
Now, Regarding Bitwise operators, I feel really lost here.
What does this code do?
if (~pointer->intX & (1 << i)) { c++; n = i; }
I read somewhere that ~ inverts bits, but I fail to see what this statement is doing and why is it doing that.
Same with this line:
row.data = ~(1 << i);
Other question:
if (x != a)
{
ret |= ROW;
}
What exactly is the |= operator doing? From what I read, |= is OR, but I don't quite understand what this statement is doing.
Is there any way of rewriting this code to make it easier to understand, so that it doesn't use these bitwise operators? I find them very confusing, so hopefully somebody will point me in the right direction on understanding how they work!
I have a much better understanding of bitwise operators now and the whole code makes much more sense now.
One last thing: apparently nobody responded on whether there would be a "cleaner" way of rewriting this code so that it's easier to understand and maybe not at the "bit level". Any ideas?

This will produce junk:
char a = 3;
int x = a - '0';
This is different - note the quotes:
char a = '3';
int x = a - '0';
The char datatype stores a number that identifies a character. The characters for the digits 0 through 9 are all next to each other in the character code list, so if you subtract the code for '0' from the code for '9', you get the answer 9. So this will turn a digit character code into the integer value of the digit.
(~pointer->intX & (1 << i))
That will be interpreted by the if statement as true if it's non-zero. There are three different bitwise operators being used.
The ~ operator flips all the bits in the number, so if pointer->intX was 01101010, then ~pointer->intX will be 10010101. (Note that throughout, I'm illustrating the contents of a byte. If it was a 32-bit integer, I'd have to write 32 digits of 1s and 0s).
The & operator combines two numbers into one number, by dealing with each bit separately. The resulting bit is only 1 if both the input bits are 1. So if the left side is 00101001 and the right side is 00001011, the result will be 00001001.
Finally, << means left shift. If you start with 00000001 and left shift it by three places, you'll have 00001000. So the expression (1 << i) produces a value where bit i is switched on, and the others are all switched off.
Putting it all together, it tests if bit i is switched off (zero) in pointer->intX.
So you may be able to figure out what ~(1 << i) does. If i is 4, the thing in brackets will be 00010000, and so the whole thing will be 11101111.
ret |= ROW;
That one is equivalent to:
ret = ret | ROW;
The | operator is like & except that the resulting bit is 1 if either of the input bits is 1. So if ret is 00100000 and ROW is 00000010, the result will be 00100010.

ret |= ROW;
is equivalent to
ret = ret | ROW;

For char a = 3; int x = a - '0'; I think you meant char a = '3'; int x = a - '0';. It's easy enough to understand if you realize that in ASCII the numbers come in order, like '0', '1', '2', ... So if '0' is 48 and '1' is 49, then '1' - '0' is 1.
For bitwise operations, they are hard to grasp until you start looking at bits. When you view these operations on binary numbers then you can see exactly how they work...
010 & 111 = 010
010 | 111 = 111
010 ^ 111 = 101
~010 = 101

I think you probably have a typo, and meant:
char a = '3';
The reason this works is that all the numbers come in order, and '0' is the first. Obviously, '0' - '0' = 0. '1' - '0' = 1, since the character value for '1' is one greater than the character value for '0'. Etc.

1) A char is really just a small integer (8 bits on virtually every platform). In ASCII, '0' == 48, and all that that implies.
2) (~(pointer->intX) & (1 << i)) evaluates whether the 'i'th bit (from the right) in the intX member of whatever pointer points to is not set. The ~ inverts the bits, so all the 0s become 1s and vice versa, then the 1 << i puts a single 1 in the desired location, & combines the two values so that only the desired bit is kept, and the whole thing evaluates to true if that bit was 0 to begin with.
3) | is bitwise or. It takes each bit in both operands and performs a logical OR, producing a result where each bit is set if either operand had that bit set. 0b11000000 | 0b00000011 == 0b11000011. |= is an assignment operator, in the same way that a+=b means a=a+b, a|=b means a=a|b.
Not using bitwise operators CAN make things easier to read in some cases, but it will usually also make your code significantly slower without strong compiler optimization.

The subtraction trick you reference works because ASCII numbers are arranged in ascending order, starting with zero. So if ASCII '0' is a value of 48 (and it is), then '1' is a value of 49, '2' is 50, etc. Therefore ASCII('1') - ASCII('0') = 49 - 48 = 1.
As far as bitwise operators go, they allow you to perform bit-level operations on variables.
Let's break down your example:
(1 << i) -- this is left-shifting the constant 1 by i bits. So if i=0, the result is decimal 1. If i = 1, it shifts the bit one to the left, backfilling with zeros, yielding binary 0010, or decimal 2. If i = 2, you shift the bit two to the left, backfilling with zeros, yielding binary 0100 or decimal 4, etc.
~pointer->intX -- this is taking the value of the intX member of pointer and inverting its bits, setting all zeros to ones and vice versa.
& -- the ampersand operator does a bitwise AND comparison. The results of this will be 1 wherever both the left and right side of the expression are 1, and 0 otherwise.
So the test will succeed if pointer->intX has a 0 bit at the ith position from the right.
Also, |= means to do a bitwise OR and assign the result to the left side of the expression. The result of a bitwise OR is 1 for every bit where the corresponding left or right side bit is 1, and 0 otherwise.

Single quotes are used to indicate that a single char is used. '0' therefore is the char '0', which has the ASCII-Code 48.
3-'0'=3-48
'1<<i' shifts 1 i places to the left, therefore only the ith bit from the right is 1.
~pointer->intX inverts the bits of the field intX, so the bitwise AND returns a true value (not 0) when the ith bit from the right of intX isn't set.

char a = '3';
int x = a - '0';
You had a typo here (notice the 's around the 3). This assigns the ASCII value of the character 3 to the char variable. The next line then takes '3' - '0' and assigns it to x. Because of the way ASCII values work, x will then be equal to 3 (integer value).
In the first comparison, note that ~ is not applied to the pointer itself: -> binds more tightly than ~, so the expression inverts the bits of the intX member. If I were to read out the following code:
(~pointer->intX & (1 << i))
I would say "(the inverted bits of the value intX dereferenced from pointer) AND (1 left shifted i times)"
1 << i is a quick way of multiplying 1 by a power of 2, i.e. if i is 3, then 1 << 3 == 8
So the expression is non-zero exactly when bit i of intX is 0.
In the 2nd comparison, x |= y is the same as x = x | y

I'm assuming you mean char a='3'; for the first line of code (otherwise you get a rather strange answer). The basic principle is that ASCII codes for digits are sequential, i.e. the code for '0'=48, the code for '1'=49, and so on. Subtracting '0' simply converts from the ASCII code to the actual digit, so e.g. '3' - '0' = 3, and so on. Note that this will only work if the character you're subtracting '0' from is an actual digit - otherwise the result will have little meaning.
a. Without context the "why" of this code is impossible to say. As for what it's doing, the if statement evaluates as true when bit i of pointer->intX is not set, i.e. that particular bit is a 0. The unary ~ operator has higher precedence than the binary & operator (and -> has higher precedence still), so ~ is applied to pointer->intX first and the result is then ANDed with (1 << i). The code could make better use of parentheses to make the intended order of operations clearer. The order does matter here: ~(pointer->intX & (1 << i)) would always be non-zero.
b. This is simply creating a number with all bits EXCEPT bit i set to 1. A convenient way of creating a mask for bit i is to use the expression (1<<i).
The bitwise OR operation in this case is used to set the bits specified by the ROW constant to 1. If these bits are not set, it sets them; if they're already set it has no effect.

1) Can somebody explain to me how the following code works? char a = 3; int x = a - '0';
I understand this is done to convert a char into an int, however I don't understand the logic behind it. Why/How does it work?
Sure. Variable a is of type char, and putting single quotes around 0 makes it a character constant, whose value is the character code of the digit 0 (48 in ASCII). For the subtraction, a is promoted to int, and the integer result is stored in x.
2) Now, Regarding Bitwise operators, I feel really lost here.
--- What does this code do? if (~pointer->intX & (1 << i)) { c++; n = i; } I read somewhere that ~ inverts bits, but I fail to see what this statement is doing and why is it doing that.
(~pointer->intX & (1 << i)) is saying:
negate intX, and AND it with a 1 shifted left by i bits
so, what you're getting, if intX = 1011, and i = 2, equates to
(0100 & 0100)
-negate 1011 = 0100
-(1 << 2) = 0100
0100 & 0100 = 0100, which is non-zero and therefore true :)
then, if the AND operation returns a non-zero value (which, in my example, it does)
{ c++; n = i; }
so, increment c by 1, and set variable n to be i
Same with this line: row.data = ~(1 << i);
Same principle here.
Shift a 1 to the left by i places, and negate.
So, if i = 2 again
(1 << 2) = 0100
~(0100) = 1011
--- Other question:
if (x != a) { ret |= ROW; }
What exactly is the |= operator doing? From what I read, |= is OR but i don't quite understand what is this statement doing.
if (x != a) (hopefully this is apparent to you....if variable x does not equal variable a)
ret |= ROW;
equates to
ret = ret | ROW;
which means, binary OR ret with ROW
For examples of exactly what AND and OR operations accomplish, you should have a decent understanding of binary logic.
Check wikipedia for truth tables...ie
Bitwise operations

Related

What does 0b1 mean in C++?

I came across a part of code that I cannot understand.
for (unsigned int i = (x & 0b1); i < x; i+= 2)
{
// body
}
Here, x is from 0 to 5.
What is meant by 0b1? and what would be the answers for eg: (0 & 0b1), (4 & 0b1) etc?
0b... is a binary number, just like 0x... is hex and 0... is octal.
Thus 0b1 is same as 1.
1b0 is illegal, the first digit in those must always be 0.
As previous answers said, it is the binary representation of the integer number 1, but they don't seem to have fully answered your question. This has a lot of layers so I'll briefly explain each.
In this context, the ampersand is working as a bitwise AND operator. i & 0b1 is (sometimes) a faster way of checking if an integer is even as opposed to i % 2 == 0.
Say you have int x = 5 and you'd like to check if it's even using bitwise AND.
In binary, 5 would be represented as 0101. That final 1 actually represents the number 1, and in binary integers it's only present in odd numbers. Let's apply the bitwise AND operator to 5 and 1;
0101
0001
&----
0001
The operator is checking each column, and if both rows are 1, that column of the result will be 1 – otherwise, it will be 0. So, the result (converted back to base10) is 1. Now let's try with an even number. 4 = 0100.
0100
0001
&----
0000
The result is now equal to 0. These rules apply to every single integer no matter its size.
The higher-level layer here is that conditions in C and C++ treat 0 as false and any other value as true (C had no boolean type at all before C99). This allows for some tricky shorthand, so the conditional if(x & 0b1) will only run if x is odd, because odd & 0b1 will always equal 1 (true), but even & 0b1 will always equal 0 (false).

Bitwise NOT operator returning unexpected and negative value? [duplicate]

I'm trying to get the value of an integer using bitwise NOT, but I'm not getting what I expected.
#include <stdio.h>
int main(){
int i = 16;
int j = ~i;
printf("%d", j);
return 0;
}
Isn't 16 supposed to be:
00000000000000000000000000010000
So ~16 is supposed to be:
11111111111111111111111111101111
Why am I not getting what I expected, and why is the result negative?
This is what I'm trying to do:
I have a number, for example 27, which is:
00000000000000000000000000011011
And want to check every bit if it's 1 or 0.
So I need to get, for example, this value:
11111111111111111111111111110111
Then use the second one to check if the 3rd bit of the first is set to 1.
Although there are pedantic points which can be made about compiler behaviour, the simple answer is that a signed int with the top bit set is a negative number.
So if you do something which sets the top bit of an int (a signed int, not an unsigned one), then ask the tools/library to show you the value of that int, you'll see a negative number.
This is not a universal truth, but it's a good approximation to it for most modern systems.
Note that it's printf which is making the representation here - because %d formats numbers as signed. %u may give the result you're expecting. Just changing the types of the variables won't be enough, because printf doesn't know anything about the types of its arguments.
I would say that as a general rule of thumb, if you're doing bit-twiddling, then use unsigned ints and display them in hexadecimal. Life will be simpler that way, and it most generally fits with the intent. (Fancy accelerated maths tricks are an obvious exception)
And want to check every bit if it's 1 or 0.
To check an individual bit, you don't NOT the number, you AND it with an appropriate bit mask:
if ((x & 1) != 0) ... // bit 0 is 1
if ((x & 2) != 0) ... // bit 1 is 1
if ((x & 4) != 0) ... // bit 2 is 1
if ((x & 8) != 0) ... // bit 3 is 1
...
if ((x & (1 << n)) != 0) ... // bit n is 1
...
if ((x & 0x80000000) != 0) ... // bit 31 is 1
If you want to get ones' complement of a number, you need to put that number into an unsigned variable and show it as so.
In C it would be:
unsigned int x = ~16;
printf("%u\n", x);
and you will get 4294967279.
But if you are just trying to get the negative number of a certain one, put the - operator before it.
EDIT: To check whether a bit is 0 or 1, you have to use the bitwise AND.
In two's-complement arithmetic, to negate a number (for example, to get -16 from 16) you need to invert each bit and add 1.
In your example, to get -16 from 16 that is represented as
00000000000000000000000000010000
you need to invert each bit. You will get
11111111111111111111111111101111
Now you must add 1 and you will get
11111111111111111111111111110000
As you can see, if you add this value to the original one, you will get 0 (the carry out of the top bit is discarded). That confirms the negation was done correctly.

Isolating a string of 1's in a character

I need to come up with a function which takes a char and index of a set bit in it and isolates a string of 1's containing that bit.
i.e.
char isolate(unsigned char arg, int i);
For example:
isolate(221,2) would return 28 (11011101 >>> 00011100)
isolate(221,6) would return 192 (11011101 >>> 11000000)
A lookup table seems a clumsy solution as it would require ~256*8=2048 entries.
I am thinking of examining each individual bit to the left and right of the index:
char isolate(char arg, int i)
{
char result=0;
char mask = 1<<i;
for(char mask = 1<<i; arg & mask != 0; mask>>=1)
result |= mask;
for(char mask = 1<<i; arg & mask != 0; mask<<=1)
result |= mask;
return result;
}
But it also seems a bit ugly. How can I do any better than this?
That's a funny operation. The code you've written expresses it fairly well, so would you mind elaborating on how it's ugly?
The details I can see: Given that i expresses a bit number in arg, there's absolutely no point in i being a wider type. There's never a point in writing != 0 in a condition, and here it actually changes the meaning: != binds more tightly than &, so arg & mask != 0 parses as arg & (mask != 0). You probably don't want to be redeclaring mask everywhere you use it, nor initializing it twice in a row.
As for the actual spreading bit mask, I can't think of a way that's more expressive, cleaner or efficient right now.
Warning: none of this was tested or even relevant*, but it may be interesting.
Isolating the rightmost run of 1s is easy, like this: x ^ (x & ((x|(x-1))+1)) (explanation below), so let's work with that.
First, x|(x-1) smears the rightmost 1 to the right. Adding 1 turns all those bits to 0, including the rightmost run of 1s. ANDing x with that removes the rightmost run of 1s, and finally, XORing that with x leaves just the rightmost run of 1s.
Then we just need to make sure that the range we're looking for is the rightmost one. That's less amenable to simple bitmath, but if there's Count Leading Zeros (clz), it's not too hard:
int shift = 32 - clz(~x & ((1 << i) - 1)); //replace 32 with word size
x = (x >> shift) << shift;
((1 << i) - 1) makes a mask of the part where the right-end of the run we're looking for could be in (it could also just miss the end, but that's ok), then clz looks for the first zero to the right of i in x, then the shifts remove the bits that we don't want to look at.
Apply the first formula, for isolating the rightmost run of 1s, to the result of that to get the run of ones where i was in. i had better be in some run, or things go sideways (more accurately, it would return the first run of 1s that starts at an index higher than i)
*: For this question, none of this really matters. A 2KB table is not a clumsy solution unless you only have a tiny amount of memory available, and even if that's the case, the input is so short that the loops aren't all that bad.

printf: Displaying an SHA1 hash in hexadecimal

I have been following the msdn example that shows how to hash data using the Windows CryptoAPI. The example can be found here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa382380%28v=vs.85%29.aspx
I have modified the code to use the SHA1 algorithm.
I don't understand how the code that displays the hash in hexadecimal (shown below) works. More specifically, I don't understand what the >> 4 operator and the & 0xf operator do.
if (CryptGetHashParam(hHash, HP_HASHVAL, rgbHash, &cbHash, 0)){
printf("MD5 hash of file %s is: ", filename);
for (DWORD i = 0; i < cbHash; i++)
{
printf("%c%c", rgbDigits[rgbHash[i] >> 4],
rgbDigits[rgbHash[i] & 0xf]);
}
printf("\n");
}
I would be grateful if someone could explain this for me, thanks in advance :)
x >> 4 shifts x right four bits. x & 0xf does a bitwise and between x and 0xf. 0xf has its four least significant bits set, and all the other bits clear.
Assuming rgbHash is an array of unsigned char, this means the first expression retains only the four most significant bits and the second expression the four least significant bits of the (presumably) 8-bit input.
Four bits is exactly what will fit in one hexadecimal digit, so each of those is used to look up a hexadecimal digit in an array which presumably looks something like this:
char rgbDigits[] = "0123456789abcdef"; // or possibly upper-case letters
This code uses simple bit 'filtering' techniques.
">> 4" means shift right by 4 places, which in turn means 'divide by 16'.
"& 0xf" is a bitwise AND operation, which means 'keep the lowest 4 bits'.
Both of these values are used as indexes into rgbDigits, which presumably maps each value in the range 0 to 15 to a human-readable hexadecimal digit.

Is VAR |= 1 << 2; reversible?

First I am not sure what is going on in this bitwise operation.
I get code written and supply to other parties as code snippets.
Now, if VAR is an unsigned 8-bit integer (unsigned char) and r is either 0 or 1 or 2 or 4,
can the following be reversed if the value of r and the resulting value are known?
VAR |= 1 << r; //that is 200 where VAR was 192 and r was 3
For example initial value of VAR is 192 and value of r is 3 *result is 200*.
Now if I have this 200, and I know that the value of r was 3, can I reverse it back to 192?
I hope this is an easy one, but I don't know these bitwise operations, so forgive me.
Thanks
The answer is no. This is because the | (OR) operator is not a one-to-one function.
In other words, there are multiple values of VAR that can produce the same result.
For example:
r = 3;
var0 = 8;
var1 = 0;
var0 |= 1 << r; // produces 8
var1 |= 1 << r; // produces 8
If you tried to invert it, you wouldn't be able to tell whether the original value is 0 or 8.
A similar situation applies to the & AND operator.
From an information-theory perspective:
The operators | and & incur a loss of information and do not preserve the entropy of the data.
On the other hand, operators such as ^ (XOR), +, and - are one-to-one and thus preserve entropy and are invertible.
No, OR is not reversible. I believe only XOR is.
For example, if variable a contains 1001 1100 or 1001 1000, and you set the third bit (from the right) to 1 regardless of what the initial value is, then both 1001 1100 and 1001 1000 as source operands would result in the same value (1001 1100).
Firstly, 1<<2 is just another way of writing "4" or 100 in binary.
The |= operator is another way of writing x = x | y;
The end result is setting bit 2 in x. If bit 2 in x was zero then reversing it would be to clear bit 2. If bit 2 was 1, then it's a no-op.
The problem with your question is that you don't know ahead of time what the initial state of bit 2 was.
If your goal was to clear bit 2 you can do this:
x &= ~(1<<2);
Given an expression result |= 1 << shiftAmount, corresponding to VAR and r in your original example, you can use the following to do the exact opposite:
result &= ~(1 << shiftAmount)
Note that this is not a pure inverse, because bitwise-or is not a one-to-one function. Bitwise-or sets one or more bits to 1, whether or not they were already 0 or 1. The expression I have shown above will always set the associated bits to 0, so if the bit was 1 originally it will not go back to its original state.
No, you can't reverse an OR operation.
In your example, with r=3, both the starting values VAR=192 and VAR=200 will result in 200.
Since there are two input values that will give the same result, you won't know which one to go back to.