printf: Displaying an SHA1 hash in hexadecimal - c++

I have been following the msdn example that shows how to hash data using the Windows CryptoAPI. The example can be found here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa382380%28v=vs.85%29.aspx
I have modified the code to use the SHA1 algorithm.
I don't understand how the code that displays the hash in hexadecimal (shown below) works; more specifically, I don't understand what the >> 4 operator and the & 0xf operator do.
if (CryptGetHashParam(hHash, HP_HASHVAL, rgbHash, &cbHash, 0)){
    printf("MD5 hash of file %s is: ", filename);
    for (DWORD i = 0; i < cbHash; i++)
    {
        printf("%c%c", rgbDigits[rgbHash[i] >> 4],
               rgbDigits[rgbHash[i] & 0xf]);
    }
    printf("\n");
}
I would be grateful if someone could explain this for me, thanks in advance :)

x >> 4 shifts x right four bits. x & 0xf does a bitwise and between x and 0xf. 0xf has its four least significant bits set, and all the other bits clear.
Assuming rgbHash is an array of unsigned char, this means the first expression retains only the four most significant bits and the second expression the four least significant bits of the (presumably) 8-bit input.
Four bits is exactly what will fit in one hexadecimal digit, so each of those is used to look up a hexadecimal digit in an array which presumably looks something like this:
char rgbDigits[] = "0123456789abcdef"; // or possibly upper-case letters
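As a minimal sketch of the same lookup, assuming the hash bytes are held in a std::vector<unsigned char>: the helper name to_hex and the lower-case digit table are illustrative choices, not part of the MSDN sample.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: converts raw bytes to a lower-case hexadecimal string.
std::string to_hex(const std::vector<unsigned char>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);   // high nibble: upper four bits moved down
        out.push_back(digits[b & 0xf]);  // low nibble: upper four bits masked off
    }
    return out;
}
```

Each byte always produces exactly two characters, so a 20-byte SHA1 hash becomes a 40-character string.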

This code uses simple bit 'filtering' techniques.
">> 4" means shift right by 4 places, which for an unsigned value is the same as dividing by 16 and discarding the remainder.
"& 0xf" is a bitwise AND operation, which here means 'keep only the lowest 4 bits'.
Both these values are used as indices into rgbDigits, which presumably maps each value in the range 0-15 to a human-readable hexadecimal digit.

Related

Converting Big Endian Formatted Bits to Intended Decimal Value While Ignoring First Bit

I am reading a binary file and trying to convert from IBM 4 Byte floating point to double in C++. How exactly would one use the first byte of IBM data to find the ccccccc in the given picture?
IBM to value conversion chart
The code below gives an exponent way larger than what the data should have. I am confused with how the line
exponent = ((IBM4ByteValue[0] & 127) - 64);
executes; I do not understand the use of the & operator in this statement. Essentially, what the previous author of this code implied is that (IBM4ByteValue[0]) holds the ccccccc, so does this mean that the ampersand sets a maximum value that the left side of the operator can equal? Even if this is correct, though, I'm not sure how this line accounts for the fact that there is Big Endian bitwise notation in the first byte (I believe it is Big Endian after viewing the picture). Not to mention that 1000001 and 0000001 should have the same exponent (-63); however, they will not with my current interpretation of the previously mentioned line.
So in short could someone show me how to find the ccccccc (shown in the picture link above) using the first byte --> IBM4ByteValue[0]. Maybe accessing each individual bit? However I do not know the code to do this using my array.
**this code is using the std namespace
**I believe ret should be mantissa * pow(16, 24+exponent); however, if I'm wrong about the exponent I'm probably wrong about this (I got the IBM Conversion from a previously asked stackoverflow question) **I would have just commented on the old post, but this question was a bit too large, pun intended, for a comment. It is also different in that I am asking how exactly one accesses the bits in an array storing whole bytes.
Code I put together using an IBM conversion from previous question answer
for (long pos = 0; pos < fileLength; pos += BUF_LEN) {
    file.seekg(bytePosition);
    file.read((char *)(&IBM4ByteValue[0]), BUF_LEN);
    bytePosition += 4;
    printf("\n%8ld: ", pos);
    //IBM Conversion
    double ret = 0;
    uint32_t mantissa = 0;
    uint16_t exponent = 0;
    mantissa = (IBM4ByteValue[3] << 16) | (IBM4ByteValue[2] << 8)|IBM4ByteValue[1];
    exponent = ((IBM4ByteValue[0] & 127) - 64);
    ret = mantissa * exp2(-24 + 4 * exponent);
    if (IBM4ByteValue[0] & 128) ret *= -1.;
    printf(":%24f", ret);
    printf("\n");
    system("PAUSE");
}
The & operator takes the bits of that array element and masks them with the binary value of 127 (0111 1111). A result bit is 1 only if the bit in the array value and the corresponding bit of 127 are both 1; 1 & 0, 0 & 0 and 0 & 1 all give 0. Since only the top bit of 127 is 0, the mask clears the sign bit and leaves the seven exponent bits (ccccccc) unchanged. You then take the resulting value and subtract 64 from it to get your exponent.
In floating point we always have a bias (in this case, 64) for the exponent. This means that if your exponent is 5, 69 will be stored. So what this code is trying to do is find the original value of the exponent.
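A minimal sketch of the whole decode, under stated assumptions: the four bytes are in file order with the sign/exponent byte first and the 24-bit fraction most significant byte first (this differs from the byte order used in the question's code), and the function name ibm32_to_double is hypothetical. The exponent is kept in a signed int here; in the question it is a uint16_t, so any negative exponent would wrap around to a large positive number, which may be the cause of the oversized exponent.

```cpp
#include <cmath>
#include <cstdint>

// Sketch: decode one IBM single-precision hexadecimal float.
// Assumes b[0] is the sign/exponent byte and b[1..3] hold the 24-bit
// fraction, most significant byte first.
double ibm32_to_double(const std::uint8_t b[4]) {
    const bool negative = (b[0] & 0x80) != 0;   // top bit: sign
    const int exponent = (b[0] & 0x7F) - 64;    // low 7 bits minus the bias; signed on purpose
    const std::uint32_t fraction = (std::uint32_t(b[1]) << 16) |
                                   (std::uint32_t(b[2]) << 8) |
                                    std::uint32_t(b[3]);
    // value = fraction / 2^24 * 16^exponent = fraction * 2^(4 * exponent - 24)
    const double magnitude =
        std::ldexp(static_cast<double>(fraction), 4 * exponent - 24);
    return negative ? -magnitude : magnitude;
}
```

For instance, the bytes 41 10 00 00 have a stored exponent of 65 and a fraction of 0x100000, which decodes to 1.0.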

General algorithm for reading n bits and padding with zeros

I need a function to read n bits starting from bit x(bit index should start from zero), and if the result is not byte aligned, pad it with zeros. The function will receive uint8_t array on the input, and should return uint8_t array as well. For example, I have file with following contents:
1011 0011 0110 0000
Read three bits from the third bit(x=2,n=3); Result:
1100 0000
There's no (theoretical) limit on input and bit pattern lengths
Implementing such a bitfield extraction efficiently, going beyond the direct bit-serial algorithm, isn't precisely hard but a tad cumbersome.
Effectively it boils down to an innerloop reading a pair of bytes from the input for each output byte, shifting the resulting word into place based on the source bit-offset, and writing back the upper or lower byte. In addition the final output byte is masked based on the length.
Below is my (poorly-tested) attempt at an implementation:
#include <climits>   // CHAR_BIT, UCHAR_MAX
#include <cstddef>   // size_t

void extract_bitfield(unsigned char *dstptr, const unsigned char *srcptr, size_t bitpos, size_t bitlen) {
    // Skip to the source byte covering the first bit of the range
    srcptr += bitpos / CHAR_BIT;
    // Similarly work out the expected, inclusive, final output byte
    unsigned char *endptr = &dstptr[bitlen / CHAR_BIT];
    // Truncate the bit-positions to offsets within a byte
    bitpos %= CHAR_BIT;
    bitlen %= CHAR_BIT;
    // Scan through and write out a correctly shifted version of every destination byte
    // via an intermediate shifter register
    unsigned long accum = *srcptr++;
    while(dstptr <= endptr) {
        accum = accum << CHAR_BIT | *srcptr++;
        *dstptr++ = accum << bitpos >> CHAR_BIT;
    }
    // Mask out the unwanted LSB bits not covered by the length
    *endptr &= ~(UCHAR_MAX >> bitlen);
}
Beware that the code above may read past the end of the source buffer and somewhat messy special handling is required if you can't set up the overhead to allow this. It also assumes sizeof(long) != 1.
Of course to get efficiency out of this you will want to use as wide of a native word as possible. However if the target buffer is not necessarily word-aligned then things get even messier. Furthermore little-endian systems will need byte swizzling fix-ups.
Another subtlety to take heed of is the potential inability to shift a whole word, that is shift counts are frequently interpreted modulo the word length.
Anyway, happy bit-hacking!
Basically it's still a bunch of shift and addition operations.
I'll use a slightly larger example to demonstrate this.
Suppose we are given an input of 4 characters, and x = 10, n = 18.
00101011 10001001 10101110 01011100
First we need to locate the character that contains our first bit, by x / 8, which gives us 1 (the second character) in this case. We also need the offset in that character, by x % 8, which equals 2.
Now we can get our first character of the solution in three operations.
Left shift the second character 10001001 with 2 bits, gives us 00100100.
Right shift the third character 10101110 with 6 (comes from 8 - 2) bits, gives us 00000010.
Add these two characters gives us the first character in your return string, gives 00100110.
Loop this routine for n / 8 rounds. And if n % 8 is not 0, extract that many bits from the next character, you can do it in many approaches.
So in this example, our second round will give us 10111001, and in the last step we get 01, then pad the remaining bits with 0s.
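The shift-and-combine routine described above can be sketched as follows. This is one possible reading, not a definitive implementation: the name extract_bits and the use of std::vector are assumptions, bits are numbered from the most significant bit of the first byte, and any bits requested past the end of the input are treated as zero so that the function never reads out of bounds.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: returns n bits starting at bit x of src, left-aligned and zero-padded.
std::vector<std::uint8_t> extract_bits(const std::vector<std::uint8_t>& src,
                                       std::size_t x, std::size_t n) {
    std::vector<std::uint8_t> out((n + 7) / 8, 0);
    const std::size_t first = x / 8;                         // byte holding bit x
    const unsigned offset = static_cast<unsigned>(x % 8);    // bit offset inside it
    for (std::size_t k = 0; k < out.size(); ++k) {
        const std::size_t idx = first + k;
        const unsigned hi = idx < src.size() ? src[idx] : 0u;
        const unsigned lo = idx + 1 < src.size() ? src[idx + 1] : 0u;
        // Join two neighbouring bytes, shift left by the offset, keep the upper byte.
        const unsigned window = (hi << 8) | lo;
        out[k] = static_cast<std::uint8_t>((window << offset) >> 8);
    }
    if (n % 8 != 0) {
        // Clear the low bits of the last byte that lie beyond the requested length.
        out.back() &= static_cast<std::uint8_t>(0xFFu << (8 - n % 8));
    }
    return out;
}
```

With the four-byte input from the example and x = 10, n = 18, this yields the bytes 00100110, 10111001 and 01000000.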

Shift left/right adding zeroes/ones and dropping first bits

I've got to program a function that receives
a binary number like 10001, and
a decimal number that indicates how many shifts I should perform.
The problem is that if I use the C++ operator <<, the zeroes are pushed from behind but the first numbers aren't dropped... For example
shifLeftAddingZeroes(10001,1)
returns 100010 instead of 00010 that is what I want.
I hope I've made myself clear =P
I assume you are storing that information in an int. Take into consideration that this number actually has more leading zeroes than what you see; if your int is 16 bits wide, for example, your number is really 00000000 00010001. Maybe try AND-ing it with a number having as many 1s as the number of bits you want to keep after shifting? (Assuming you want to stick to bitwise operations.)
What you want is to bit shift and then limit the number of output bits which can be active (hold a value of 1). One way to do this is to create a mask for the number of bits you want, then AND the bitshifted value with that mask. Below is a code sample for doing that; just replace int_type with the type of value you're using, or make it a template type.
int_type shiftLeftLimitingBitSize(int_type value, int numshift, int_type numbits=some_default) {
    int_type mask = 0;
    // Build a mask with the lowest numbits bits set
    for (unsigned int bit=0; bit < numbits; bit++) {
        mask |= int_type(1) << bit;
    }
    return (value << numshift) & mask;
}
Your output for 10001,1 would now be shiftLeftLimitingBitSize(0b10001, 1, 5) == 0b00010.
Realize that unless your numbits is exactly the length of your integer type, you will always have excess 0 bits on the 'front' of your number.
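The mask can also be computed without a loop. The sketch below fixes the type to a 32-bit unsigned integer for concreteness; the function name shift_left_keep_width and the parameter names are hypothetical.

```cpp
#include <cstdint>

// Sketch for 32-bit unsigned values; width is the number of low bits to keep.
std::uint32_t shift_left_keep_width(std::uint32_t value, unsigned shift, unsigned width) {
    // Shifting a 32-bit value by 32 or more is undefined, so handle those cases explicitly.
    const std::uint32_t mask =
        width >= 32 ? 0xFFFFFFFFu : ((std::uint32_t{1} << width) - 1u);
    const std::uint32_t shifted = shift >= 32 ? 0u : (value << shift);
    return shifted & mask;
}
```

Calling shift_left_keep_width(0x11, 1, 5) shifts binary 10001 to 100010 and then drops the sixth bit, leaving 00010.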

What is this doing: "input >> 4 & 0x0F"?

I don't understand what this code is doing at all, could someone please explain it?
long input; //just here to show the type, assume it has a value stored
unsigned int output( input >> 4 & 0x0F );
Thanks
bitshifts the input 4 bits to the right, then masks by the lower 4 bits.
Take this example 16 bit number: (the dots are just for visual separation)
1001.1111.1101.1001 >> 4 = 0000.1001.1111.1101
0000.1001.1111.1101 & 0x0F = 1101 (or 0000.0000.0000.1101 to be more explicit)
& is the bitwise AND operator. "& 0x0F" is sometimes done to pad the first 4 bits with 0s, or ignore the first(leftmost) 4 bits in a value.
0x0f = 00001111. So a bitwise & operation of 0x0f with any other bit pattern will retain only the rightmost 4 bits, clearing the left 4 bits.
If the input has a value of 01010001, after doing &0x0F, we'll get 00000001 - which is a pattern we get after clearing the left 4 bits.
Just as another example, this is a code I've used in a project:
Byte verflag = (Byte)(bIsAck & 0x0f) | ((version << 4) & 0xf0). Here I'm combining two values into a single Byte value to save space because it's being used in a packet header structure. bIsAck is a BOOL and version is a Byte whose value is very small. So both these values can be contained in a single Byte variable.
The first (high) nibble in the resultant variable will contain the value of version and the second (low) nibble will contain the value of bIsAck. I can retrieve the values into separate variables at the receiving end by masking with 0x0f for bIsAck and doing a 4-bit >> for version.
Hope this is somewhere near to what you asked for.
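A small sketch of that packing and unpacking, assuming a one-byte header with the version in the high nibble and the acknowledgement flag in the low nibble. The function names are hypothetical and std::uint8_t stands in for the Byte type mentioned above.

```cpp
#include <cstdint>

// Hypothetical helpers mirroring the packet-header example.
std::uint8_t pack_header(bool is_ack, std::uint8_t version) {
    // version is moved into the high nibble; the flag occupies the low nibble.
    return static_cast<std::uint8_t>(((version << 4) & 0xF0) | (is_ack ? 0x01 : 0x00));
}

std::uint8_t unpack_version(std::uint8_t header) {
    return static_cast<std::uint8_t>((header >> 4) & 0x0F);  // high nibble
}

bool unpack_ack(std::uint8_t header) {
    return (header & 0x0F) != 0;                             // low nibble
}
```

Only versions in the range 0 to 15 survive the round trip, since anything larger does not fit in four bits.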
That is doing a bitwise right shift of the contents of "input" by 4 bits, then doing a bitwise AND of the result with 0x0F (binary 1111).
What it does depends on the contents and type of "input". Is it an int? A long? A string (which would mean the shift and bitwise AND are being done on a pointer to the first byte).
Google for "c++ bitwise operations" for more details on what's going on under the hood.
Additionally, look at C++ operator precedence because the C/C++ precedence is not exactly the same as in many other languages.
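On the precedence point: in C++ the shift operators bind more tightly than bitwise &, so input >> 4 & 0x0F is read as (input >> 4) & 0x0F. The sketch below writes the parentheses out. The function name second_nibble is hypothetical, and the parameter is unsigned by choice, because right-shifting a negative signed value was implementation-defined before C++20.

```cpp
// Sketch: returns bits 4..7 of input as a value in the range 0..15.
unsigned int second_nibble(unsigned long input) {
    // Same meaning as `input >> 4 & 0x0F`, with the grouping made explicit.
    return static_cast<unsigned int>((input >> 4) & 0x0F);
}
```

For the 16-bit example value 1001.1111.1101.1001 (0x9FD9) this returns 1101, i.e. 13.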

Some random C questions (ascii magic and bitwise operators)

I am trying to learn C programming, and I was studying some source code and there are some things I didn't understand, especially regarding Bitwise Operators. I read some sites on this, and I kinda got an idea of what they do, but when I went back to look at this code, I could not understand why and how they were used.
My first question is not related to bitwise operators but rather some ascii magic:
Can somebody explain to me how the following code works?
char a = 3;
int x = a - '0';
I understand this is done to convert a char into an int, however I don't understand the logic behind it. Why/How does it work?
Now, Regarding Bitwise operators, I feel really lost here.
What does this code do?
if (~pointer->intX & (1 << i)) { c++; n = i; }
I read somewhere that ~ inverts bits, but I fail to see what this statement is doing and why is it doing that.
Same with this line:
row.data = ~(1 << i);
Other question:
if (x != a)
{
ret |= ROW;
}
What exactly is the |= operator doing? From what I read, |= is OR but I don't quite understand what this statement is doing.
Is there any way of rewriting this code to make it easier to understand, so that it doesn't use these bitwise operators? I find them very confusing to understand, so hopefully somebody will point me in the right direction on understanding how they work better!
I have a much better understanding of bitwise operators now and the whole code makes much more sense now.
One last thing: apparently nobody responded on whether there would be a "cleaner" way of rewriting this code so that it's easier to understand and maybe not at "bit level". Any ideas?
This will produce junk:
char a = 3;
int x = a - '0';
This is different - note the quotes:
char a = '3';
int x = a - '0';
The char datatype stores a number that identifies a character. The characters for the digits 0 through 9 are all next to each other in the character code list, so if you subtract the code for '0' from the code for '9', you get the answer 9. So this will turn a digit character code into the integer value of the digit.
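A minimal sketch of that conversion with a range check added. The function name digit_value and the -1 error value are illustrative choices.

```cpp
// Sketch: returns 0..9 for the characters '0'..'9', or -1 for anything else.
int digit_value(char c) {
    if (c < '0' || c > '9') {
        return -1;      // not a digit character, e.g. the raw value 3
    }
    return c - '0';     // distance from the code for '0'
}
```

The check matters for the unquoted case: char a = 3 holds the code 3, which is not a digit character, so subtracting '0' from it gives a meaningless negative number.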
(~pointer->intX & (1 << i))
That will be interpreted by the if statement as true if it's non-zero. There are three different bitwise operators being used.
The ~ operator flips all the bits in the number, so if pointer->intX was 01101010, then ~pointer->intX will be 10010101. (Note that throughout, I'm illustrating the contents of a byte. If it was a 32-bit integer, I'd have to write 32 digits of 1s and 0s).
The & operator combines two numbers into one number, by dealing with each bit separately. The resulting bit is only 1 if both the input bits are 1. So if the left side is 00101001 and the right side is 00001011, the result will be 00001001.
Finally, << means left shift. If you start with 00000001 and left shift it by three places, you'll have 00001000. So the expression (1 << i) produces a value where bit i is switched on, and the others are all switched off.
Putting it all together, it tests if bit i is switched off (zero) in pointer->intX.
So you may be able to figure out what ~(1 << i) does. If i is 4, the thing in brackets will be 00010000, and so the whole thing will be 11101111.
ret |= ROW;
That one is equivalent to:
ret = ret | ROW;
The | operator is like & except that the resulting bit is 1 if either of the input bits is 1. So if ret is 00100000 and ROW is 00000010, the result will be 00100010.
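The three expressions from the question can be wrapped in small named functions, which may make their intent easier to read. This is a sketch assuming 32-bit unsigned values and i < 32; all function names are hypothetical.

```cpp
#include <cstdint>

// True when bit i of value is 0, mirroring `~value & (1 << i)`.
bool bit_is_clear(std::uint32_t value, unsigned i) {
    return (~value & (std::uint32_t{1} << i)) != 0;
}

// Every bit set except bit i, mirroring `~(1 << i)`.
std::uint32_t all_but_bit(unsigned i) {
    return ~(std::uint32_t{1} << i);
}

// Switches on every bit of flag in ret, mirroring `ret |= flag`.
std::uint32_t with_flag(std::uint32_t ret, std::uint32_t flag) {
    return ret | flag;
}
```

Using the byte values from the explanation: bit 0 of 01101010 is clear, the low byte of all_but_bit(4) is 11101111, and with_flag(00100000, 00000010) is 00100010.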
ret |= ROW;
is equivalent to
ret = ret | ROW;
For char a = 3; int x = a - '0'; I think you meant char a = '3'; int x = a - '0';. It's easy enough to understand if you realize that in ASCII the numbers come in order, like '0', '1', '2', ... So if '0' is 48 and '1' is 49, then '1' - '0' is 1.
For bitwise operations, they are hard to grasp until you start looking at bits. When you view these operations on binary numbers then you can see exactly how they work...
010 & 111 = 010
010 | 111 = 111
010 ^ 111 = 101
~010 = 101
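The three-bit examples can be reproduced in code, with one caveat: a C++ unsigned value has more than three bits, so ~ also flips all the higher bits. The sketch below masks each result back to three bits; the names are illustrative.

```cpp
// Sketch: the three-bit examples, with every result masked back to three bits.
constexpr unsigned kThreeBitMask = 0x7u;  // binary 111 (illustrative name)

constexpr unsigned and3(unsigned a, unsigned b) { return (a & b) & kThreeBitMask; }
constexpr unsigned or3(unsigned a, unsigned b)  { return (a | b) & kThreeBitMask; }
constexpr unsigned xor3(unsigned a, unsigned b) { return (a ^ b) & kThreeBitMask; }
// Without the mask, ~ would also set every bit above the third.
constexpr unsigned not3(unsigned a) { return ~a & kThreeBitMask; }
```

Here 010 is the value 2, 111 is 7 and 101 is 5, so for example xor3(2, 7) and not3(2) both give 5.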
I think you probably have a typo, and meant:
char a = '3';
The reason this works is that all the numbers come in order, and '0' is the first. Obviously, '0' - '0' = 0. '1' - '0' = 1, since the character value for '1' is one greater than the character value for '0'. Etc.
1) A char is really just an 8-bit integer. '0' == 48, and all that that implies.
2) (~(pointer->intX) & (1 << i)) evaluates whether the 'i'th bit (from the right) in the intX member of whatever pointer points to is not set. The ~ inverts the bits, so all the 0s become 1s and vice versa, then the 1 << i puts a single 1 in the desired location, & combines the two values so that only the desired bit is kept, and the whole thing evaluates to true if that bit was 0 to begin with.
3) | is bitwise or. It takes each bit in both operands and performs a logical OR, producing a result where each bit is set if either operand had that bit set. 0b11000000 | 0b00000011 == 0b11000011. |= is an assignment operator, in the same way that a+=b means a=a+b, a|=b means a=a|b.
Not using bitwise operators CAN make things easier to read in some cases, but it will usually also make your code significantly slower without strong compiler optimization.
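On the question of a version that avoids bit-level operators: std::bitset offers named operations such as test, set and reset. The sketch below is only a guess at the surrounding logic, since the question shows just the if statement; the loop, the 8-bit width, and the names ClearBits and find_clear_bits are all assumptions.

```cpp
#include <bitset>
#include <cstddef>

// Result of scanning a value for bits that are 0.
struct ClearBits {
    int count;       // how many bits were 0
    int last_index;  // index of the highest such bit, or -1 if none
};

// Bitset version of a loop around `if (~v & (1 << i)) { c++; n = i; }`.
ClearBits find_clear_bits(std::bitset<8> bits) {
    ClearBits result{0, -1};
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (!bits.test(i)) {             // bit i is 0
            ++result.count;
            result.last_index = static_cast<int>(i);
        }
    }
    return result;
}
```

In the same style, ret |= ROW corresponds to calling set on the relevant bit index, and row.data = ~(1 << i) corresponds to a bitset with every bit set followed by reset(i).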
The subtraction trick you reference works because ASCII numbers are arranged in ascending order, starting with zero. So if ASCII '0' is a value of 48 (and it is), then '1' is a value of 49, '2' is 50, etc. Therefore ASCII('1') - ASCII('0') = 49 - 48 = 1.
As far as bitwise operators go, they allow you to perform bit-level operations on variables.
Let's break down your example:
(1 << i) -- this is left-shifting the constant 1 by i bits. So if i=0, the result is decimal 1. If i = 1, it shifts the bit one to the left, backfilling with zeros, yielding binary 0010, or decimal 2. If i = 2, you shift the bit two to the left, backfilling with zeros, yielding binary 0100 or decimal 4, etc.
~pointer->intX -- this is taking the value of the intX member of pointer and inverting its bits, setting all zeros to ones and vice versa.
& -- the ampersand operator does a bitwise AND comparison. The results of this will be 1 wherever both the left and right side of the expression are 1, and 0 otherwise.
So the test will succeed if pointer->intX has a 0 bit at the ith position from the right.
Also, |= means to do a bitwise OR and assign the result to the left side of the expression. The result of a bitwise OR is 1 for every bit where the corresponding left or right side bit is 1.
Single quotes are used to indicate that a single char is used. '0' therefore is the char '0', which has the ASCII-Code 48.
3-'0'=3-48
'1<<i' shifts 1 i places to the left, therefore only the ith bit from the right is 1.
~pointer->intX inverts the bits of the field intX, so the bitwise AND returns a true value (not 0) when the ith bit from the right of intX isn't set.
char a = '3';
int x = a - '0';
you had a typo here (notice the 's around the 3), this assigns the ascii value of the character 3, to the char variable, then the next line takes '3' - '0' and assigns it to x, because of the way ascii values work, x will then be equal to 3 (integer value)
In the first comparison, note that ~ is not applied to the pointer itself: -> binds more tightly than unary ~, so the expression inverts the bits of the intX member. If I were to read out the following code:
(~pointer->intX & (1 << i))
I would say "(the bitwise inverse of the value intX dereferenced from pointer) AND (1 left shifted i times)"
1 << i is a quick way of multiplying 1 by a power of 2, ie if i is 3, then 1 << 3 == 8
So the test is non-zero exactly when bit i of intX is 0.
In the 2nd comparison, x |= y is the same as x = x | y
I'm assuming you mean char a='3'; for the first line of code (otherwise you get a rather strange answer). The basic principle is that ASCII codes for digits are sequential, i.e. the code for '0'=48, the code for '1'=49, and so on. Subtracting '0' simply converts from the ASCII code to the actual digit, so e.g. '3' - '0' = 3, and so on. Note that this will only work if the character you're subtracting '0' from is an actual digit - otherwise the result will have little meaning.
a. Without context the "why" of this code is impossible to say. As for what it's doing, it appears that the if statement evaluates as true when bit i of pointer->intX is not set, i.e. that particular bit is a 0. The ~ operator is applied before the & operator, because unary ~ has higher precedence than binary &, so the expression is parsed as (~pointer->intX) & (1 << i). The code could make better use of parentheses to make the intended order of operations clearer. The order does matter here: ~(pointer->intX & (1 << i)) would be non-zero for every value of intX, so the test would always succeed.
b. This is simply creating a number with all bits EXCEPT bit i set to 1. A convenient way of creating a mask for bit i is to use the expression (1<<i).
The bitwise OR operation in this case is used to set the bits specified by the ROW constant to 1. If these bits are not set, it sets them; if they're already set it has no effect.
1) Can somebody explain to me how the following code works? char a = 3; int x = a - '0';
I understand this is done to convert a char into an int, however I don't understand the logic behind it. Why/How does it work?
Sure. variable a is of type char, and by putting single quotes around 0 that is causing C to view it as a char as well. Finally, the whole statement is automagically typecast to its integer equivalent, because x is defined as an integer.
2) Now, Regarding Bitwise operators, I feel really lost here.
--- What does this code do? if (~pointer->intX & (1 << i)) { c++; n = i; } I read somewhere that ~ inverts bits, but I fail to see what this statement is doing and why is it doing that.
(~pointer->intX & (1 << i)) is saying:
negate intX, and AND it with a 1 shifted left by i bits
so, what you're getting, if intX = 1011, and i = 2, equates to
(0100 & 0100)
-negate 1011 = 0100
-(1 << 2) = 0100
0100 & 0100 = 0100 (non-zero, so true) :)
then, if the AND operation returns a non-zero value (which, in my example, it does)
{ c++; n = i; }
so, increment c by 1, and set variable n to be i
Same with this line: row.data = ~(1 << i);
Same principle here.
Shift a 1 to the left by i places, and negate.
So, if i = 2 again
(1 << 2) = 0100
~(0100) = 1011
--- Other question:
if (x != a) { ret |= ROW; }
What exactly is the |= operator doing? From what I read, |= is OR but i don't quite understand what is this statement doing.
if (x != a) (hopefully this is apparent to you....if variable x does not equal variable a)
ret |= ROW;
equates to
ret = ret | ROW;
which means, binary OR ret with ROW
For examples of exactly what AND and OR operations accomplish, you should have a decent understanding of binary logic.
Check Wikipedia for truth tables, e.g. the article on bitwise operations.