Shift left/right adding zeroes/ones and dropping first bits - c++

I've got to program a function that receives
a binary number like 10001, and
a decimal number that indicates how many shifts I should perform.
The problem is that if I use the C++ operator <<, zeroes are pushed in at the right but the leading bits aren't dropped... For example
shifLeftAddingZeroes(10001,1)
returns 100010 instead of 00010, which is what I want.
I hope I've made myself clear =P

I assume you are storing that information in an int. Take into consideration that this number actually has more leading zeroes than what you see; your number is most likely at least 16 bits, meaning 00000000 00010001. Maybe try AND-ing it with a mask that has as many 1-bits as you want to keep after shifting? (Assuming you want to stick to bitwise operations.)
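For instance, a minimal sketch of that idea (using a 5-bit mask to match the example; the 0b literals need C++14):
#include <iostream>
int main() {
    unsigned value = 0b10001;                   // 17
    unsigned shifted = (value << 1) & 0b11111;  // keep only the low 5 bits
    std::cout << shifted << "\n";               // prints 2, i.e. binary 00010
}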

What you want is to bit shift and then limit the number of output bits which can be active (hold a value of 1). One way to do this is to create a mask for the number of bits you want, then AND the bit-shifted value with that mask. Below is a code sample for doing that, just replace int_type with the type of value you're using -- or make it a template type.
int_type shiftLeftLimitingBitSize(int_type value, int numshift, int_type numbits=some_default) {
    // Build a mask with `numbits` low 1-bits
    int_type mask = 0;
    for (unsigned int bit = 0; bit < numbits; bit++) {
        mask |= static_cast<int_type>(1) << bit;
    }
    return (value << numshift) & mask;
}
Your output for 10001,1 would now be shiftLeftLimitingBitSize(0b10001, 1, 5) == 0b00010.
Realize that unless your numbits is exactly the length of your integer type, you will always have excess 0 bits on the 'front' of your number.
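If numbits is strictly smaller than the bit-width of int_type, the mask loop can also be collapsed into a single expression (a hedged sketch with the same semantics):
template <typename int_type>
int_type shiftLeftLimitingBitSize2(int_type value, int numshift, unsigned numbits) {
    // ((1 << numbits) - 1) is a run of numbits 1-bits
    return (value << numshift) & ((static_cast<int_type>(1) << numbits) - 1);
}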


Get the low portion of a number of any of the built-in types [closed]

How would I create a function template which returns the low portion of a number of N bits?
For example, for an 8 bit number, get the least significant 4 bits, for a 16 bit number, get the least significant 8 bits.
To get the lower half of a built-in integer type you can try something like this:
#include <iostream>
#include <climits>
#include <cstdint>
using std::cout;
using std::endl;
template<typename T>
constexpr T lowbits(T v) {
    return v & (T(1) << CHAR_BIT * sizeof v / 2) - 1;
}
int main() {
    cout << std::hex << (int)lowbits<int8_t>(0xde) << endl;            // will print e
    cout << std::hex << lowbits<int16_t>(0xdead) << endl;              // will print ad
    cout << std::hex << lowbits<int32_t>(0xdeadbeef) << endl;          // will print beef
    cout << std::hex << lowbits<int64_t>(0xbeefdeaddeadbeef) << endl;  // will print deadbeef
}
Note that
return v & (T(1) << CHAR_BIT * sizeof v / 2) - 1;
is equivalent to:
return v & (
(static_cast<T>(1)
<<
(CHAR_BIT * (sizeof v) / 2)) // number of bits divided by 2
- 1
);
In essence you are creating a bit-mask (simply another integer) that has 0-bits for all higher bits and 1-bits for all lower bits.
To set the lowest N bits of the mask, a 1-bit is shifted into bit position N (counting from 0) and then 1 is subtracted from it. The subtraction has the effect that all bits below that 1 become set.
And-ing this with the given value yields only the lower half of the value v.
You can easily generalize this approach to retrieving any number of lower bits by replacing CHAR_BIT * sizeof v/2 with the number of bits you want to retrieve.
To get only the higher bits you can simply negate the resulting mask using the ~ operator.
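For example, a sketch of the complementary helper (the name highbits is mine):
#include <climits>
#include <cstdint>
#include <iostream>
template<typename T>
constexpr T highbits(T v) {
    // Same mask as in lowbits, negated so it keeps only the upper half
    return v & static_cast<T>(~((T(1) << CHAR_BIT * sizeof v / 2) - 1));
}
int main() {
    std::cout << std::hex << highbits<uint16_t>(0xdead) << std::endl;  // prints de00
}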
If you require arbitrary sized integers you can try finding the equivalent operations for this procedure in the GNU gmp library.
Let us define a variable called mask which is the pattern to mask off (or retain) some bits. The operation to get the least significant bits is:
result = value & mask;
For an example, test with value == 13 and mask == 7: 13 is binary 1101 and 7 is binary 0111, so the result is 0101, i.e. 5.
This works with all integral POD types, but not floating point. Taking the least significant Q bits of a floating-point value doesn't make sense (unless you really need to do this).
If you have no need for more bits than the largest internal integral type, you could use something like this:
#include <cstddef>  // size_t
template <typename T>
T low_bits(T value, size_t bit_count)
{
    // Use T for the mask so types wider than unsigned int work too
    T mask = (static_cast<T>(1) << bit_count) - 1;
    return value & mask;
}
For a non-template solution, one could use a macro:
#define LOW_BITS(value, bit_count) \
    ((value) & ((1U << (bit_count)) - 1U))
This lets the compiler figure out the code based on the data type of value.
A macro form of the expression: value & mask.
The thorny issue comes into play when N is larger than the width of the largest built-in integral type. In that case the number can't be represented by built-in data types, so one has to come up with a different solution.
The solution for an N-bit number depends on whether the multi-byte representation of the number is Big Endian or Little Endian. On Big Endian platforms, the least significant byte will be at the highest address, while on Little Endian platforms, the least significant byte is at the lowest address.
The solution I'm proposing treats the N-bit number as an array of bytes. A byte contains 8-bits (on most platforms), and bytes can be masked differently than multibyte quantities.
Here's the algorithm:
1. Copy the least significant bytes that are completely masked to the result variable.
2. Mask the next largest byte and copy result byte to result number.
3. Pad remaining bytes with 0.
As far as the function parameters go, you'll need:
1) Pointer to the memory location of the original number.
2) Pointer to the result number.
3) Pointer to the mask.
4) Size of the number, in bytes.
The algorithm can handle N-bit numbers, limited by the amount of memory on the platform.
Note: sorry about not providing code, but I need to get back to work. :-(
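Since no code was given, here is a minimal sketch of the byte-array idea described above (the function name, parameter names, and the little-endian byte order are my own assumptions):
#include <cstddef>
#include <cstdint>
// Masks an arbitrary-length number, byte by byte: fully masked low bytes are
// copied, the boundary byte is partially masked, and bytes whose mask is 0
// come out as 0 (the zero padding).
void low_bits_n(const uint8_t *number, const uint8_t *mask,
                uint8_t *result, size_t size_in_bytes)
{
    for (size_t i = 0; i < size_in_bytes; ++i)
        result[i] = number[i] & mask[i];
}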

Converting Big Endian Formatted Bits to Intended Decimal Value While Ignoring First Bit

I am reading a binary file and trying to convert from IBM 4-byte floating point to double in C++. How exactly would one use the first byte of IBM data to find the ccccccc in the given picture?
IBM to value conversion chart
The code below gives an exponent way larger than what the data should have. I am confused with how the line
exponent = ((IBM4ByteValue[0] & 127) - 64);
executes. I do not understand the use of the & operator in this statement. But essentially what the previous author of this code implied is that (IBM4ByteValue[0]) is the ccccccc, so does this mean that the ampersand sets a maximum value that the left side of the operator can equal? Even if this is correct, I'm not sure how this line accounts for the fact that the first byte uses Big Endian bit ordering (I believe it is Big Endian after viewing the picture). Not to mention 1000001 and 0000001 should have the same exponent (-63), however they will not with my current interpretation of the previously mentioned line.
So in short, could someone show me how to find the ccccccc (shown in the picture link above) using the first byte --> IBM4ByteValue[0]. Maybe by accessing each individual bit? However, I do not know the code to do this with my array.
**this code is using the std namespace
**I believe ret should be mantissa * pow(16, 24+exponent), however if I'm wrong about the exponent I'm probably wrong about this (I got the IBM conversion from a previously asked Stack Overflow question). **I would have just commented on the old post, but this question was a bit too large, pun intended, for a comment. It is also different in that I am asking how exactly one accesses the bits in an array storing whole bytes.
Code I put together using an IBM conversion from a previous question's answer:
for (long pos = 0; pos < fileLength; pos += BUF_LEN) {
    file.seekg(bytePosition);
    file.read((char *)(&IBM4ByteValue[0]), BUF_LEN);
    bytePosition += 4;
    printf("\n%8ld: ", pos);
    //IBM Conversion
    double ret = 0;
    uint32_t mantissa = 0;
    uint16_t exponent = 0;
    mantissa = (IBM4ByteValue[3] << 16) | (IBM4ByteValue[2] << 8) | IBM4ByteValue[1];
    exponent = ((IBM4ByteValue[0] & 127) - 64);
    ret = mantissa * exp2(-24 + 4 * exponent);
    if (IBM4ByteValue[0] & 128) ret *= -1.;
    printf(":%24f", ret);
    printf("\n");
    system("PAUSE");
}
The & operator masks the byte with the binary value of 127 (0111 1111). A result bit is 1 only where the byte's bit and the corresponding bit of 127 are both 1; 1 & 0, 0 & 0, and 0 & 1 all give 0. Since 127 has every bit set except the topmost one, this clears the sign bit and keeps just the 7-bit exponent field. You then take the resulting value, now in decimal, and subtract 64 from it to get your exponent.
In floating point we always have a bias (in this case, 64) for the exponent. This means that if your exponent is 5, 69 will be stored. So what this code is trying to do is find the original value of the exponent.
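As a small illustration of those two lines (the byte value 0xC2 is made up for the example):
#include <cstdint>
#include <cstdio>
int main() {
    uint8_t first = 0xC2;                 // 1100 0010: sign bit set, exponent field 0x42
    int exponent = (first & 127) - 64;    // 127 = 0111 1111 clears the sign bit; 64 is the bias
    bool negative = (first & 128) != 0;   // 128 = 1000 0000 tests the sign bit
    std::printf("exponent=%d negative=%d\n", exponent, negative);  // exponent=2 negative=1
}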

General algorithm for reading n bits and padding with zeros

I need a function to read n bits starting from bit x (bit indexing starts from zero), and if the result is not byte-aligned, pad it with zeros. The function will receive a uint8_t array on the input, and should return a uint8_t array as well. For example, I have a file with the following contents:
1011 0011 0110 0000
Read three bits starting from the third bit (x=2, n=3); Result:
1100 0000
There's no (theoretical) limit on input and bit pattern lengths
Implementing such a bitfield extraction efficiently, beyond the direct bit-serial algorithm, isn't precisely hard but a tad cumbersome.
Effectively it boils down to an innerloop reading a pair of bytes from the input for each output byte, shifting the resulting word into place based on the source bit-offset, and writing back the upper or lower byte. In addition the final output byte is masked based on the length.
Below is my (poorly-tested) attempt at an implementation:
#include <climits>  // CHAR_BIT, UCHAR_MAX
#include <cstddef>  // size_t
void extract_bitfield(unsigned char *dstptr, const unsigned char *srcptr, size_t bitpos, size_t bitlen) {
    // Skip to the source byte covering the first bit of the range
    srcptr += bitpos / CHAR_BIT;
    // Similarly work out the expected, inclusive, final output byte
    unsigned char *endptr = &dstptr[bitlen / CHAR_BIT];
    // Truncate the bit-positions to offsets within a byte
    bitpos %= CHAR_BIT;
    bitlen %= CHAR_BIT;
    // Scan through and write out a correctly shifted version of every destination byte
    // via an intermediate shifter register
    unsigned long accum = *srcptr++;
    while(dstptr <= endptr) {
        accum = accum << CHAR_BIT | *srcptr++;
        *dstptr++ = accum << bitpos >> CHAR_BIT;
    }
    // Mask out the unwanted LSB bits not covered by the length
    *endptr &= ~(UCHAR_MAX >> bitlen);
}
Beware that the code above may read past the end of the source buffer and somewhat messy special handling is required if you can't set up the overhead to allow this. It also assumes sizeof(long) != 1.
Of course to get efficiency out of this you will want to use as wide a native word as possible. However if the target buffer isn't necessarily word-aligned then things get even messier. Furthermore little-endian systems will need byte swizzling fix-ups.
Another subtlety to take heed of is the potential inability to shift a whole word, that is shift counts are frequently interpreted modulo the word length.
Anyway, happy bit-hacking!
Basically it's still a bunch of shift and addition operations.
I'll use a slightly larger example to demonstrate this.
Suppose we are given an input of 4 characters, and x = 10, n = 18.
00101011 10001001 10101110 01011100
First we need to locate the character contains our first bit, by x / 8, which gives us 1 (the second character) in this case. We also need the offset in that character, by x % 8, which equals to 2.
Now we can get the first character of the solution in three operations.
Left shift the second character 10001001 with 2 bits, gives us 00100100.
Right shift the third character 10101110 with 6 (comes from 8 - 2) bits, gives us 00000010.
OR-ing (adding) these two characters gives us the first character of the return string: 00100110.
Loop this routine for n / 8 rounds. If n % 8 is not 0, extract that many remaining bits from the next character; you can do that in several ways.
So in this example, our second round will give us 10111001, and in the last step we get 01; padding the remaining bits with 0s gives 01000000.
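A byte-wise sketch of that routine (the function name and signature are my own; it assumes the caller's output buffer has room for the padded result):
#include <cstddef>
#include <cstdint>
// Copy n bits starting at bit index x (counting from the most significant bit
// of in[0]) into out, zero-padding the last byte.
void read_bits(const uint8_t *in, uint8_t *out, size_t x, size_t n)
{
    size_t byte = x / 8;                 // character containing the first bit
    size_t off  = x % 8;                 // offset of that bit inside the character
    size_t outBytes = (n + 7) / 8;
    for (size_t i = 0; i < outBytes; ++i) {
        uint8_t hi = in[byte + i] << off;              // left-shifted part
        uint8_t lo = 0;
        if (off != 0 && i * 8 + (8 - off) < n)         // only read the next byte if needed
            lo = in[byte + i + 1] >> (8 - off);        // right-shifted part
        out[i] = hi | lo;
    }
    if (n % 8)                                         // zero the bits past the requested length
        out[outBytes - 1] &= uint8_t(0xFF << (8 - n % 8));
}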

Why is the binary equivalent calculation getting incorrect?

I wrote the following program to output the binary equivalent of an integer (I checked that int on my system is 4 bytes). But the output doesn't come out right. The code is:
#include <iostream>
#include <iomanip>
using namespace std;
void printBinary(int k){
    for(int i = 0; i <= 31; i++){
        if(k & ((1 << 31) >> i))
            cout << "1";
        else
            cout << "0";
    }
}
int main(){
    printBinary(12);
}
Where am I getting it wrong?
The problem is in 1<<31. Because 2^31 cannot be represented with a 32-bit signed integer (range −2^31 to 2^31 − 1), the result is undefined [1].
The fix is easy: 1U<<31.
[1]: The behavior is implementation-defined since C++14.
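Applied to the original function, the fix would look like this (a sketch assuming a 32-bit int, as in the question):
#include <iostream>
void printBinary(int k) {
    for (int i = 0; i <= 31; i++) {
        if (k & ((1U << 31) >> i))   // unsigned, so the right shift fills with zeroes
            std::cout << "1";
        else
            std::cout << "0";
    }
}
int main() {
    printBinary(12);  // prints 00000000000000000000000000001100
}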
This expression is incorrect:
if(k & ((1<<31)>>i))
int is a signed type, so when you shift 1 31 times, it becomes the sign bit on your system. After that, shifting the result right i times sign-extends the number, meaning that the top bits remain 1s. You end up with a sequence that looks like this:
80000000 // 10000...00
C0000000 // 11000...00
E0000000 // 11100...00
F0000000 // 11110...00
F8000000
FC000000
...
FFFFFFF8
FFFFFFFC
FFFFFFFE // 11111..10
FFFFFFFF // 11111..11
To fix this, replace the expression with 1 & (k>>(31-i)). This way you would avoid undefined behavior* resulting from shifting 1 to the sign bit position.
* C++14 changed the definition so that shifting 1 31 times to the left in a 32-bit int is no longer undefined (Thanks, Matt McNabb, for pointing this out).
A typical internal memory representation of a signed integer value reserves the most significant bit as the sign bit; in signed numbers (like int) it represents whether the number is negative or not.
When you shift right, sign extension is performed to preserve the number's sign. This is done by appending digits to the most significant side of the number (following a procedure dependent on the particular signed number representation used).
In unsigned numbers the most significant bit is just another value bit of the represented number, thus no sign extension is performed when shifting.
Note: the enumeration of the bits starts from 0, so 1 << 31 lands in your sign bit position, and after that every right shift operation >> results in sign extension. (as pointed out by @dasblinkenlight)
So, the simple solution to your problem is to make the number unsigned (this is what U does in 1U << 31) before you start the bit manipulation. (as pointed out by @Yu Hao)
For further reading see signed number representations and two's complement (as it's the most common).
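A small demonstration of the difference (this assumes a typical two's-complement platform where right-shifting a negative int is an arithmetic shift):
#include <cstdint>
#include <iostream>
int main() {
    int32_t  s = INT32_MIN;       // bit pattern 0x80000000, sign bit set
    uint32_t u = 0x80000000u;     // same bit pattern, unsigned
    std::cout << std::hex << (s >> 4) << "\n";  // f8000000: sign-extended
    std::cout << std::hex << (u >> 4) << "\n";  // 8000000:  zero-filled
}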

printf: Displaying an SHA1 hash in hexadecimal

I have been following the msdn example that shows how to hash data using the Windows CryptoAPI. The example can be found here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa382380%28v=vs.85%29.aspx
I have modified the code to use the SHA1 algorithm.
I don't understand how the code that displays the hash (shown below) in hexadecimal works; more specifically, I don't understand what the >> 4 operator and the & 0xf operator do.
if (CryptGetHashParam(hHash, HP_HASHVAL, rgbHash, &cbHash, 0)){
    printf("MD5 hash of file %s is: ", filename);
    for (DWORD i = 0; i < cbHash; i++)
    {
        printf("%c%c", rgbDigits[rgbHash[i] >> 4],
                       rgbDigits[rgbHash[i] & 0xf]);
    }
    printf("\n");
}
I would be grateful if someone could explain this for me, thanks in advance :)
x >> 4 shifts x right four bits. x & 0xf does a bitwise and between x and 0xf. 0xf has its four least significant bits set, and all the other bits clear.
Assuming rgbHash is an array of unsigned char, this means the first expression retains only the four most significant bits and the second expression the four least significant bits of the (presumably) 8-bit input.
Four bits is exactly what will fit in one hexadecimal digit, so each of those is used to look up a hexadecimal digit in an array which presumably looks something like this:
char rgbDigits[] = "0123456789abcdef"; // or possibly upper-case letters
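For instance, splitting a single byte into its two hexadecimal digits with that table (the byte 0xd4 is just an example value):
#include <cstdio>
int main() {
    const char rgbDigits[] = "0123456789abcdef";
    unsigned char b = 0xd4;
    std::printf("%c%c\n", rgbDigits[b >> 4],    // high nibble -> 'd'
                          rgbDigits[b & 0xf]);  // low nibble  -> '4'
}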
This code uses simple bit 'filtering' techniques.
">> 4" means shift right by 4 places, which in turn means 'divide by 16' and keeps only the high 4 bits.
"& 0xf" is a bitwise AND operation which means 'keep only the low 4 bits'.
Both of these values are then used to index rgbDigits, which maps them to human-readable hexadecimal digits.