Bitwise operations on a short - bit-manipulation

I'm using a short (and I must use a short for the assignment otherwise I would just use an int) to scan in a value between 0-31 and then using a single integer to store 6 of these scanned values.
This is what I have so far:
int vals = 0;
short ndx, newVal;
/* more printing/scanning and error checking in between */
newVal = newVal << (5*ndx);
vals = vals | newVal;
When I try to place a valid value at spot 4 or 5 it doesn't work and just stays 0... I'm wondering if this is because a short is only 2 bytes long so the bitwise left shift just gets rid of the entire value? And if this is the problem is there some sort of cast I can add to fix it?

It's exactly what you thought. You used a bitwise shift and then assigned the result to a short variable (newVal). When you do that, even if the calculation is done in 32 bits, the result is truncated to its least significant 16 bits. For spots 4 and 5 the shift is 20 or 25 bits, so those 16 bits are all 0s.
If you want to keep newVal as a short, just don't store the shifted result in it: calculate vals = vals | ((something) << (some other thing)); directly.
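A minimal sketch of that idea, assuming six 5-bit slots packed into a 32-bit unsigned value (the question uses int; unsigned is used here to keep the shifts well defined). The helper names setSlot and getSlot are hypothetical and not from the question:

```cpp
#include <cstdint>

// Hypothetical helper: store a 5-bit value (0-31) in slot ndx (0-5) of vals.
// The short is widened to 32 bits BEFORE the shift, so no bits are lost.
std::uint32_t setSlot(std::uint32_t vals, short ndx, short newVal) {
    const unsigned shift = 5u * static_cast<unsigned>(ndx);
    vals &= ~(std::uint32_t{0x1F} << shift);                          // clear the old slot
    vals |= (static_cast<std::uint32_t>(newVal) & 0x1Fu) << shift;    // write the new value
    return vals;
}

// Hypothetical helper: read slot ndx (0-5) back out of vals.
std::uint32_t getSlot(std::uint32_t vals, short ndx) {
    return (vals >> (5u * static_cast<unsigned>(ndx))) & 0x1Fu;
}
```

The inputs stay short as the assignment requires; only the intermediate value is widened.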

Related

How to build N bits variables in C++?

I am dealing with very large list of booleans in C++, around 2^N items of N booleans each. Because memory is critical in such situation, i.e. an exponential growth, I would like to build a N-bits long variable to store each element.
For small N, for example 24, I am just using unsigned long int. It takes 64MB ((2^24)*32/8/1024/1024). But I need to go up to 36. The only option with a built-in type is unsigned long long int, but it takes 512GB ((2^36)*64/8/1024/1024/1024), which is a bit too much.
With a 36-bits variable, it would work for me because the size drops to 288GB ((2^36)*36/8/1024/1024/1024), which fits on a node of my supercomputer.
I tried std::bitset, but std::bitset< N > creates an element of at least 8B.
So a list of std::bitset< 1 > is much larger than a list of unsigned long int.
That is because std::bitset only changes the representation, not the container.
I also tried boost::dynamic_bitset<> from Boost, but the result is even worse (at least 32B!), for the same reason.
I know an option is to write all elements as one chain of booleans, 2473901162496 (2^36*36) bits, then to store them in 38654705664 (2473901162496/64) unsigned long long int, which gives 288GB (38654705664*64/8/1024/1024/1024). Accessing an element is then just a matter of finding which words hold its 36 bits (either one or two). But it is a lot of rewriting of the existing code (3000 lines), because mapping becomes impossible and because adding and deleting items during execution in some functions will surely be complicated, confusing, and challenging, and the result will most likely not be efficient.
How to build a N-bits variable in C++?
How about a struct with 5 chars (and perhaps some fancy operator overloading as needed to keep it compatible to the existing code)? A struct with a long and a char probably won't work because of padding / alignment...
Basically your own mini BitSet optimized for size:
struct Bitset40 {
unsigned char data[5];
bool getBit(int index) const {
return (data[index / 8] & (1 << (index % 8))) != 0;
}
void setBit(int index, bool newVal) { // returns nothing, so void rather than bool
if (newVal) {
data[index / 8] |= (1 << (index % 8));
} else {
data[index / 8] &= ~(1 << (index % 8));
}
}
};
Edit: As geza has also pointed out in the comments, the "trick" here is to get as close as possible to the minimum number of bytes needed (without wasting memory by triggering alignment losses, padding or pointer indirection, see http://www.catb.org/esr/structure-packing/).
Edit 2: If you feel adventurous, you could also try a bit field (and please let us know how much space it actually consumes):
struct Bitset36 {
unsigned long long data:36;
};
I'm not an expert, but this is what I would "try". Find the bytes for the smallest type your compiler supports (should be char). You can check with sizeof and you should get 1. That means 1 byte, so 8 bits.
So if you wanted a 24-bit type, you would need 3 chars. For 36 bits you would need a 5-char array, and you would have 4 bits of wasted padding at the end. This could easily be accounted for.
i.e.
char typeSize[3] = {0}; // should hold 24 bits
Now make a bit mask to access each position of typeSize.
const unsigned char one = 0b0000'0001;
const unsigned char two = 0b0000'0010;
const unsigned char three = 0b0000'0100;
const unsigned char four = 0b0000'1000;
const unsigned char five = 0b0001'0000;
const unsigned char six = 0b0010'0000;
const unsigned char seven = 0b0100'0000;
const unsigned char eight = 0b1000'0000;
Now you can use the bit-wise or to set the values to 1 where needed..
typeSize[1] |= four;
typeSize[0] |= (four | five);
To turn off bits use the & operator..
typeSize[0] &= ~four;
typeSize[2] &= ~(four| five);
You can read the position of each bit with the & operator.
typeSize[0] & four
Bear in mind, I don't have a compiler handy to try this out so hopefully this is a useful approach to your problem.
Good luck ;-)
You can use array of unsigned long int and store and retrieve needed bit chains with bitwise operations. This approach excludes space overhead.
Simplified example for unsigned byte array B[] and 12-bit variables V (represented as ushort):
Set V[0]:
B[0] = V & 0xFF; //low byte
B[1] = B[1] & 0xF0; // clear low nibble
B[1] = B[1] | (V >> 8); //fill low nibble of the second byte with the highest nibble of V
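The same idea can be generalized to N-bit values stored back to back in 64-bit words, as the question describes for N = 36. The following is only a sketch under stated assumptions: the class name PackedArray and its interface are hypothetical, values are at most 64 bits wide, and no bounds checking is done.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: `count` values of `bits` bits each (1 <= bits <= 64),
// stored contiguously in 64-bit words with no per-element padding.
class PackedArray {
public:
    PackedArray(std::size_t count, unsigned bits)
        : bits_(bits), words_((count * bits + 63) / 64, 0) {}

    std::uint64_t get(std::size_t i) const {
        const std::size_t bit = i * bits_;
        const std::size_t word = bit / 64;
        const unsigned offset = static_cast<unsigned>(bit % 64);
        std::uint64_t v = words_[word] >> offset;
        if (offset + bits_ > 64)                      // value spills into the next word
            v |= words_[word + 1] << (64 - offset);
        return v & mask();
    }

    void set(std::size_t i, std::uint64_t value) {
        const std::size_t bit = i * bits_;
        const std::size_t word = bit / 64;
        const unsigned offset = static_cast<unsigned>(bit % 64);
        value &= mask();
        words_[word] = (words_[word] & ~(mask() << offset)) | (value << offset);
        if (offset + bits_ > 64) {
            const unsigned fit = 64 - offset;         // bits that fit in the first word
            words_[word + 1] = (words_[word + 1] & ~(mask() >> fit)) | (value >> fit);
        }
    }

private:
    std::uint64_t mask() const {
        return bits_ == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits_) - 1;
    }
    unsigned bits_;
    std::vector<std::uint64_t> words_;
};
```

Each element occupies exactly `bits` bits, so 2^36 elements of 36 bits would need the 288GB computed in the question; the cost is a couple of shifts and masks per access.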

Retain sign and decimal after MSB to LSB swapping

I am getting data as hex values in 2-byte short values, but after swapping, the value gets lost.
signed short value = 0x0040;
value = (value*0.5) - 40;
convertMSBTOLSB(value); // Conversion needed because my device reads LSB first
//Implementation of convertMSBTOLSB(value)
unsigned short temp = ((char*) &value)[0]; // assign value's LSB
temp = (temp << 8) | ((char*) &value)[1]; // shift LSB to MSB and add value's MSB
value = temp;
After the conversion I got -8 as the value.
The problem occurs when I send 0x51: the final value should be 0.5, but I get zero because value is a signed short.
convertMSBTOLSB is just byte swapping. How can I change the code so that it can handle both negative and decimal values?
You won't get 0.5 because your value variable is declared short, and therefore can hold only integers.
Your question is unclear. You wrote that convertMSBTOLSB swaps between MSB and LSB, and you also wrote that
convertMSBTOLSB is just byte swapping
Since MSB and LSB usually refer to bits and not to bytes, I really don't understand what you are trying to swap here.
Your value should change its type. value is an integer and 0.5 is a floating-point number, so (value*0.5) is computed as a double; the fractional part is lost only when the result is stored back into the short.
In addition, 40 is converted to a double for the subtraction, and assigning the double result back to value truncates it toward zero. For 0x51 that is 81*0.5 - 40 = 0.5, which becomes 0.
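As an illustration, here is a small sketch that keeps the scaled result in a double and does the byte swap on an unsigned 16-bit value, so neither the sign nor the fraction is lost along the way. The function names swapBytes and scale are hypothetical, and the sketch assumes the raw device value fits in 16 bits:

```cpp
#include <cstdint>

// Swap the two bytes of a 16-bit value. Unsigned arithmetic avoids the
// sign extension that reading through a plain char* can introduce.
std::uint16_t swapBytes(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Apply the scaling from the question, but return a double so that
// results such as 0.5 survive instead of being truncated to 0.
double scale(std::int16_t raw) {
    return raw * 0.5 - 40;
}
```

With this, scale(0x51) gives 0.5 and scale(0x40) gives -8. If the value has to go back into 2 bytes, some fixed-point convention (for example, storing the value times 2) would be needed; that choice is not specified in the question.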

Hash algorithm for string of characters using XOR and bit shift

I was given this algorithm to write a hash function:
BEGIN Hash (string)
UNSIGNED INTEGER key = 0;
FOR_EACH character IN string
key = ((key << 5) + key) ^ character;
END FOR_EACH
RETURN key;
END Hash
The << operator shifts bits to the left. The ^ refers to the XOR operation and the character refers to the ASCII value of the character. Seems pretty straightforward.
Below is my code
unsigned int key = 0;
for (int i = 0; i < data.length(); i++) {
key = ((key<<5) + key) ^ (int)data[i];
}
return key;
However, I keep getting ridiculously huge positive and negative numbers when I should actually get a hash value from 0 - n. n is a value set by the user beforehand. I'm not sure where things went wrong but I'm thinking it could be the XOR operation.
Any suggestions or opinions will be greatly appreciated. Thanks!
The output of this code is a 32-bit (or 64-bit or however wide your unsigned int is) unsigned integer. To restrict it to the range from 0 to n−1, simply reduce it modulo n, using the % operator:
unsigned int hash = key % n;
(It should be obvious that your code, as written, cannot return "a hash value from 0 - n", since n does not appear anywhere in your code.)
In fact, there's a good reason not to reduce the hash value modulo n too soon: if you ever need to grow your hash, storing the unreduced hash codes of your strings saves you the effort of recalculating them whenever n changes.
Finally, a few general notes on your hash function:
As Joachim Pileborg comments above, the explicit (int) cast is unnecessary. If you want to keep it for clarity, it really should say (unsigned int) to match the type of key, since that's what the value actually gets converted into.
For unsigned integer types, ((key<<5) + key) is equal to 33 * key (since shifting left by 5 bits is the same as multiplying by 2^5 = 32). On modern CPUs, using multiplication is almost certainly faster; on old or very low-end processors with slow multiplication, it's likely that any decent compiler will optimize multiplication by a constant into a combination of shifts and adds anyway. Thus, either way, expressing the operation as a multiplication is IMO preferable.
You don't want to call data.length() on every iteration of the loop. Call it once before the loop and store the result in a variable.
Initializing key to zero means that your hash value is not affected by any leading zero bytes in the string. The original version of your hash function, due to Dan Bernstein, uses a (more or less random) initial value of 5381 instead.
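Putting these notes together, a sketch of the function might look like the following. The name hashString and the parameter n (the table size, assumed nonzero) are illustrative, and treating each character as unsigned char is an assumption that only matters for non-ASCII input:

```cpp
#include <string>

// Sketch of the XOR variant of the hash described above:
// start at 5381, multiply by 33, XOR in each character, reduce modulo n.
unsigned int hashString(const std::string& data, unsigned int n) {
    unsigned int key = 5381;
    for (unsigned char c : data) {   // range-for avoids repeated length() calls
        key = (key * 33u) ^ c;
    }
    return key % n;                  // n must be nonzero
}
```

If the table may grow later, it could be worth returning key unreduced and applying % n at the call site, as mentioned above.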

Set A Float's Fractional Part Using 6 Bits

I am uncompressing some data from double words.
unsigned char * current_word = [address of most significant byte]
My first 14 MSB are an int value. I plan to extract them using a bitwise AND with 0xFFFC.
int value = (int)( (uint_16)current_word & 0xFFFC );
My next 6 bits are a fractional value. Here I am stuck on an efficient implementation. I could extract one bit at a time, and build the fraction 1/2*bit + 1/4*bit + 1/8*bit etc ... but that's not efficient.
float fractional = ?
The last 12 LSB are another int value, which I feel I can pull out using bitwise AND again.
int other_value = (int) ( (uint_16)current_word[2] & 0x0FFF );
This operation will be done on 16348 double words and needs to be finished within 0.05 ms to run at least 20Hz.
I am very new to bit operations, but I'm excited to learn. Reading material and/or examples would be greatly appreciated!
Edit: I wrote OR when I meant AND
Since you're starting with [address of most significant byte] and using increasing addresses from there, your data is apparently in Big-Endian byte order. Casting pointers will therefore fail on nearly all desktop machines, which use Little-Endian byte order.
The following code will work, regardless of native byte order:
int value = (current_word[0] << 6) | (current_word[1] >> 2);
double fractional = (current_word[1] & 0x03) / 4.0 + (current_word[2] & 0xF0) / 1024.0;
int other_value = (current_word[2] & 0x0F) << 8 | current_word[3];
Firstly you'd be more efficient getting the double-word all at once into an int and masking/shifting from there.
Getting the fractional part from that is easy: mask and shift to get an integer, then divide by a float to scale the result.
float fractional = ((current_int >> 12) & 0x3f) / 64.;
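Combining both suggestions, here is a sketch that assembles the 32-bit word from four bytes in big-endian order (so it does not depend on the host's byte order) and then masks and shifts. The struct Decoded and the function decode are hypothetical names; the 14/6/12 bit layout is the one described in the question:

```cpp
#include <cstdint>

// Hypothetical result type for one decoded double word.
struct Decoded {
    int value;          // top 14 bits
    float fractional;   // next 6 bits, scaled to [0, 1)
    int other_value;    // low 12 bits
};

// p points at the most significant byte of a 4-byte big-endian word.
Decoded decode(const unsigned char* p) {
    const std::uint32_t w = (static_cast<std::uint32_t>(p[0]) << 24) |
                            (static_cast<std::uint32_t>(p[1]) << 16) |
                            (static_cast<std::uint32_t>(p[2]) << 8) |
                             static_cast<std::uint32_t>(p[3]);
    Decoded d;
    d.value = static_cast<int>(w >> 18);               // 14 most significant bits
    d.fractional = ((w >> 12) & 0x3Fu) / 64.0f;        // 6 bits as a fraction of 64
    d.other_value = static_cast<int>(w & 0x0FFFu);     // 12 least significant bits
    return d;
}
```

For example, the bytes 0x00 0x16 0x0A 0xBC decode to value 5, fractional 0.5 and other_value 0xABC.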
There are 5 kinds of shift instructions:
1. Shift right with sign extend: It will copy your current leftmost bit as the new bit to the leftmost after shifting all the bits to the right. Rightmost one gets dropped.
2. Shift right with zero extend: Same as (1) but assume that your new leftmost bit is always zero.
3. Shift left: replace right in (1) and (2) with left, left with right and read (2) again.
4. Roll right: Shift your bits to the right, instead of rightmost one dropping, it becomes your leftmost.
5. Roll left: Replace right in (4) with left, left with right and read (4) again.
You can shift by as many bits as you want, up to a limit: in C, shifting by an amount greater than or equal to the number of bits in the (promoted) datatype is undefined. Unsigned and signed types shift differently although the syntax is the same.
If you are reading your data as unsigned char *, you are not going to be able to get more than 8 bits of data at a time and your example needs to change. If your address is aligned, or your platform allows, you should read your data in as an int *, but then that also begs the question of just how your data is stored. Is it stored 20 bits per integer with 12 bits of other info, or is it a 20-bit stream where you need to keep track of your bit pointer? If the second, it's even more complex than you realize. I'll post further once I have a feel for how your data is laid out in RAM.

Bit Operators to append two unsigned char in C++

If I have two things which are hex, can I somehow append their binary together to get a value?
In C++,
say I have
unsigned char t = 0xc2; // 11000010
unsigned char q = 0xa3; // 10100011
What I want is somehow,
1100001010100011, is this possible using bit-wise operators?
I want to extract the binary form of t and q and append them...
Yes it's possible.
Just use the left-bitshift operator, shifting to the left by 8, using at least a 16-bit integer. Then binary OR the 2nd value to the integer.
unsigned char t = 0xc2; // 11000010
unsigned char q = 0xa3; // 10100011
unsigned short s = (((unsigned short)t)<<8) | q; // 11000010 10100011
Alternatively, putting both values in a union containing 2 chars (careful of big endian or little endian) would have the same bit-level result. Another option is a char[2].
Concatenating two chars:
unsigned char t = 0xc2; // 11000010
unsigned char q = 0xa3; // 10100011
int result = t; // Put into object that can hold the fully concatenated data;
result <<= 8; // Shift it left
result |= q; // Or the bottom bits into place;
Your example doesn't really work too well because the width (usually 8-bits) of the input values aren't defined. For example, why isn't your example: 0000000100000010, which would be truly appending 1 (00000001) and 2 (00000010) bit wise.
If each value does have a fixed width then it can be answered with bit shifting and ORing values
EDIT: if your "width" is defined as the full width with all leading zeros removed, then it is possible to do with shifting and ORing, but more complicated.
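To make the difference between the two interpretations concrete, here is a sketch of both. The function names are hypothetical, and treating a value of 0 as the single bit "0" is an assumption made only for this illustration:

```cpp
#include <cstdint>

// Number of bits needed to write v without leading zeros (0 counts as 1 bit).
unsigned bitWidth(std::uint32_t v) {
    unsigned w = 1;
    while (v >>= 1) ++w;
    return w;
}

// Fixed width: each input is treated as exactly 8 bits.
std::uint32_t concatFixed(std::uint8_t hi, std::uint8_t lo) {
    return (static_cast<std::uint32_t>(hi) << 8) | lo;
}

// Minimal width: leading zeros of lo are dropped before appending.
std::uint32_t concatMinimal(std::uint8_t hi, std::uint8_t lo) {
    return (static_cast<std::uint32_t>(hi) << bitWidth(lo)) | lo;
}
```

For 0xc2 and 0xa3 both versions give 0xc2a3, because 0xa3 already uses all 8 bits. For 1 and 2 they differ: concatFixed gives 0000000100000010 (258), while concatMinimal gives 110 (6).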
I'd go with the char array.
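unsigned short s;
unsigned char * sPtr = (unsigned char *) &s; // cast needed; unsigned char to match t and q
sPtr[0] = t; sPtr[1] = q;
Note that this takes no account of endianness: on a little-endian machine t ends up in the low byte of s, so s is 0xa3c2 rather than 0xc2a3.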
I'm not sure why you'd want to do this but this would work.
The problem with the bit methods is that you're not sure what size you've got.
If you know the size, I'd go with Brian's answer.
There is no append in binary/hex because you are dealing with Numbers (can you append 1 and 2 and not confuse the resulting 12 with the "real" 12?)
You could delimit them with some special symbol, but you can't just "concatenate" them.
Appending as an operation doesn't really make sense for numbers, regardless of what base they're in. Using . as the concatenation operator: in your example, 0x1 . 0x2 becomes 0x12 if you concat the hex, and 0b110 if you concat the binary (0b1 followed by 0b10). But 0x12 and 0b110 aren't the same value (in base 10, they're 18 and 6 respectively). In general, A O B (where A and B are numbers and O is an operator) should result in the same value no matter what base you're operating in.